datacommonsorg / website

Code for the Data Commons website
https://datacommons.org
Apache License 2.0
20 stars 74 forks

Update all embedding description input csv to have 'dcid' and 'sentence' columns #4349

Closed shifucun closed 3 weeks ago

shifucun commented 3 weeks ago

I have run run.sh for all the indexes and confirmed there are no pre-index csv changes.

This PR also fixes and cleans up the command and path issues from previous changes.

The original csv files were converted with a script like https://paste.googleplex.com/5485631748964352. Basically, it merges all the description columns into a single 'sentence' column, with the values separated by ";".
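
For readers without access to the paste link, here is a minimal sketch of what such a conversion could look like. It is not the actual script. It assumes that every column other than 'dcid' is a description column, that empty cells are skipped, and that values are joined in column order. The function names `merge_descriptions` and `convert_csv` and the example column names are hypothetical.

```python
import csv
import io


def merge_descriptions(rows, id_col="dcid", sep=";"):
    """Yield {'dcid', 'sentence'} dicts from rows that have several description columns."""
    for row in rows:
        # Assumption: every column other than the id column holds description text.
        parts = [
            value.strip()
            for col, value in row.items()
            if col != id_col and isinstance(value, str) and value.strip()
        ]
        yield {"dcid": row[id_col], "sentence": sep.join(parts)}


def convert_csv(src, dst):
    """Read a multi-column description CSV from src and write the two-column form to dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["dcid", "sentence"])
    writer.writeheader()
    writer.writerows(merge_descriptions(reader))


# Illustrative input only; the real column names may differ.
src = io.StringIO(
    "dcid,Name,Description,Override_Alternatives\n"
    "Count_Person,population,number of people,\n"
)
dst = io.StringIO()
convert_csv(src, dst)
print(dst.getvalue())
```

Skipping empty cells keeps the merged 'sentence' value free of stray ";" separators, which matters if downstream code splits on ";" to recover individual sentences.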