CleanData / Datasets

Dataset: datahub.io #2

Open jonroberts opened 10 years ago

jonroberts commented 10 years ago

datahub.io hosts a lot of datasets, but many are duplicates and the site isn't specifically energy-focused.

karvenlam commented 10 years ago

Lots of spam. The scripts still have trouble filtering out spam entries that have short descriptions.
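For illustration only, here is a minimal sketch of one way to handle the short-description case. It is not the project's actual script. It assumes each scraped dataset is a dict with hypothetical `title`, `description`, and `resources` keys, and the keyword list and thresholds are made up. The idea is that a short description alone is weak evidence, so it only counts as spam when combined with another signal.

```python
import re

# Hypothetical values, chosen for illustration; tune against real data.
MIN_DESCRIPTION_WORDS = 5
SPAM_PATTERN = re.compile(
    r"\b(viagra|casino|payday loans?|buy now)\b", re.IGNORECASE
)


def spam_score(dataset):
    """Return a rough spam score for one dataset record (a dict)."""
    title = dataset.get("title") or ""
    description = dataset.get("description") or ""
    score = 0
    # Strong signal: an obvious spam keyword in the title or description.
    if SPAM_PATTERN.search(f"{title} {description}"):
        score += 2
    # Weak signal: too little text to judge the entry on its own.
    if len(description.split()) < MIN_DESCRIPTION_WORDS:
        score += 1
    # Weak signal: the entry has no attached data files or links.
    if not dataset.get("resources"):
        score += 1
    return score


def is_probable_spam(dataset, threshold=2):
    """Flag a record once its combined score reaches the threshold."""
    return spam_score(dataset) >= threshold
```

With these settings, a real dataset with a terse description but at least one attached resource is kept, while an entry with a terse description and no resources is flagged.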

karvenlam commented 10 years ago

Giving up on datahub. When I started scraping there were ~6000 datasets. Two days ago, there were ~10000. Today, there are ~30000, and most of the new ones are spam. See span_digest.json for details.

jonroberts commented 10 years ago

Ouch. Looks like they're in a bad state. Thanks for the update.

J

On 10 October 2013 10:14, karvenlam wrote:

Giving up on datahub. I started scraping with ~6000 datasets. Two days ago, there were ~10000. Today, there are ~30000, and most of the new ones are spam. Read span_digest.json for details.

karvenlam commented 10 years ago

I will run datarelations after the NYC opendata scrape completes.