Living-with-machines / station-to-station

This repository provides underlying code and materials for the paper 'Station to Station: Linking and Enriching Historical British Railway Data'.
https://ceur-ws.org/Vol-2989/long_paper29.pdf
MIT License

Review Wikidata processing #8

Closed. mcollardanuy closed 3 years ago

mcollardanuy commented 3 years ago

Instructions:

mcollardanuy commented 3 years ago

@kmcdono2 please let me know if something is not clear! Thank you!

mcollardanuy commented 3 years ago

Ah @kmcdono2, you actually don't need to download the Wikidata dump file unless you run the last cell ("Parse all wikidata").

kmcdono2 commented 3 years ago

Just starting. There is an error importing pydash. Do you want me to comment in a pull request or here?

(I just had to run pip install pydash, but I assume this is an update to the py37deezy env that I've missed.)

kmcdono2 commented 3 years ago

also !pip install wikidata
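
For anyone hitting the same import errors, a minimal sketch of a check that could go in the first notebook cell. It assumes the only two missing packages are the ones mentioned in this thread (pydash and wikidata); the helper name missing_modules is hypothetical and not part of the repository.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# Assumption: these are the two packages reported as missing in this thread.
required = ["pydash", "wikidata"]

missing = missing_modules(required)
if missing:
    # Suggest the install command rather than running it automatically.
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```

This only reports what is missing; installing into the py37deezy env is still a manual step.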

mcollardanuy commented 3 years ago

Hi Katie, here it's fine, thanks!

kmcdono2 commented 3 years ago

notes from parsed record:

kmcdono2 commented 3 years ago

And now I'm looking at some other places to see which other fields they have:

https://www.wikidata.org/wiki/Q209055

https://www.wikidata.org/wiki/Q205679

https://www.wikidata.org/wiki/Q9679

Scottish records: https://www.wikidata.org/wiki/Q2015758

Rail-related properties (Euston station example): https://www.wikidata.org/wiki/Q800751

Street- or building-related properties: https://www.wikidata.org/wiki/Q6939080
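
To look these records up in the parsed output, the Wikidata IDs can be pulled out of the URLs above. A small sketch, assuming the URLs always end in /wiki/ followed by the QID; the function name extract_qid is made up for illustration and is not from the notebook.

```python
import re

# The example records linked in this comment.
URLS = [
    "https://www.wikidata.org/wiki/Q209055",
    "https://www.wikidata.org/wiki/Q205679",
    "https://www.wikidata.org/wiki/Q9679",
    "https://www.wikidata.org/wiki/Q2015758",
    "https://www.wikidata.org/wiki/Q800751",
    "https://www.wikidata.org/wiki/Q6939080",
]

# Assumption: an entity URL ends with /wiki/Q<digits>.
QID_PATTERN = re.compile(r"/wiki/(Q[1-9]\d*)$")


def extract_qid(url):
    """Return the QID at the end of a Wikidata entity URL, or None if absent."""
    match = QID_PATTERN.search(url.strip())
    return match.group(1) if match else None


qids = [extract_qid(url) for url in URLS]
print(qids)
```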

kmcdono2 commented 3 years ago

I'm tempted to include one of the following IDs: VIAF, Library of Congress, or WorldCat, simply because sometimes those are all that are listed.

kmcdono2 commented 3 years ago

OK all done! Happy to discuss if needed.

mcollardanuy commented 3 years ago

Hi @kmcdono2 sorry for the delay, I've addressed this here: 141480276370a4a17f0af6e351271ccf2dac940b

I think we discussed this some months ago, but I can't find it: I haven't added fields that link to external data or datasets that we don't and won't have, since they won't help with disambiguation and we can always add them later if we need them. I think I have covered everything else that wasn't already there!

Thanks!