Living-with-machines / station-to-station

This repository provides underlying code and materials for the paper 'Station to Station: Linking and Enriching Historical British Railway Data'.
https://ceur-ws.org/Vol-2989/long_paper29.pdf
MIT License

Bhocands wikidata alignment #3

Closed fedenanni closed 3 years ago

fedenanni commented 4 years ago

@mcollardanuy hey! It's a tiny PR and still a sketch, but could you check the function in align_bho_cands_to_wikidata.ipynb and tell me where it should go (or do we want to keep it in this independent script for now?). The other script is just a draft that will be expanded soon.

kmcdono2 commented 4 years ago

OK, here is where I am. I have set up the deezy environment, downloaded the wikidata from your Zenodo deposit, cloned this repo, and checked out the bhocands_wiki_alignment branch.

Now I am trying to run python create_britTM_dataset.py but I get this error: No such file or directory: '/resources/wikigazetteer/wikiGaz_en_filtered.pkl'

I have wikiGaz_en_basic.pkl but not wikiGaz_en_filtered.pkl.

How do I make that? I can't seem to find instructions anywhere. Or do I just rename the basic.pkl?

mcollardanuy commented 4 years ago

Hi @kmcdono2! You're right, there's a gap there, I will update the code and instructions in a bit, thanks for checking!

mcollardanuy commented 4 years ago

Hi @fedenanni,

Thanks! I have moved the two notebooks to a new folder: toponym_resolution/bho_wikidata. I hope that's fine: this way we can add a new directory for each other combination of datasets. I thought of keeping the bho/ folder specifically for BHO-related things (e.g. processing), and likewise for the wikidata/ folder.

I have checked the notebook and it all looks good, but I haven't properly tested it yet: there's that other PR waiting for DeezyMatch regarding the distance metrics, so I will try to do that other review first because it will have a direct impact on this one.

fedenanni commented 4 years ago

@mcollardanuy good idea for the resolution/ folder, that's perfect. Ok, let's do the other PR and come back to this one after

mcollardanuy commented 4 years ago

Hi @fedenanni,

I will now review the notebooks align_bho_cands_to_wikidata.ipynb and stats_on_disambiguating_bho_cands.ipynb.

Could you have a look at the changes I made in the rest of the notebooks/scripts in this pull request, if you have time? They are small but important changes in the code, mostly about fixing paths and making sure all pieces of code are connected. Sorry that I reused this PR for this!

The readmes should be up-to-date now.

The main changes to the code are the addition of prepare_britwikigaz.ipynb (a new notebook that downloads an English wikigazetteer from Zenodo and creates a British Isles-filtered version of the Wikipedia gazetteer, which will be used to train a DeezyMatch model), plus path fixes and added documentation.
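
As a rough illustration of the filtering idea (this is not the code in prepare_britwikigaz.ipynb), here is a minimal sketch that assumes each gazetteer entry carries latitude and longitude fields and that a simple bounding box is a good enough approximation of the British Isles. The field names, the box coordinates and the function name are all hypothetical.

```python
# Hypothetical bounding box roughly covering the British Isles.
# These values are an assumption for illustration, not the ones used in the notebook.
BRITISH_ISLES_BBOX = {"min_lat": 49.5, "max_lat": 61.0, "min_lon": -11.0, "max_lon": 2.0}


def filter_to_bbox(entries, bbox=BRITISH_ISLES_BBOX):
    """Keep only gazetteer entries whose coordinates fall inside bbox."""
    kept = []
    for entry in entries:
        lat, lon = entry.get("lat"), entry.get("lon")
        if lat is None or lon is None:
            continue  # skip entries without coordinates
        inside_lat = bbox["min_lat"] <= lat <= bbox["max_lat"]
        inside_lon = bbox["min_lon"] <= lon <= bbox["max_lon"]
        if inside_lat and inside_lon:
            kept.append(entry)
    return kept


# Toy entries standing in for gazetteer rows (hypothetical field names).
sample = [
    {"name": "York", "lat": 53.96, "lon": -1.08},
    {"name": "York, Pennsylvania", "lat": 39.96, "lon": -76.73},
    {"name": "Unknown", "lat": None, "lon": None},
]
british = filter_to_bbox(sample)
print([e["name"] for e in british])
```

A bounding box is crude (it would also catch parts of northern France, for instance), so the real notebook may well filter differently; the sketch only shows the general shape of the step.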

Thanks!

fedenanni commented 3 years ago

@mcollardanuy ok - all good on my side!