HeardLibrary / vandycite

Extract data for works to be written from act_all_202109241353_repaired.csv ACT dump #54

Closed baskaufs closed 2 years ago

baskaufs commented 2 years ago

act_all_202109241353_repaired.csv is at https://github.com/HeardLibrary/vandycite/blob/master/act/processed_lists/act_all_202109241353_repaired.csv

baskaufs commented 2 years ago

Refer to, and revise, the checklist at https://github.com/HeardLibrary/vandycite/issues/9

baskaufs commented 2 years ago

In the step for disambiguating with existing items, the ACT ID 58413 threw an error, apparently because there was no match in the metadata dump. Need to investigate why this happened.
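One way to surface this kind of problem before the disambiguation step runs is to check every ID against the dump first and collect the misses rather than letting the lookup fail. The following is only a rough sketch, not the code from the actual notebooks: the function name `find_unmatched_ids`, the `act_id` column name, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def find_unmatched_ids(work_ids, dump_rows, id_column="act_id"):
    """Return the IDs in work_ids that have no row in the metadata dump.

    id_column is a hypothetical column name; the real dump may use another.
    """
    known_ids = {row[id_column].strip() for row in dump_rows}
    return [work_id for work_id in work_ids if str(work_id).strip() not in known_ids]


# Tiny made-up dump standing in for the real CSV (assumed layout).
sample_dump = io.StringIO("act_id,title\n100,Example work A\n200,Example work B\n")
dump_rows = list(csv.DictReader(sample_dump))

# 58413 has no row in the sample dump, so it is reported as unmatched.
unmatched = find_unmatched_ids(["100", "58413"], dump_rows)
print(unmatched)
```

Reporting the unmatched IDs up front would let the rest of the batch proceed while the missing ones get investigated separately.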

baskaufs commented 2 years ago

The most recent version of add_to_wikidata.csv is the one in the create_items folder.

Need to generate already_in_templated_data.csv, pulling data from Commons, for the new works to add. Do this with the linked-data/commonsbot/commons_data.ipynb script. Need to change the path to point to the add_to_wikidata.csv file. Then run the first cell to load functions and the last cell to actually generate the file.

baskaufs commented 2 years ago

To generate the act_data_fix.csv and commons_data_fix.csv files, need to use compare_metadata_sources.ipynb in the processed_lists folder. Run the first cell to load functions, then skip to the last cell, since I already know which works need to be written.

errors:

image filename: Hilma af Klint - 1907 - Altarpiece - No 1 - Group X - Altarpieces.jpg 58413 not found in ACT database.

baskaufs commented 2 years ago

The script used to generate the Vanderbot upload CSV is the same one used for the data in the create_incorrectly_linked_artwork_items folder: create_act_items.ipynb. Most files are in the create_items folder, although I had to create a column-headers-only version of abstract_artworks.csv from the version in the create_incorrectly_linked_artwork_items folder.

baskaufs commented 2 years ago

The deal with ACT ID 58413 (Hilma af Klint - 1907 - Altarpiece - No 1 - Group X - Altarpieces.jpg) is that its ID doesn't come up in ACT. I don't know if it's been removed or something, but I guess that means we just ignore it.
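If ignoring it needs to be made explicit in the processing, something like the following could drop it from the list of works before the upload CSV is built. This is just an illustrative sketch: `drop_ignored_works`, the `act_id` and `image_filename` column names, and the second sample row are assumptions, not what the notebooks actually use.

```python
def drop_ignored_works(rows, ignored_ids, id_column="act_id"):
    """Split rows into (kept, skipped) based on a set of IDs to ignore.

    id_column is a hypothetical column name used only for this sketch.
    """
    ignored = {str(work_id) for work_id in ignored_ids}
    kept = [row for row in rows if str(row[id_column]) not in ignored]
    skipped = [row for row in rows if str(row[id_column]) in ignored]
    return kept, skipped


# Sample rows; the first mirrors the problem record, the second is made up.
works = [
    {
        "act_id": "58413",
        "image_filename": "Hilma af Klint - 1907 - Altarpiece - No 1 - Group X - Altarpieces.jpg",
    },
    {"act_id": "100", "image_filename": "Example work A.jpg"},
]

kept, skipped = drop_ignored_works(works, ignored_ids=["58413"])
print(len(kept), len(skipped))
```

Returning the skipped rows alongside the kept ones keeps a record of what was left out, in case the ID turns up in ACT again later.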

baskaufs commented 2 years ago

Completed with https://github.com/HeardLibrary/vandycite/commit/e23c776b0f1153b0eb321c32fad6e2c397e6e545