caltechlibrary / irdm_harvester

Automatically harvest publications for an InvenioRDM repository

Add publishers/journal titles to options.yaml #16

Closed tmorrell closed 6 months ago

tmorrell commented 6 months ago

Starting at row 206 in https://docs.google.com/spreadsheets/d/1BTMadLyOJ0gxnKd-_UJ-m_888lXJsWwFqqjEFhrDwBY/edit#gid=0

rsdoiel commented 6 months ago

It was just a quick and dirty Python script; see publisher_groups_to_yaml.py in irdmtools. I'll update the script to accept a CSV file as a parameter, check the columns, etc., and then merge the CSV into the existing YAML. Let's leave this ticket open in the meantime.
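A minimal sketch of what that merge step could look like, not the actual irdmtools code. It assumes the YAML has already been loaded into a plain dict mapping each publisher to a list of journal titles, and that the CSV rows carry `publisher` and `journal` columns; that layout, those column names, and the function name `merge_publisher_rows` are all hypothetical.

```python
def merge_publisher_rows(options, rows):
    """Merge CSV rows into a publisher -> [journal titles] mapping.

    The mapping layout and the "publisher"/"journal" keys are assumptions
    made for illustration; the real options.yaml may be shaped differently.
    """
    # Copy the existing entries so the caller's data is left untouched.
    merged = {name: list(titles) for name, titles in options.items()}
    for row in rows:
        publisher = row["publisher"].strip()
        journal = row["journal"].strip()
        if not publisher or not journal:
            continue  # skip incomplete spreadsheet rows
        titles = merged.setdefault(publisher, [])
        if journal not in titles:
            titles.append(journal)  # only add titles not already listed
    return merged


existing = {"Example Press": ["Journal of Examples"]}
rows = [
    {"publisher": "Example Press", "journal": "Journal of Examples"},  # already present
    {"publisher": "Example Press", "journal": "Example Letters"},
    {"publisher": "Sample Society", "journal": "Sample Review"},
]
merged = merge_publisher_rows(existing, rows)
```

Existing entries are kept as they are and only new titles are appended, so running the merge again with the same spreadsheet should not create duplicates.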

rsdoiel commented 6 months ago

OK, I've cleaned up the script and added a USAGE statement explaining what it does. I also renamed it to update_publisher_options.py. It is run as follows:

./update_publisher_options.py options.yaml CaltechAUTHORS_publisher_groups.csv

Here options.yaml is our options file and CaltechAUTHORS_publisher_groups.csv is the CSV file George curated. I read the CSV with DictReader, so the column order doesn't matter but the column names do. You can get the names and order by running the script with no arguments:

./update_publisher_options.py

This will display the usage text.
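To illustrate why the column names matter but their order does not, here is a small sketch of reading a CSV with `csv.DictReader` and checking the header first. It is not the script itself: the column names `publisher` and `journal` and the function name `read_publisher_rows` are hypothetical stand-ins for whatever the usage text lists.

```python
import csv
import io

# Hypothetical column names; the real ones are shown in the script's usage text.
REQUIRED_COLUMNS = {"publisher", "journal"}


def read_publisher_rows(csv_file):
    """Read rows keyed by column name, failing early if a column is missing."""
    reader = csv.DictReader(csv_file)
    # fieldnames is taken from the header row; it is None for an empty file.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing column(s): " + ", ".join(sorted(missing)))
    return list(reader)


# The same data with the columns swapped yields the same rows.
rows_a = read_publisher_rows(
    io.StringIO("publisher,journal\nExample Press,Journal of Examples\n")
)
rows_b = read_publisher_rows(
    io.StringIO("journal,publisher\nJournal of Examples,Example Press\n")
)
```

Because each row is a dict keyed by the header names, a reordered spreadsheet still works, while a renamed or missing column is reported before any merging happens.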

rsdoiel commented 6 months ago

OK, I've merged George's updated spreadsheet with the existing options.yaml file and committed the changes to this repository. Do you want me to add the update_publisher_options.py script from irdmtools?

rsdoiel commented 6 months ago

I added a text file to document the process and how to use the Python script.