cdli-gh / data

This is a copy of the daily dump of catalogue and ATF data from the Cuneiform Digital Library Initiative (http://cdli.ucla.edu)
http://cdli.ucla.edu/bulk_data

added parser script #58

Closed Lord-of-Codes closed 4 years ago

Lord-of-Codes commented 4 years ago

PR as per https://gitlab.com/cdli/framework/issues/157

Extracting Sumerian language data (both translated and untranslated) works fine. Extracting data by genre does not work as expected, probably because of the time complexity of the current approach; I am working on a new one. Opening this draft PR to ascertain that the requirements are being satisfied and that this is what was supposed to be done.
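
For reference, one way to keep genre extraction roughly linear is to collect the matching IDs into a set in a single pass over the catalogue, then stream through the ATF data once. This is only a sketch and not the code in this PR: the column names `id_text` and `genre`, the `P` plus six-digit ID convention, and the `&P000001 = ...` header shape are assumptions about the dump, and both function names are hypothetical.

```python
import csv
import io


def ids_for_genre(catalogue_file, genre, id_col="id_text", genre_col="genre"):
    """Collect the IDs of all catalogue rows whose genre matches, in one pass."""
    wanted = set()
    for row in csv.DictReader(catalogue_file):
        if genre.lower() in (row.get(genre_col) or "").lower():
            # Assumed ID convention: "P" followed by a zero-padded six-digit number.
            wanted.add("P" + row[id_col].strip().zfill(6))
    return wanted


def filter_atf(atf_file, wanted_ids):
    """Yield the lines of every ATF text whose ID is in wanted_ids, in one pass."""
    keep = False
    for line in atf_file:
        if line.startswith("&"):
            # Assumed header shape: "&P000001 = designation".
            text_id = line[1:].split("=", 1)[0].strip()
            keep = text_id in wanted_ids
        if keep:
            yield line


# Tiny in-memory stand-ins for the real catalogue and ATF files.
catalogue = io.StringIO("id_text,genre\n1,Administrative\n2,Literary\n")
atf = io.StringIO("&P000001 = Text A\n1. a\n&P000002 = Text B\n1. b\n")

wanted = ids_for_genre(catalogue, "Literary")
literary_lines = list(filter_atf(atf, wanted))
print("".join(literary_lines), end="")
```

Because set membership is a constant-time check on average, the cost grows with the size of the catalogue plus the size of the ATF file, rather than with their product.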

To run the script, first merge the two catalogue files as mentioned in the readme and then type: python3 parser.py
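
If the merge is done from Python rather than the shell, it could look like the sketch below. It assumes the second catalogue file is a plain continuation of the first, so that concatenating them in order yields one valid file; the readme's own instructions take precedence if they differ. The function name and the file names in the usage comment are hypothetical.

```python
import shutil


def merge_parts(part_paths, merged_path):
    """Concatenate the given files, in order, into merged_path."""
    with open(merged_path, "wb") as merged:
        for part in part_paths:
            with open(part, "rb") as chunk:
                # Copy in blocks so large catalogue files are never fully loaded into memory.
                shutil.copyfileobj(chunk, merged)


# Hypothetical usage; substitute the file names given in the readme:
# merge_parts(["catalogue_part1.csv", "catalogue_part2.csv"], "catalogue.csv")
```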

Feedback welcome. :)