Simplify and test the whole process of adding a manuscript to CU

jeromepl commented 8 years ago

I also need to write out all the steps to achieve this, and simplify the process as much as possible. In addition, the manifests should no longer be stored on the server; we should only store the URLs.

This is for @ahankinson's presentation in early August.

Here is my guess as to how the process would currently go:

Put the manifest file in the static/iiif directory. Its name needs to be the manuscript's siglum, slugified.
Download the data dump from Cantus DB and put it in the data_dumps folder. (This and the next step should be done before Andrew's presentation.)
Import the csv data into Django and Solr through the following management command: ./manage.py import_data chants <path-to-csv>
In the Django admin interface, set the field public of this new manuscript to True.
Use the mapping tool to create a folio mapping. (Once this is done, it takes a while for Solr to refresh all the chants, but this operation runs in the background.)
On the client side, remove the manuscript from the file manuscript-list/futureManuscripts.js if it is there. Those are the manuscripts displayed as in preparation on the /manuscripts page.
(Optional) Rebuild the Solr suggesters with path-to-solr/solr/collection1/suggest?wt=json&suggest.dictionary=feastSuggester&suggest.build=true and repeat the request for genreSuggester and officeSuggester. Warning: This operation takes a while.
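
For the optional last step, here is a minimal sketch of how the three rebuild URLs could be generated instead of typed by hand. The suggester names and query parameters come from the step above; the base address http://localhost:8983 and the helper name suggester_build_urls are only assumptions for illustration, so substitute the real path-to-solr.

```python
from urllib.parse import urlencode

# Suggester names are taken from the step above.
SUGGESTERS = ["feastSuggester", "genreSuggester", "officeSuggester"]


def suggester_build_urls(solr_base_url):
    """Return one suggest.build URL per suggester dictionary."""
    base = solr_base_url.rstrip("/")
    urls = []
    for name in SUGGESTERS:
        query = urlencode({
            "wt": "json",
            "suggest.dictionary": name,
            "suggest.build": "true",
        })
        urls.append(f"{base}/solr/collection1/suggest?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical local Solr address; replace with the real path-to-solr.
    for url in suggester_build_urls("http://localhost:8983"):
        print(url)
        # To actually trigger the (slow) rebuild, uncomment:
        # from urllib.request import urlopen
        # urlopen(url).read()
```

The script only prints the URLs by default, since each rebuild takes a while and should be triggered deliberately.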

Note that there are a few extra steps if adding OMR data as well.

jeromepl commented 8 years ago

@ahankinson Are you going to import the manuscripts to the live server or to a local version on your machine during your presentation?

ahankinson commented 8 years ago

Not really a presentation, just a meeting. On the live server would be fine, preferably the Dev server

jeromepl commented 8 years ago

I just finished importing Paris 12044 and realized that you might want to prepare a folio mapping beforehand, as it takes quite some time if there are a lot of folios and if the URL doesn't contain the folio name (as is the case with Gallica manuscripts). You could show the tool, but use the management command ./manage.py import_folio_mapping <manuscript_id> [mapping_csv_file] to quickly import it.

ahankinson commented 8 years ago

Could you make sure I have a login to the Dev machine to do this? Username should be ahankins

jeromepl commented 8 years ago

I am currently working on a way to make the manuscript importation process entirely available through the Django admin interface. I will update you with all instructions on how to do so once that is done. This means you will only need a Django admin username and password.

jeromepl commented 8 years ago

Alright, here are the new steps:

What will be done before the meeting: (I should be able to do that on Monday)

Download the CSV from http://cantus.uwaterloo.ca/ and put it in the data_dumps folder
Find what the manuscript ID of that manuscript is through the Django admin interface
Import the data in Django and Solr with ./manage.py import_data chants <path_from_datadump_to_csv> <manuscript_id>

What you will need to do during the meeting:

In the Django admin interface, find the manuscript you want to make public, check the box that says 'public', and add the cantus_url and the manifest_url. (Saving will take a little while; make sure to wait for the Django admin page to refresh. You CAN start the next step in the meantime if you want.)
Create a folio mapping at https://dev-cantus.simssa.ca/admin/map_folios (This will refresh the chants in Solr. Once again, it will take a while, but the manuscript will already be visible, with folios and chants appearing as time passes.)
OR connect to the server via ssh and use the management command ./manage.py import_folio_mapping <manuscript_id> <mapping_csv_file>. This will import a pre-created folio mapping.
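
The two command-line steps above could be scripted roughly as follows. This is only a sketch: the argument order follows the commands quoted in the steps, while the helper names, the CSV file names, and the manuscript ID 123 are hypothetical placeholders. It prints the commands rather than running them.

```python
import shlex

MANAGE_PY = "./manage.py"


def import_chants_command(csv_path, manuscript_id):
    """Build the argv for importing a Cantus DB CSV into Django and Solr."""
    return [MANAGE_PY, "import_data", "chants", str(csv_path), str(manuscript_id)]


def import_folio_mapping_command(manuscript_id, mapping_csv_file):
    """Build the argv for importing a pre-created folio mapping."""
    return [MANAGE_PY, "import_folio_mapping", str(manuscript_id), str(mapping_csv_file)]


if __name__ == "__main__":
    # Hypothetical values; look up the real manuscript ID in the Django admin.
    commands = [
        import_chants_command("example-manuscript.csv", 123),
        import_folio_mapping_command(123, "example-mapping.csv"),
    ]
    for argv in commands:
        # Dry run: show the command that would be executed.
        print(shlex.join(argv))
        # To execute for real, from the directory containing manage.py:
        # import subprocess
        # subprocess.run(argv, check=True)
```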

Much easier than previously in my opinion!

Also, the entire process could be automated if I could get static access to the data dumps from Cantus DB. Right now I need to wait for them to generate the CSV file before downloading it. If I had a direct link, a simple admin page could be created to do everything (import the chants, set the manuscript to be public, do the folio mapping).
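
To illustrate what that automation could start from, here is a rough sketch of fetching a CSV from a direct link into the data_dumps folder. No such static link exists yet, so the URL passed in is purely hypothetical, and download_datadump is an illustrative helper, not part of the project.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def download_datadump(csv_url, dest_dir="data_dumps"):
    """Download a CSV from csv_url into dest_dir and return the local path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    filename = Path(urlparse(csv_url).path).name or "datadump.csv"
    target = dest_dir / filename
    # Stream the response to disk instead of loading it all into memory.
    with urlopen(csv_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

The returned path could then be handed to the import_data management command.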

jeromepl commented 8 years ago

I want to add here for reference that there are now 2 new management commands:

Generate a new data dump of all the public manuscripts: ./manage.py generate_public_datadump (the generated CSV file, public-manuscripts.csv, can then be imported using the next command)
Import a data dump of all the public manuscripts into Django and Solr: ./manage.py update_public_manuscripts

Those two commands should make migration easier, since there is no longer any need to manually edit each public manuscript to check the 'public' box and add plugins.

jeromepl commented 8 years ago

Things that have changed since my last 2 comments:

There is no more 'waiting' after saving changes to a manuscript in the Django admin interface or after submitting a folio mapping. Instead, those changes happen in separate threads, which prevents timeout errors on the prod server.
Folio mappings can now be saved on the client side by clicking on the 'Save Backup' button in the folio mapping interface.
These backups can be loaded by dragging and dropping the CSV file into the window, still in the folio mapping interface.

Note that the backup files have the same format as the data dump files created when submitting a mapping. They can thus be imported directly on the server with ./manage.py import_folio_mapping <manuscript_id> <mapping_csv_file>, where mapping_csv_file is located in the folder data_dumps/foliomapping/
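
For anyone curious about the "separate threads" point above, the general pattern looks roughly like this. It is a generic sketch using Python's threading module, not the project's actual code; refresh_chants and save_manuscript are hypothetical stand-ins, and the short sleep represents the slow Solr refresh.

```python
import threading
import time


def refresh_chants(manuscript_id, done):
    """Stand-in for a slow job such as re-indexing a manuscript's chants."""
    time.sleep(0.1)  # placeholder for the real, much slower work
    done.append(manuscript_id)


def save_manuscript(manuscript_id, done):
    """Return immediately and let the slow refresh run in the background."""
    worker = threading.Thread(target=refresh_chants, args=(manuscript_id, done))
    worker.start()
    return worker


if __name__ == "__main__":
    finished = []
    worker = save_manuscript(42, finished)
    print("response sent; refresh still running:", worker.is_alive())
    worker.join()
    print("refreshed manuscripts:", finished)
```

Because the request handler returns before the refresh finishes, the web server never holds the connection open long enough to time out.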

ahankinson commented 8 years ago

So the message "The data dump file has to be in public/cantusdata/static/iiif/" is no longer correct?

jeromepl commented 8 years ago

Yes, this is completely wrong. It should say "The data dump file has to be in public/data_dumps/"

ahankinson commented 8 years ago

Can someone without access to the server do this?

jeromepl commented 8 years ago

Everything can be done with a Django admin username and password, except for the part where the data is downloaded from Cantus DB and imported. If we had static access to the CSV files from Cantus DB, everything could be done without ever ssh-ing to the server.

DDMAL / cantus

Simplify and test the whole process of adding a manuscript to CU #279