DDMAL / CantusDB

A new site for Cantus Database running under Django.
https://cantusdatabase.org
MIT License

We need to establish a workflow for creating chants given a CSV file #1259

Open jacobdgm opened 8 months ago

jacobdgm commented 8 months ago

Debra recently gave us a .csv file with information for a bunch of chants that need to be created on cantusdatabase.org.

We need a system to create chants based on CSV files. Debra said that this used to happen on OldCantus and will continue to be needed from time to time on NewCantus.

We could create a fully automated system with a specification: "your file should include all these columns, and exactly these columns", and so on. It might, however, make sense to have a more flexible system, perhaps a management command that a developer adapts to whichever individual spreadsheet the musicologists send us. If we adopt this second approach, we will need to attend to these .csv files promptly.
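For the first option, the heart of "exactly these columns" is a header check that runs before any row is touched. A minimal sketch, assuming a hypothetical column list (`EXPECTED_COLUMNS` is illustrative, not an agreed spec) and a hypothetical helper `check_header`:

```python
import csv
import io

# Hypothetical column specification: an agreed spec would list the actual
# Chant fields the musicologists are expected to supply, in order.
EXPECTED_COLUMNS = ["folio", "sequence", "incipit", "cantus_id", "mode", "feast"]


def check_header(csv_text: str, expected: list = EXPECTED_COLUMNS) -> list:
    """Return a list of problems with the header row; an empty list means it matches exactly."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    problems = []
    missing = [col for col in expected if col not in header]
    extra = [col for col in header if col not in expected]
    duplicated = sorted({col for col in header if header.count(col) > 1})
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
    if extra:
        problems.append(f"unexpected columns: {', '.join(extra)}")
    if duplicated:
        problems.append(f"duplicated columns: {', '.join(duplicated)}")
    # Only complain about ordering once the set of columns is otherwise correct.
    if not problems and header != expected:
        problems.append("columns are present but in the wrong order")
    return problems
```

Whether column order should matter is a design choice; dropping the final check would accept the right columns in any order.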

Thoughts on how best to approach this?

annamorphism commented 8 months ago

I think the best approach would be the first one, in an interface accessible only to admins. The second approach means a lot more work for developers to figure out what's what, and I would expect such files to be mediated through a Debra anyway.

ahankinson commented 8 months ago

My 2c, FWIW. I would suggest a combination of 1 and 2: a strict CSV format that is uploaded by admins on the command line.

My reasons:

  1. Error handling with data import is really hard. Communicating the errors that arise while processing and uploading spreadsheets takes a lot of forethought and effort. The command line, on the other hand, is quite easy: an exception thrown on the command line doesn't need to be reported anywhere else.
  2. My experience is that users always want to modify their spreadsheets, either intentionally ("Oh, I thought it would import this new column automatically") or unintentionally ("No, I must have deleted that header by accident."). A validation step, followed by an import, is probably the best approach for all involved.
  3. If something goes wrong ("OH, shoot -- I didn't mean to overwrite those!"), it's much easier to see that happen on the command line.
  4. It's easier for devs to test a CSV upload on a staging system and then run it on the production system, than it is to expect users to run it on staging first.
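Points 1 and 2 suggest separating validation from import: check every row first, report all problems at once with line numbers, and create nothing unless the whole file passes. A minimal sketch; the checked columns (`sequence`, `incipit`), the helper names, and the `create_chant` callback (which in CantusDB would presumably wrap something like `Chant.objects.create`) are all hypothetical:

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RowError:
    line: int      # csv module's count of lines read so far (header is line 1)
    message: str


def validate_rows(csv_text: str) -> Tuple[List[Dict[str, str]], List[RowError]]:
    """Validation pass: check every row, collect *all* errors, create nothing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows, errors = [], []
    for row in reader:
        line = reader.line_num
        # Missing trailing fields come back as None, so normalise to "".
        if not (row.get("incipit") or "").strip():
            errors.append(RowError(line, "incipit is empty"))
        sequence = (row.get("sequence") or "").strip()
        if not sequence.isdigit():
            errors.append(RowError(line, f"sequence {sequence!r} is not a whole number"))
        rows.append(row)
    return rows, errors


def import_csv(csv_text: str, create_chant: Callable[[Dict[str, str]], None]) -> int:
    """Import pass: runs only if validation found no errors; returns rows created."""
    rows, errors = validate_rows(csv_text)
    if errors:
        report = "\n".join(f"line {e.line}: {e.message}" for e in errors)
        raise ValueError(f"{len(errors)} problem(s) found; nothing was imported:\n{report}")
    for row in rows:
        create_chant(row)
    return len(rows)
```

Collecting every error before stopping means the person fixing the spreadsheet sees the whole list in one run instead of one problem per attempt.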

You might approach it by developing a sort of import module that is initially called by the management scripts but, once it matures, can move to a UI system.
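One way to read this suggestion: keep the import logic in a plain function that takes an already-open file, and make the command-line entry point a thin wrapper around it, so a future admin UI could call the same function with an uploaded file. In Django the wrapper would be a `BaseCommand` subclass in an app's `management/commands/` directory; plain `argparse` is used here only so the sketch runs without a Django project. All names are hypothetical:

```python
import argparse
import csv
from typing import Callable, Dict, Iterable, List, Optional


# --- Import module: knows nothing about where the CSV came from. ---
def run_import(
    csv_lines: Iterable[str],
    create_chant: Callable[[Dict[str, str]], None],
    dry_run: bool = False,
) -> int:
    """Read chant rows from an open CSV file (or any iterable of lines).

    Each row is passed to create_chant; with dry_run=True rows are only counted.
    """
    count = 0
    for row in csv.DictReader(csv_lines):
        if not dry_run:
            create_chant(row)
        count += 1
    return count


# --- Thin command-line wrapper (a Django BaseCommand in the real project). ---
def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Create chants from a CSV file (sketch).")
    parser.add_argument("csv_path", help="path to the CSV file inside the container")
    parser.add_argument("--dry-run", action="store_true",
                        help="parse the file without creating anything")
    args = parser.parse_args(argv)

    created: List[Dict[str, str]] = []  # stand-in for the database in this sketch
    with open(args.csv_path, newline="", encoding="utf-8") as f:
        n = run_import(f, created.append, dry_run=args.dry_run)
    verb = "would create" if args.dry_run else "created"
    print(f"{verb} {n} chant(s)")
    return 0
```

For example, `main(["chants.csv", "--dry-run"])` parses the file and reports how many chants would be created without creating any. A later UI view would skip `main` entirely and call `run_import` on the uploaded file.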

jacobdgm commented 8 months ago

If we follow @ahankinson's advice, perhaps the best approach is to set up a management command that expects a CSV file in a specific format. A developer copies the CSV into the container, runs the management command on Staging, and makes any necessary changes to the CSV (reordering/renaming columns, etc.) if it reports errors. If/when the command runs to completion, the developer uploads the working CSV to Production and runs it there. Unless something unforeseen arises, this would take maybe 5-10 minutes of developer time per CSV.
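To make the staging run safe to repeat, the import can run inside a single transaction, so a file that fails halfway leaves nothing half-created and a dry run can roll back on purpose. A sketch using the standard-library `sqlite3` module as a stand-in for the project database; in Django the equivalent wrapper is `transaction.atomic()`. The `chant` table, its columns, and `import_chants` are hypothetical:

```python
import sqlite3
from typing import Dict, List


class DryRunRollback(Exception):
    """Raised on purpose to undo everything a dry run inserted."""


def import_chants(conn: sqlite3.Connection, rows: List[Dict[str, str]],
                  dry_run: bool = False) -> int:
    """Insert all rows in one transaction: either every chant is created or none is.

    Using the connection as a context manager commits on success and rolls back
    (then re-raises) on any exception, including a bad int() conversion below.
    """
    try:
        with conn:
            for row in rows:
                if not row.get("incipit"):
                    raise ValueError(f"row {row!r} has no incipit")
                conn.execute(
                    "INSERT INTO chant (sequence, incipit) VALUES (?, ?)",
                    (int(row["sequence"]), row["incipit"]),
                )
            if dry_run:
                raise DryRunRollback
    except DryRunRollback:
        pass  # the inserts above have been rolled back
    return len(rows)  # chants created, or that would have been created
```

Running with `dry_run=True` on Staging exercises every insert without leaving data behind; a real error still propagates to the command line, which is where point 1 above wants it reported.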

Does this make sense?

jacobdgm commented 8 months ago

I've begun work on this, but for the specific CSV at hand, progress is blocked until we figure out what's going on with #1261.

jacobdgm commented 8 months ago

If it's true that Sources (rather than Chants) should have a fragmentarium_id (see https://github.com/DDMAL/CantusDB/issues/1261#issuecomment-1906108929), then work on this can proceed: after creating the source and all the chants, we can add the proper value for the Fragmentarium ID on the source once the field has been created.
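The ordering described here works because a nullable field can be added after the rows exist and then filled in: create the source, create its chants, add the column, then set the value on that one source. A minimal sketch with `sqlite3` standing in for the database (in Django, the column would come from a migration adding a nullable field to `Source`); table names, columns, and helper names are hypothetical:

```python
import sqlite3
from typing import List


def create_source_with_chants(conn: sqlite3.Connection, siglum: str,
                              incipits: List[str]) -> int:
    """Create one source and its chants in one transaction; return the source id."""
    with conn:
        cur = conn.execute("INSERT INTO source (siglum) VALUES (?)", (siglum,))
        source_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO chant (source_id, incipit) VALUES (?, ?)",
            [(source_id, incipit) for incipit in incipits],
        )
    return source_id


def add_fragmentarium_field(conn: sqlite3.Connection) -> None:
    """Later schema change: sources gain a nullable fragmentarium_id column."""
    conn.execute("ALTER TABLE source ADD COLUMN fragmentarium_id TEXT")


def set_fragmentarium_id(conn: sqlite3.Connection, source_id: int,
                         fragmentarium_id: str) -> None:
    """Backfill the value on the already-created source; its chants are untouched."""
    with conn:
        conn.execute(
            "UPDATE source SET fragmentarium_id = ? WHERE id = ?",
            (fragmentarium_id, source_id),
        )
```

Because the chants reference the source by id, nothing about them has to change when the new field is filled in later.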

annamorphism commented 2 months ago

curious if there's been any progress on this, since it came up today in passing...