DDMAL / cantus

Searching with Optical Music Recognition technology and the Cantus Database
http://cantus.simssa.ca
MIT License

Provenance data no longer imports with manuscript data import #808

Closed: dchiller closed this issue 8 months ago

dchiller commented 9 months ago

Currently, manuscript provenance data is parsed from the HTML of the source detail page on CantusDB. However, since the switch to NewCantus, the HTML is no longer tagged the same way, so our parser no longer works. And the provenance element no longer carries any identifying class or id, so parsing the HTML at all seems like a bad approach.
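To illustrate why this breaks, here is a minimal sketch of a class-based HTML lookup of the kind described above. The class name `provenance` and the sample markup are hypothetical stand-ins, not the actual OldCantus markup or the Cantus Ultimus parser. The lookup works only while the element carries the expected class; once that hook disappears, it silently returns `None`.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical class name; the real OldCantus markup may have differed.
PROVENANCE_CLASS = "provenance"
# Void elements never get an end tag, so they must not affect depth tracking.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ProvenanceParser(HTMLParser):
    """Collect the text of the first element whose class list contains PROVENANCE_CLASS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0       # nesting depth inside the matched element
        self.done = False    # stop after the first match
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if PROVENANCE_CLASS in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def parse_provenance(html: str) -> Optional[str]:
    """Return the provenance text, or None if no tagged element is found."""
    parser = ProvenanceParser()
    parser.feed(html)
    text = "".join(parser.chunks).strip()
    return text or None
```

With untagged markup there is nothing to anchor on, which is the core problem described in this issue.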

We could, at least for now, get the provenance information from the json_info field of the json-node API, but (as I understand it) this will only work for sources that were on OldCantus. That is not a problem right now, but it would require another fix at some point in the future.
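A minimal sketch of the interim json_info approach, assuming the source's JSON has already been fetched and decoded into a dict. The key `provenance` inside json_info, and the possibility that json_info arrives as a JSON-encoded string, are assumptions; the real layout may differ. Sources without json_info (e.g. ones first inventoried on NewCantus) simply yield `None`, which is the limitation noted above.

```python
import json
from typing import Any, Mapping, Optional

# Hypothetical key inside json_info; the actual OldCantus layout may differ.
PROVENANCE_KEY = "provenance"


def provenance_from_json_info(node: Mapping[str, Any]) -> Optional[str]:
    """Return the provenance stored in a source's json_info, if present."""
    info = node.get("json_info")
    if info is None:
        # Assumed case for sources that never existed on OldCantus.
        return None
    if isinstance(info, str):
        # Handle json_info delivered as an encoded JSON string (assumption).
        try:
            info = json.loads(info)
        except json.JSONDecodeError:
            return None
    if not isinstance(info, Mapping):
        return None
    value = info.get(PROVENANCE_KEY)
    return str(value) if value else None
```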

dchiller commented 9 months ago

@jacobdgm @lucasmarchd01

The way Cantus Ultimus gets provenance data from CantusDB is once again causing issues, so I'm wondering if we can discuss it before I move forward with a fix here.

This was mentioned in a related CantusDB issue (https://github.com/DDMAL/CantusDB/issues/564#issuecomment-1450698980), along with the idea of adding a new API endpoint for provenance. However, a simpler solution that covers what Cantus Ultimus needs would be to include the provenance in the json_node endpoint's response for a source. A provenance_id is currently returned, and it seems like it would be easy enough to add an element to the JSON response containing the provenance value itself. It's slightly more work now, but it means we wouldn't need to worry about this again if/when we want to include "new" manuscripts (ones first inventoried on NewCantus) on Cantus Ultimus.
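On the Cantus Ultimus side, consuming the proposed change could look roughly like the sketch below. It assumes the json_node response gains a hypothetical `provenance` field next to the existing `provenance_id`, and it falls back to a caller-supplied `lookup` (also hypothetical) for responses that only carry the id. This works for sources regardless of whether they existed on OldCantus.

```python
from typing import Any, Callable, Mapping, Optional


def resolve_provenance(
    source: Mapping[str, Any],
    lookup: Callable[[int], Optional[str]],
) -> Optional[str]:
    """Prefer a provenance value embedded in the json_node response,
    falling back to resolving provenance_id through a separate lookup."""
    inline = source.get("provenance")  # hypothetical field proposed above
    if inline:
        return inline
    provenance_id = source.get("provenance_id")
    if provenance_id is None:
        return None
    # The id may arrive as a string in JSON; normalize before lookup.
    return lookup(int(provenance_id))
```

Keeping the fallback means the importer would keep working during any transition period before the new field is deployed.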

jacobdgm commented 9 months ago

It will be easy to set up APIs that return JSON for all the objects in our DB (including provenances). Until now this has been low-priority, since it hasn't been strictly necessary. But I may as well just do it: it won't take much time, and it seems like it will be useful here.
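If such per-object JSON endpoints were added, a client could fetch any object type through one helper. The sketch below assumes a hypothetical route pattern `/api/<model>/<pk>/json` and a placeholder base URL; the actual CantusDB routes may look different. The HTTP call is injectable so the helper can be exercised without network access.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib.request import urlopen

# Placeholder base URL; substitute the real CantusDB host.
BASE_URL = "https://example.org"


def object_json_url(base_url: str, model: str, pk: int) -> str:
    """Build the URL for one object's JSON (hypothetical route pattern)."""
    return f"{base_url.rstrip('/')}/api/{model}/{pk}/json"


def _default_fetch(url: str) -> bytes:
    """Fetch raw bytes over HTTP with a timeout."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_object(
    model: str,
    pk: int,
    base_url: str = BASE_URL,
    fetch_bytes: Optional[Callable[[str], bytes]] = None,
) -> Dict[str, Any]:
    """Fetch and decode the JSON for a single object, e.g. a provenance."""
    fetch = fetch_bytes or _default_fetch
    return json.loads(fetch(object_json_url(base_url, model, pk)))
```

For example, `fetch_object("provenance", 3)` would return the decoded provenance record, assuming the route exists.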