ExposuresProvider / cam-pipeline

Data loading pipeline for CAM database
https://exposuresprovider.github.io/cam-pipeline/
MIT License

Add provenance for SIGNOR #77

Closed by gaurav 1 year ago

gaurav commented 1 year ago

This PR attempts to implement the temporary fix we described in https://github.com/ExposuresProvider/cam-pipeline/issues/76#issuecomment-1178023752 by (1) loading only the SIGNOR models into Blazegraph, (2) marking all the models loaded at that point (i.e., only the SIGNOR models) as having SIGNOR provenance, and then (3) loading the remaining models.
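
A minimal Python sketch of why the load order matters, assuming a toy in-memory store in place of Blazegraph. The store class, function names, model IRIs, and the SIGNOR source IRI are all hypothetical and not taken from cam-pipeline; the predicate is pav:providedBy, the one this PR eventually settled on. Because the marking step only touches graphs that are present at that moment, the models loaded in step (3) never receive the SIGNOR tag.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]

# PAV "provided by" predicate; the SIGNOR source IRI is a hypothetical placeholder.
PAV_PROVIDED_BY = "http://purl.org/pav/providedBy"
SIGNOR_SOURCE = "https://example.org/hypothetical/SIGNOR"


@dataclass
class ToyStore:
    """Hypothetical stand-in for Blazegraph: maps each named-graph IRI to its triples."""
    graphs: dict[str, set[Triple]] = field(default_factory=dict)

    def load(self, models: dict[str, set[Triple]]) -> None:
        # Copy each model's triples into its own named graph.
        for graph, triples in models.items():
            self.graphs.setdefault(graph, set()).update(triples)

    def mark_all(self, source: str) -> None:
        # Tag every graph present right now; graphs loaded later are untouched.
        for graph, triples in self.graphs.items():
            triples.add((graph, PAV_PROVIDED_BY, source))


def build_database(signor_models: dict[str, set[Triple]],
                   other_models: dict[str, set[Triple]]) -> ToyStore:
    store = ToyStore()
    store.load(signor_models)      # (1) load only the SIGNOR models
    store.mark_all(SIGNOR_SOURCE)  # (2) mark everything loaded so far as SIGNOR
    store.load(other_models)       # (3) load the remaining models, unmarked
    return store


if __name__ == "__main__":
    store = build_database(
        {"https://example.org/model/signor-1": {("ex:a", "ex:regulates", "ex:b")}},
        {"https://example.org/model/other-1": {("ex:c", "ex:enables", "ex:d")}},
    )
    for graph, triples in sorted(store.graphs.items()):
        print(graph, sorted(triples))
```

Swapping steps (2) and (3) in this sketch would tag every model as SIGNOR, which is why the marking has to happen between the two loads.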

I was unable to test this locally, but I'll get the instructions for running this on Sterling from Jim and try running it there.

gaurav commented 1 year ago

When I ran this in Docker on my local machine, I ended up with only the provenance information. This might be a glitch somewhere on my end, but the problem went away when I switched from pav:importedFrom to pav:providedBy, so I suspect that the SIGNOR models were somehow being deleted by sparql/delete-non-production-models.ru. In any case, as I mentioned in https://github.com/ExposuresProvider/cam-pipeline/issues/76#issuecomment-1204435599, pav:providedBy appears to be what the other models use, so it probably makes sense to standardize on that.
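
Against a real SPARQL store, the marking step could be a single SPARQL 1.1 Update that tags every named graph already present. The Python helper below only builds that update text. It is a hypothetical sketch, not the query cam-pipeline actually uses: the source IRI and the choice to attach the triple to the graph IRI inside its own graph are assumptions. Only the PAV namespace (http://purl.org/pav/) and the pav:providedBy property come from the PAV ontology itself.

```python
PAV_NS = "http://purl.org/pav/"

# Characters that may not appear inside a SPARQL IRIREF,
# in addition to U+0000..U+0020 (checked separately below).
_FORBIDDEN = set('<>"{}|^`\\')


def provenance_update(source_iri: str) -> str:
    """Return a SPARQL 1.1 Update that adds `?g pav:providedBy <source_iri>`
    inside every named graph ?g currently in the store (hypothetical sketch)."""
    if not source_iri or any(c in _FORBIDDEN or ord(c) <= 0x20 for c in source_iri):
        raise ValueError(f"not a usable IRI: {source_iri!r}")
    return (
        f"PREFIX pav: <{PAV_NS}>\n"
        f"INSERT {{ GRAPH ?g {{ ?g pav:providedBy <{source_iri}> }} }}\n"
        "WHERE { SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } }\n"
    )


if __name__ == "__main__":
    # Hypothetical source IRI; substitute whatever IRI the pipeline uses for SIGNOR.
    print(provenance_update("https://example.org/hypothetical/SIGNOR"))
```

Because an RDF graph is a set, rerunning this update is harmless; the DISTINCT subquery just avoids producing one binding of ?g per existing triple.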

I've manually triggered this on Sterling as build-cam-database-manual-zrgzp-l4gbk, so in a while we should know whether it worked. Note that the previous manual Sterling job, build-cam-database-manual-jbtqs-hqslj, was terminated with an org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Input/output error, but somehow appears as "Succeeded" rather than "Failed" in Kubernetes :-/. Hopefully that's related to this most recent fix; if not, I'll open another issue for it.

gaurav commented 1 year ago

Note that build-cam-database-manual-zrgzp-l4gbk seems to have worked on Sterling! We'll try to implement #79 in-house and use that to check the development Blazegraph server.