Open gothub opened 4 years ago
@jeanetteclark could you provide use cases and details of how you would use this info? This would help in determining the best implementation.
We ran into a problem on PISCO that is related to this issue. A resource map was created where one of the PIDs had a typo in it, effectively referencing a PID that doesn't exist and never will, so indexing failed. The quick way to fix this right now is to use arcticdatautils::update_resource_map, but it'd be really nice if dataone could do this. The workflow might look like this:
pkg <- getDataPackage(client, "package_id")
pkg <- replaceMember(pkg, "badMember", "goodMember")  # keep the modified copy
uploadDataPackage(client, pkg)
This doesn't look like it would work right now, mostly because getDataPackage relies on the Solr index and packages broken like this won't be indexed.
Yeah, that is another good use case. I often need to look at the relationships that are inside the resource map directly (not what is in the index), and this is one of the reasons why.
Agreed, reliance on the index is problematic. This seems like an easy fix in that calling getDataPackage with the package identifier should be able to grab it directly and parse it locally.
I think in general it would be good for us to download and parse the ORE file even when calling it with a metadata identifier. But is there a mechanism for determining the package identifier given only a metadata id if the ORE parsing has failed (i.e., can getDataPackage(client, "metadata_id") also find the package identifier and then download it for parsing if the ORE was not indexed)?
@mbjones currently you can specify a metadata id or a resource map id to download a package. If you specify the metadata id, getDataPackage() uses the Solr index to determine the resmap id. I'm not sure how to determine that without the index.
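A minimal sketch of the kind of index lookup being described, for readers who have not seen it. The helper name `resmap_lookup_params` is hypothetical and is not part of dataone; it only builds the Solr parameters, and it assumes the identifier contains no double quotes. The `resourceMap` field is the one named later in this thread.

```r
# Hypothetical helper (not part of dataone): build Solr parameters asking
# the index which resource map(s) reference a given metadata identifier.
# Assumption: the identifier itself contains no double quotes.
resmap_lookup_params <- function(metadata_id) {
  list(
    # Quote the identifier so characters such as ':' and '/' in DOIs
    # are treated as part of the term rather than as query syntax.
    q = sprintf('id:"%s"', metadata_id),
    # Only the fields needed to find the resource map are requested.
    fl = "id,resourceMap"
  )
}

params <- resmap_lookup_params("doi:10.xxxx/example")

# With a node object `mn` (assumed to be an MNode), the parameters could
# be passed to the index roughly like this:
# result <- dataone::query(mn, solrQuery = params, as = "data.frame")
```

If the resource map was never indexed, as in the PISCO case above, this lookup returns nothing, which is exactly the gap the thread is discussing.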
Switching to parsing the resmap locally should be straightforward to implement.
Since it's impossible to know which resource map the user is talking about given only a metadata record, arcticdatautils actually produces a warning when its get_package function is called with a metadata id instead of the id of a resource map. It then goes on to guess what the user meant using the same logic MetacatUI uses to do the same task.
"the same logic MetacatUI uses"
And what is that logic? Does it depend on the index?
Yep. IIRC the steps are like this:
1. Look up the metadata record's resourceMap field. If missing, fail.
2. Filter the values in the resourceMap field to those that are not obsoleted by another object.
OK, well, that sounds like the same logic that is currently in the method. So, if we refactor to 1) retrieve the object and see if it is a package ORE, and if so, parse it and use it to load all of the objects, and 2) if not, then fall back to using the index to look up the ORE pid, we should have parity in the methods. I think the key is that we should prioritize populating the package from the ORE file directly over the index. Would that solve this issue?
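The proposed order of operations (ORE first, index only as a fallback) can be sketched as below. This is an illustration under stated assumptions, not the dataone implementation: `resolve_resource_map` is a hypothetical name, the caller is assumed to have already fetched the object's formatId from its system metadata, and `candidates` is assumed to be a data frame of index results with one row per candidate resource map. Taking the first remaining candidate when several are current is also an assumption; the thread does not say how ties are handled.

```r
# formatId that DataONE assigns to ORE resource maps.
ORE_FORMAT_ID <- "http://www.openarchives.org/ore/terms"

# Hypothetical helper: decide which resource map to download and parse.
#   id         identifier the user passed in
#   format_id  formatId from that object's system metadata
#   candidates data.frame from the index with columns `id` and
#              `obsoletedBy` (NA when the resource map is current)
resolve_resource_map <- function(id, format_id, candidates = NULL) {
  # 1) The identifier is itself a resource map: use it directly,
  #    without consulting the index at all.
  if (identical(format_id, ORE_FORMAT_ID)) {
    return(list(resmap_id = id, via = "direct"))
  }

  # 2) Otherwise fall back to what the index knows about this object.
  if (is.null(candidates) || nrow(candidates) == 0) {
    stop("No resource map is indexed for '", id,
         "'; pass the resource map identifier instead.", call. = FALSE)
  }

  # Keep only resource maps that are not obsoleted by another object.
  current <- candidates$id[is.na(candidates$obsoletedBy)]
  if (length(current) == 0) {
    stop("Every indexed resource map for '", id, "' is obsoleted.",
         call. = FALSE)
  }
  if (length(current) > 1) {
    # Assumption: pick the first one and tell the user about it.
    warning("Multiple current resource maps found; using '",
            current[1], "'.", call. = FALSE)
  }
  list(resmap_id = current[1], via = "index")
}

# Example with made-up identifiers:
cands <- data.frame(
  id = c("rm_old", "rm_new"),
  obsoletedBy = c("rm_new", NA),
  stringsAsFactors = FALSE
)
direct <- resolve_resource_map("rm_new", ORE_FORMAT_ID)
guessed <- resolve_resource_map("meta_1", "text/xml", cands)
```

Because the first branch never touches the index, a package whose resource map failed to index can still be loaded as long as the caller supplies the resource map identifier.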
@jeanetteclark please review
It should be easy to review the contents of a package without downloading it. This review should include information about the pids of the package, the package owner (submitter, rightsholder) and package permissions.
This function may just query the Solr store for a package to get this info, or may work in conjunction with datapack, for example, see this issue.
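As a rough illustration of the Solr-only option, the sketch below builds a query for every indexed member of a package and asks for the ownership and permission fields. The helper name `package_summary_params` is hypothetical, and the field list is an assumption about which DataONE index fields would be wanted; this approach inherits the limitation discussed above, since a package whose resource map was not indexed will return no rows.

```r
# Hypothetical helper (not part of dataone): Solr parameters for reviewing
# a package's members, owner, and permissions without downloading it.
# Assumption: the identifier contains no double quotes.
package_summary_params <- function(resmap_id) {
  fields <- c(
    "id", "formatType",               # member pids and their type
    "submitter", "rightsHolder",      # package owner information
    "readPermission", "writePermission", "changePermission"
  )
  list(
    # Every object documented by this resource map.
    q = sprintf('resourceMap:"%s"', resmap_id),
    fl = paste(fields, collapse = ","),
    # Assumed upper bound on package size for this illustration.
    rows = "1000"
  )
}

summary_params <- package_summary_params("resource_map_example")

# With a node object `mn` (assumed to be an MNode):
# members <- dataone::query(mn, solrQuery = summary_params, as = "data.frame")
```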