DataONEorg / rdataone

R package for reading and writing data at DataONE data repositories
http://doi.org/10.5063/F1M61H5X
36 stars 19 forks source link

uploadDataPackage doesn't always set public read on all members of package when public=TRUE flag is set #285

Closed amoeba closed 2 years ago

amoeba commented 2 years ago

A data team member noticed a number of scenarios where setting the public argument of uploadDataPackage to TRUE didn't set public read on all members of the package.

I think the common interpretation of this argument would be that setting public=TRUE would ensure all members (metadata, data, resource map) would get a public read access rule applied to them. Further, the documentation for the argument backs that up:

public: A 'logical', if TRUE then all objects in this package will be accessible by any user

I did a bit of testing and found a specific sequence that reproduces this:

  1. Create a new package or get an existing one via getDataPackage
  2. Ensure all members (metadata, data, resource map) do not have a public read access rule on them
  3. Add a local file to the package w/o public read (new("DataObject"), addMember)
  4. Use uploadDataPackage(..., public = TRUE)
  5. Observe the metadata and all existing data objects do not get a public read access rule. The resource map object and the new data object do get public read rules

uploadDataPackage's logic is basically to iterate over all members and call uploadDataObject on them. Then, uploadDataObject uses a bit of logic to determine if we need a createObject, updateObject, or if we should do nothing:

https://github.com/DataONEorg/rdataone/blob/75b371117f506f5edc6dc864437777350f7a95d1/R/D1Client.R#L1369-L1372

It turns out that, for any package members that were already in the package, this conditional is false and the upload is never sent. I consider this a bug and something that would be good to fix sooner rather than later.

@gothub what do you think?

mbjones commented 2 years ago

Makes total sense. So, if a data file doesn't change but it's system metadata does change, we need to update the system metadata (but not upload the data again). So it seems like our check for whether the sysmeta is changed is where our bug lies.

jeanetteclark commented 2 years ago

An edge case cropped up today that seems to cause the fix above to fall on its face. I haven't been able to look at it yet, but it seems like the uploadDataPackage function somewhere down the line tries to change sysmeta@obsoletes, which cropped up when we had a csv file with more than one version. Interestingly, the update still went through but the sysmeta was not updated with public read as expected. Here is the error that was spat out

Warning message: In .local(x, ...) : Error updating urn:uuid:86d572cd-770c-4665-a030-65a13f879291: The request is trying to reset the obsoletes field in the System Metadata of the Object urn:uuid:86d572cd-770c-4665-a030-65a13f879291. This is illegal since the obsoletes field is already set and cannot be changed once set.

jeanetteclark commented 2 years ago

Closing this, I have the fix implemented and tested, and fixed the other issue that cropped up as well