NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Invalid resource maps can be submitted to Metacat without error #1981

Open robyngit opened 1 week ago

robyngit commented 1 week ago

While investigating our collection of submission errors in MetacatUI, I discovered that it's possible to submit an invalid resource map to Metacat and receive a 200 status without any error. I would expect that resource maps would be validated in the same way that sysmeta and EML objects are.

Here's a reproducible example:

  1. Create an invalid resource map. In my case, I saved a resource_map.xml file whose entire contents were the truncated text: `<?xml version="1.0" encod`

  2. Create sysmeta for the object. I used this sysmeta_template.rdf.xml:

    <d1_v2.0:systemMetadata xmlns:d1_v2.0="http://ns.dataone.org/service/types/v2.0"
    xmlns:d1="http://ns.dataone.org/service/types/v1">
    <serialVersion>0</serialVersion>
    <identifier>RESOURCE MAP ID HERE</identifier>
    <formatId>http://www.openarchives.org/ore/terms</formatId>
    <size>25</size>
    <checksum algorithm="MD5">9614dd15192a58ae2a91a6243e70a992</checksum>
    <submitter>http://orcid.org/0000-0002-1615-3963</submitter>
    <rightsHolder>http://orcid.org/0000-0002-1615-3963</rightsHolder>
    <accessPolicy>
    <allow>
      <subject>public</subject>
      <permission>read</permission>
    </allow>
    <allow>
      <subject>CN=arctic-data-admins,DC=dataone,DC=org</subject>
      <permission>read</permission>
      <permission>write</permission>
      <permission>changePermission</permission>
    </allow>
    </accessPolicy>
    <fileName>resource_map.xml</fileName>
    </d1_v2.0:systemMetadata>

    You'll want to set the submitter to your ORCID

  3. Generate a PID, update the sysmeta template, then upload the resource map + sysmeta to a test node:

    
    # 1. Set your token
    TOKEN="your-token-here"

    # 2. Generate the PID
    PID="resource_map_urn:uuid:$(uuidgen)"

    # 3. Make a copy of the sysmeta with the new PID
    cp sysmeta_template.rdf.xml sysmeta.rdf.xml
    sed -i '' "s/RESOURCE MAP ID HERE/$PID/" sysmeta.rdf.xml

    echo "\nUploading bad resource map with PID: $PID"

    echo "\nResource Map:\n"
    cat resource_map.xml

    echo "\n\nSysmeta:\n"
    cat sysmeta.rdf.xml

    echo "\n\n\n OUTPUT FROM CURL COMMAND: \n"

    /opt/homebrew/opt/curl/bin/curl -i \
        -X POST \
        -H "Accept: */*" \
        -H "Authorization: Bearer $TOKEN" \
        -F "pid=$PID" \
        -F "sysmeta=@sysmeta.rdf.xml;type=application/xml" \
        -F "object=@resource_map.xml;type=application/xml" \
        "https://dev.nceas.ucsb.edu/knb/d1/mn/v2/object"

    echo "\n\n Done"


  4. See that the server returns an `HTTP/1.1 200 200` status along with the PID for the resource map:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

    resource_map_urn:uuid:7286F53A-D29B-4087-9CE8-DEE244EEE5F6


The file then exists on the server but, of course, is not _really_ a resource map, e.g.
- [meta query](https://dev.nceas.ucsb.edu/knb/d1/mn/v2/meta/resource_map_urn:uuid:7286F53A-D29B-4087-9CE8-DEE244EEE5F6)
- [object query](https://dev.nceas.ucsb.edu/knb/d1/mn/v2/object/resource_map_urn:uuid:7286F53A-D29B-4087-9CE8-DEE244EEE5F6)
- [solr query](https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/?q=id:%22resource_map_urn:uuid:7286F53A-D29B-4087-9CE8-DEE244EEE5F6%22)

---

Here's the code above as downloadable files (just remember to remove the `.txt`).
[sysmeta_template.rdf.xml.txt](https://github.com/user-attachments/files/17298578/sysmeta_template.rdf.xml.txt)
[create_res_map.sh.txt](https://github.com/user-attachments/files/17298788/create_res_map.sh.txt)
[resource_map.xml.txt](https://github.com/user-attachments/files/17298580/resource_map.xml.txt)
mbjones commented 1 week ago

Metacat only validates selected data formats based on their formatId. As far as I know, only XML metadata documents are validated, and then only if they have an XML schema registered with Metacat for that document format. We've talked about adding a SHACL validator for RDF resource maps, but haven't done so to date.

As RDF is an open-world model, and any triples you want can be added, it's hard to say what the right schema to enforce would be. I suppose enforcing the bare minimum structure would make sense -- e.g., that there is an ore:ResourceMap with an ore:Aggregation, and that each member of the aggregation has a dcterms:identifier. DataONE lists its resource map requirements here: https://dataoneorg.github.io/api-documentation/design/DataPackage.html#generating-resource-maps

So from those DataONE rules linked above, the items to validate might include:

  1. Document is well-formed RDF
  2. All DataONE objects in the map MUST be expressed as a URI using DataONE's resolving service
  3. The graph MUST contain an ore:ResourceMap and an ore:Aggregation
  4. The resource map MUST assert a triple with the ore:describes/ore:isDescribedBy relationship between the resource map and the aggregation
  5. Each DataONE object in the aggregation MUST be described with a dcterms:identifier field containing the DataONE identifier.
  6. When expressing an identifier in a URI, it must be URL-encoded. When expressing it in the dcterms:identifier field, it must not be. (Any XML encoding would need to be applied as well; in the example below, none is needed.)

Here's what a minimal resource map might contain if the package has one metadata object and one data object and follows these rules:

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix cito: <http://purl.org/spar/cito/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dataone: <https://cn.dataone.org/cn/v2/resolve/> .

<dataone:METADATA_ID>
    dcterms:identifier "METADATA_ID"^^xsd:string ;
    cito:documents <dataone:METADATA_ID>, <dataone:DATAOBJ_ID> ;
    cito:isDocumentedBy <dataone:METADATA_ID> ;
    ore:isAggregatedBy <dataone:RESOURCE_MAP_ID#aggregation> .

<dataone:RESOURCE_MAP_ID>
    dcterms:creator [
        a dcterms:Agent ;
        foaf:name "DataONE R Client"^^xsd:string
    ] ;
    dcterms:identifier "RESOURCE_MAP_ID"^^xsd:string ;
    dcterms:modified "2024-10-08T20:24:47Z"^^xsd:dateTime ;
    ore:describes <dataone:RESOURCE_MAP_ID#aggregation> ;
    a ore:ResourceMap .

<dataone:RESOURCE_MAP_ID#aggregation>
    dc:title "DataONE Aggregation" ;
    ore:aggregates <dataone:METADATA_ID>, <dataone:DATAOBJ_ID> ;
    a ore:Aggregation .

<dataone:DATAOBJ_ID>
    dcterms:identifier "DATAOBJ_ID"^^xsd:string ;
    cito:isDocumentedBy <dataone:METADATA_ID> ;
    ore:isAggregatedBy <dataone:RESOURCE_MAP_ID#aggregation> .
```

```mermaid
flowchart TD
    A(ore:ResourceMap RESOURCE_MAP_ID) -->|ore:describes| B(ore:Aggregation)
    B --> |ore:aggregates| C(METADATA_ID)
    B --> |ore:aggregates| D(DATAOBJ_ID)
    C --> |cito:documents| C
    C --> |cito:documents| D
```
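For contrast, here is a sketch (my own construction, with hypothetical IDs) of a map that parses fine as RDF but would fail the structural rules above: it declares an aggregation but no ore:ResourceMap, asserts no ore:describes triple, and its members carry no dcterms:identifier.

```turtle
@prefix ore:     <http://www.openarchives.org/ore/terms/> .
@prefix dataone: <https://cn.dataone.org/cn/v2/resolve/> .

# Well-formed RDF, so rule 1 passes, but:
#   - no resource typed as ore:ResourceMap (rule 3 fails)
#   - no ore:describes / ore:isDescribedBy triple (rule 4 fails)
#   - aggregated objects have no dcterms:identifier (rule 5 fails)
<dataone:BAD_MAP_ID#aggregation>
    a ore:Aggregation ;
    ore:aggregates <dataone:METADATA_ID>, <dataone:DATAOBJ_ID> .
```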

So, we'd need SHACL rules for those conditions listed above. Would that be sufficient? Also, how would we deal with RMs that are currently in the system but are not valid according to those rules?
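To make that concrete, here's a rough, untested sketch of what shapes for rules 3-5 might look like (the shape names and `ex:` namespace are hypothetical). One caveat: "the graph MUST contain an ore:ResourceMap at all" can't be expressed with class-targeted shapes alone, since they simply don't fire on a graph with no instances of the class; that presence check would need a SPARQL-based constraint or a check outside SHACL.

```turtle
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix ore:     <http://www.openarchives.org/ore/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <https://example.org/shapes/> .  # hypothetical shapes namespace

# Rule 4: each ore:ResourceMap must ore:describe exactly one ore:Aggregation.
# (ore:isDescribedBy is the inverse and could be checked the same way.)
ex:ResourceMapShape
    a sh:NodeShape ;
    sh:targetClass ore:ResourceMap ;
    sh:property [
        sh:path ore:describes ;
        sh:class ore:Aggregation ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .

# Rule 3 (in part): an aggregation must aggregate at least one resource.
ex:AggregationShape
    a sh:NodeShape ;
    sh:targetClass ore:Aggregation ;
    sh:property [
        sh:path ore:aggregates ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
    ] .

# Rule 5: every aggregated object must carry a dcterms:identifier.
# (Rule 2 could be layered on via an sh:pattern constraint requiring the
# DataONE resolve-service URI prefix.)
ex:AggregatedResourceShape
    a sh:NodeShape ;
    sh:targetObjectsOf ore:aggregates ;
    sh:property [
        sh:path dcterms:identifier ;
        sh:minCount 1 ;
    ] .
```

Running these through a SHACL engine against the minimal map above should produce no violations, while the truncated file from the original report would already fail at the parse step (rule 1), before shape validation even starts.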