archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: It is possible for a Dataverse transfer to overwrite its own data #219

Open ross-spencer opened 5 years ago

ross-spencer commented 5 years ago

Expected behaviour

The contents of a Dataverse dataset transfer are not overwritten by other contents in the same transfer.

Current behaviour

Downloading this dataset: https://demodv.scholarsportal.info/dataset.xhtml?persistentId=doi%3A10.5072%2FFK2%2FHYLHBO&version=DRAFT results in the files mutable-study-onecitation-bib.bib and mutable-study-two-ddi.xml being overwritten by other files in the same transfer.

The checksums of the dataset are as follows:

0e9a5438a2df877803915c5dd2e181a7  mutable-study-one-bib.bib
716bf477224581b9aa44426b98783136  mutable-study-onecitation-bib.bib
616044d71d911298c4ba6e17d1580010  mutable-study-one.csv
67c5026a40914da5128e6f473ef11e57  mutable-study-two.csv
49fcb77ba168199227079735ccb54f4d  mutable-study-two-ddi.xml

And the checksums of the objects directory are as follows:

0e9a5438a2df877803915c5dd2e181a7  mutable-study-one-bib.bib
2deec9db366f8521adfbfac132d3a20d  mutable-study-onecitation-bib.bib
50ae9c28f3dc3828e5f62f8e4661bc8e  mutable-study-onecitation-endnote.xml
3594298b49e534cb9ead142ec1cae7e5  mutable-study-onecitation-ris.ris
616044d71d911298c4ba6e17d1580010  mutable-study-one.csv
54191780ab04cfdd42ae238b11369fd9  mutable-study-one-ddi.xml
1aec7e8fac750b7791b24abb1db2a3e4  mutable-study-one.RData
08e6ca0f67e71f62972f550f52be0885  mutable-study-one.tab
2deec9db366f8521adfbfac132d3a20d  mutable-study-twocitation-bib.bib
1b404a874a4d6e2b7ebeedbd39c87102  mutable-study-twocitation-endnote.xml
831961ebe05a2d0529deb4ca40ff2c7c  mutable-study-twocitation-ris.ris
67c5026a40914da5128e6f473ef11e57  mutable-study-two.csv
f7a966f6b445ef23db4473a8d2c9b2e7  mutable-study-two-ddi.xml
c2e335334cd7eac671684b869d6a2b39  mutable-study-two.RData
a05c9071d352592aea1ea72f60086fbd  mutable-study-two.tab

The following hashes from the dataset listing do not appear anywhere in the objects directory:

716bf477224581b9aa44426b98783136  mutable-study-onecitation-bib.bib
49fcb77ba168199227079735ccb54f4d  mutable-study-two-ddi.xml
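The comparison above can be reproduced mechanically. What follows is only a minimal sketch, assuming both listings are available as md5sum-style text; the function names `parse_manifest` and `missing_hashes` are hypothetical and not part of Archivematica.

```python
def parse_manifest(text):
    """Parse md5sum-style lines ("<hash>  <filename>") into (hash, filename) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        digest, filename = line.split(None, 1)
        pairs.append((digest, filename.strip()))
    return pairs


def missing_hashes(source, objects):
    """Return the source entries whose hash is absent from the objects listing."""
    seen = {digest for digest, _ in objects}
    return [(digest, name) for digest, name in source if digest not in seen]


# Checksums of the dataset as deposited (copied from the listing above).
DATASET = """
0e9a5438a2df877803915c5dd2e181a7  mutable-study-one-bib.bib
716bf477224581b9aa44426b98783136  mutable-study-onecitation-bib.bib
616044d71d911298c4ba6e17d1580010  mutable-study-one.csv
67c5026a40914da5128e6f473ef11e57  mutable-study-two.csv
49fcb77ba168199227079735ccb54f4d  mutable-study-two-ddi.xml
"""

# Checksums of the objects directory after transfer (copied from the listing above).
OBJECTS = """
0e9a5438a2df877803915c5dd2e181a7  mutable-study-one-bib.bib
2deec9db366f8521adfbfac132d3a20d  mutable-study-onecitation-bib.bib
50ae9c28f3dc3828e5f62f8e4661bc8e  mutable-study-onecitation-endnote.xml
3594298b49e534cb9ead142ec1cae7e5  mutable-study-onecitation-ris.ris
616044d71d911298c4ba6e17d1580010  mutable-study-one.csv
54191780ab04cfdd42ae238b11369fd9  mutable-study-one-ddi.xml
1aec7e8fac750b7791b24abb1db2a3e4  mutable-study-one.RData
08e6ca0f67e71f62972f550f52be0885  mutable-study-one.tab
2deec9db366f8521adfbfac132d3a20d  mutable-study-twocitation-bib.bib
1b404a874a4d6e2b7ebeedbd39c87102  mutable-study-twocitation-endnote.xml
831961ebe05a2d0529deb4ca40ff2c7c  mutable-study-twocitation-ris.ris
67c5026a40914da5128e6f473ef11e57  mutable-study-two.csv
f7a966f6b445ef23db4473a8d2c9b2e7  mutable-study-two-ddi.xml
c2e335334cd7eac671684b869d6a2b39  mutable-study-two.RData
a05c9071d352592aea1ea72f60086fbd  mutable-study-two.tab
"""

overwritten = missing_hashes(parse_manifest(DATASET), parse_manifest(OBJECTS))
for digest, name in overwritten:
    print(digest, name)
```

Matching on hash rather than filename is deliberate: both affected filenames still exist in the objects directory, just with different content.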

Additional context

This dataset has been crafted specifically as a proof of concept. I cannot yet imagine a real-world situation where this would be encountered.

Steps to reproduce

Run the dataset linked above as a Dataverse transfer in Archivematica (AM) qa/1.x, at commit: https://github.com/artefactual/archivematica/commit/06c0b31f7917918cd878d5cdf4c1fb10aea466ee

Your environment (version of Archivematica, OS version, etc)

Docker-compose. AM qa/1.x



ross-spencer commented 5 years ago

One approach to this is to maintain some structure in the SIP when it is created, e.g.

tree
.
├── 30_CFLQ_271_13-3-13_1524[1].pdf
├── dataset.json
├── relocation2011
│   ├── relocation2011citation-bib.bib
│   ├── relocation2011citation-endnote.xml
│   ├── relocation2011citation-ris.ris
│   ├── relocation2011-ddi.xml
│   └── relocation2011.tab
├── relocation2011.dat
├── relocation2011.sps
├── RELOCATION_FINAL_CANADA_738TOTAL
│   ├── RELOCATION_FINAL_CANADA_738TOTALcitation-bib.bib
│   ├── RELOCATION_FINAL_CANADA_738TOTALcitation-endnote.xml
│   ├── RELOCATION_FINAL_CANADA_738TOTALcitation-ris.ris
│   ├── RELOCATION_FINAL_CANADA_738TOTAL.csv
│   ├── RELOCATION_FINAL_CANADA_738TOTAL-ddi.xml
│   ├── RELOCATION_FINAL_CANADA_738TOTAL.RData
│   └── RELOCATION_FINAL_CANADA_738TOTAL.tab
├── RELOCATION_FINAL_CANADA_738TOTAL.xls
└── relocation user guide.pdf

2 directories, 18 files

Where:

├── relocation2011
├── RELOCATION_FINAL_CANADA_738TOTAL

are both folders created from Bundles.
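To illustrate why keeping one folder per Bundle helps, here is a rough sketch that plans destination paths for the proof-of-concept dataset both ways. It is not Archivematica's code: `plan_paths` and `collisions` are hypothetical helpers, and the bundle names `mutable-study-one` and `mutable-study-two` are assumed by analogy with the `relocation2011` folder above.

```python
import posixpath
from collections import defaultdict


def plan_paths(files, keep_bundle_dirs):
    """Map each destination path to the (bundle, filename) pairs that would land there.

    `files` holds (bundle, filename) pairs; bundle is None for files that
    were uploaded directly rather than generated as part of a Bundle.
    """
    planned = defaultdict(list)
    for bundle, filename in files:
        if bundle and keep_bundle_dirs:
            dest = posixpath.join(bundle, filename)
        else:
            dest = filename  # flat layout: everything shares one directory
        planned[dest].append((bundle, filename))
    return planned


def collisions(planned):
    """Return the destination paths claimed by more than one file."""
    return sorted(dest for dest, sources in planned.items() if len(sources) > 1)


# Assumed composition of the proof-of-concept dataset (subset of files).
FILES = [
    (None, "mutable-study-onecitation-bib.bib"),  # uploaded directly
    (None, "mutable-study-two-ddi.xml"),          # uploaded directly
    ("mutable-study-one", "mutable-study-one.csv"),
    ("mutable-study-one", "mutable-study-onecitation-bib.bib"),  # derivative
    ("mutable-study-two", "mutable-study-two.csv"),
    ("mutable-study-two", "mutable-study-two-ddi.xml"),          # derivative
]

flat = collisions(plan_paths(FILES, keep_bundle_dirs=False))
nested = collisions(plan_paths(FILES, keep_bundle_dirs=True))
print("flat layout collisions:", flat)
print("per-bundle layout collisions:", nested)
```

In the flat layout the two directly uploaded files share a destination with Bundle derivatives of the same name. With per-bundle folders every file gets a distinct path.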

The fileSec generated when that dataset is downloaded alludes to that structure:

  <mets:fileSec>
    <mets:fileGrp USE="metadata">
      <mets:file ID="file-1050e279-4e0c-41a7-b423-b6f1894f574a" GROUPID="Group-8110bd37-18a8-4d4b-a932-c4e8e06d08b6">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL/RELOCATION_FINAL_CANADA_738TOTALcitation-bib.bib" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-23e9f901-ac9a-46ee-b199-dae9dcb8b386" GROUPID="Group-889e1810-d827-47a4-b1e3-84aff1f2b567">
        <mets:FLocat xlink:href="relocation2011/relocation2011-ddi.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-df7930b4-86a1-4cb6-90b1-932de6ce379f" GROUPID="Group-889e1810-d827-47a4-b1e3-84aff1f2b567">
        <mets:FLocat xlink:href="relocation2011/relocation2011citation-ris.ris" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-7a8703d6-5db2-4e14-8fa7-bcbeede5c977" GROUPID="Group-8110bd37-18a8-4d4b-a932-c4e8e06d08b6">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL/RELOCATION_FINAL_CANADA_738TOTALcitation-ris.ris" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-445cddc7-eeea-4949-9147-5a18eb896da6" GROUPID="Group-889e1810-d827-47a4-b1e3-84aff1f2b567">
        <mets:FLocat xlink:href="relocation2011/relocation2011citation-bib.bib" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-2757b97f-d09f-402a-aaef-f616a2e3f21d" GROUPID="Group-889e1810-d827-47a4-b1e3-84aff1f2b567">
        <mets:FLocat xlink:href="relocation2011/relocation2011citation-endnote.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-3509e756-75ec-432b-8111-de078766a5fe" GROUPID="Group-8110bd37-18a8-4d4b-a932-c4e8e06d08b6">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL/RELOCATION_FINAL_CANADA_738TOTALcitation-endnote.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-c033c648-56f3-410f-aa28-132617102511" GROUPID="Group-c033c648-56f3-410f-aa28-132617102511">
        <mets:FLocat xlink:href="metadata/dataset.json" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-267c2a32-b5f0-43cd-854b-e7100f0bce00" GROUPID="Group-8110bd37-18a8-4d4b-a932-c4e8e06d08b6">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL/RELOCATION_FINAL_CANADA_738TOTAL-ddi.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="original">
      <mets:file ID="file-16f24c36-fb75-4ff8-b754-930171f7cc95" GROUPID="Group-16f24c36-fb75-4ff8-b754-930171f7cc95" CHECKSUM="0b62956fe7244a59c4f2358494b71da2" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL.xls" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-674f3073-0c4b-4a7f-b785-85915e17f763" GROUPID="Group-674f3073-0c4b-4a7f-b785-85915e17f763" CHECKSUM="ed3bc550febecd2cbd90e056223c92c3" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="relocation-user-guide.pdf" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-e61faa81-d28b-494d-b056-250639142b63" GROUPID="Group-e61faa81-d28b-494d-b056-250639142b63" CHECKSUM="fbfd3e8d1122106e3c0f8ee09afcd6fc" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="relocation2011.sps" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-ac3da7eb-66b6-40b4-b6a9-00401879ffab" GROUPID="Group-ac3da7eb-66b6-40b4-b6a9-00401879ffab" CHECKSUM="f759178d0481e04c5f8da7cab5392826" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="30_CFLQ_271_13-3-13_1524%5B1%5D.pdf" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-8110bd37-18a8-4d4b-a932-c4e8e06d08b6" GROUPID="Group-8110bd37-18a8-4d4b-a932-c4e8e06d08b6" CHECKSUM="235c918a8eaff0f65a8044e81a5c1ca8" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL/RELOCATION_FINAL_CANADA_738TOTAL.csv" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-cf2542c6-9fe8-4a3c-8df9-652d31801b59" GROUPID="Group-cf2542c6-9fe8-4a3c-8df9-652d31801b59" CHECKSUM="249fe0b2a4f60446543cfa07a33187df" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="relocation2011.dat" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-889e1810-d827-47a4-b1e3-84aff1f2b567" GROUPID="Group-889e1810-d827-47a4-b1e3-84aff1f2b567" CHECKSUM="187999778dfe955ca8856276e7fd64a0" CHECKSUMTYPE="MD5">
        <mets:FLocat xlink:href="relocation2011/relocation2011.tab" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>

At the time of writing, however, that structure does not persist into the transfer METS or the AIP METS.
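As a rough illustration of how the bundle structure could be read back out of such a fileSec, the sketch below groups `FLocat` paths by their parent directory using Python's standard `xml.etree.ElementTree`. The helper name `files_by_directory` is hypothetical, the sample is a trimmed subset of the excerpt above, and the wrapping `mets:mets` root with its namespace declarations is an assumption added so the fragment parses on its own.

```python
import posixpath
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed subset of the fileSec above; the root element is assumed.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="metadata">
      <mets:file ID="file-23e9f901-ac9a-46ee-b199-dae9dcb8b386">
        <mets:FLocat xlink:href="relocation2011/relocation2011-ddi.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-267c2a32-b5f0-43cd-854b-e7100f0bce00">
        <mets:FLocat xlink:href="RELOCATION_FINAL_CANADA_738TOTAL/RELOCATION_FINAL_CANADA_738TOTAL-ddi.xml" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="original">
      <mets:file ID="file-cf2542c6-9fe8-4a3c-8df9-652d31801b59">
        <mets:FLocat xlink:href="relocation2011.dat" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
      <mets:file ID="file-889e1810-d827-47a4-b1e3-84aff1f2b567">
        <mets:FLocat xlink:href="relocation2011/relocation2011.tab" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def files_by_directory(mets_xml):
    """Group every fileSec FLocat href by its parent directory ("" = top level)."""
    root = ET.fromstring(mets_xml)
    grouped = defaultdict(list)
    for flocat in root.iterfind(".//mets:fileSec//mets:FLocat", NS):
        href = flocat.get(XLINK_HREF)
        grouped[posixpath.dirname(href)].append(posixpath.basename(href))
    return dict(grouped)


grouped = files_by_directory(SAMPLE)
for directory, names in grouped.items():
    print(directory or ".", names)
```

Any non-empty key in the result corresponds to a Bundle folder, which is the information that would need to be carried forward for the structure to survive into later METS files.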