CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Dash to Dryad migration #482

Closed dloy closed 3 years ago

dloy commented 3 years ago

Problem

The physical content of the campus Dash collections needs to be migrated to Dryad.

Included in this migration is:

The changes to owner and collection impact the actual data saved for the files.

One Solution

Keeping the data for these files consistent with the new owner and collections requires a new ingest.

To handle these ingests, a manifest.txt submission would be made for each version in the Dash file. Each submission would be made under the Dryad owner, so there would be no collision in the localid table, and the submitted collection would be Dryad. The ingest content would then be consistent with the Merritt inv db. Ingest processing would submit the new object to zookeeper for inventory as usual, and the content would be processed as normal.

To generate the manifest for these ingest submissions, an existing method in store2 would get some minor changes so that it creates a producer-based manifest.txt from existing content, using the existing storage URL as the source for the files. This ingest manifest would be used directly to pull the content from production storage and build the new Dryad content.
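
A minimal sketch of the manifest generation idea, not the store2 method itself. It assumes a hypothetical storage base URL (`STORAGE_BASE`), a hypothetical URL layout of ark/version/file, and a simplified pipe-delimited row layout; the real ingest manifest.txt format and storage endpoints are not given in this issue. Only producer files are listed, since the manifest is producer based.

```python
from urllib.parse import quote

# Hypothetical base URL: the issue does not give the real storage endpoint.
STORAGE_BASE = "https://storage.example.org/content"


def manifest_row(ark, version, entry):
    """Build one pipe-delimited row (assumed layout) for a stored file."""
    url = "/".join([
        STORAGE_BASE,
        quote(ark, safe=""),          # percent-encode ':' and '/' in the ARK
        str(version),
        quote(entry["id"], safe=""),  # percent-encode '/' in the file id
    ])
    # List the file by its path relative to producer/.
    name = entry["id"][len("producer/"):]
    return " | ".join(
        [url, entry["digestType"], entry["digest"], str(entry["size"]), name]
    )


def build_manifest(ark, version, entries):
    """Return manifest text covering only the producer files of one version."""
    rows = [
        manifest_row(ark, version, e)
        for e in entries
        if e["id"].startswith("producer/")
    ]
    return "".join(row + "\n" for row in rows)


entries = [
    {"id": "producer/mrt-oaidc.xml", "digestType": "SHA-256",
     "digest": "003a621cf692ed088745f848088e3af544a80915296a52e16c76644106376fcc",
     "size": 790},
    {"id": "system/mrt-mom.txt", "digestType": "SHA-256",
     "digest": "257d329ba496cdebe4bbf8371b435f94ec28948cb5d234a43998f36458dac2e3",
     "size": 122},
]
print(build_manifest("ark:/99999/fk4b285m05", 1, entries), end="")
```

The system file is skipped because ingest would regenerate system content for the new Dryad object; that reading is an assumption based on the "producer based" wording above.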

An ingest update submission would be made for each Dash version.

Once the content is relatively synced, a Dryad flip process would need to take place:

This process will keep the content consistent with the inv database. It should only require a minimal change for Dryad. Creating a working ingest manifest.txt from already stored content is a major new capability: it gives us a mechanism for changing object content manually, such as renaming file paths while creating a new object. This will provide a solution to the %EF%BF%BD error at Wasabi.
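
For context, %EF%BF%BD is the percent-encoded UTF-8 form of U+FFFD, the Unicode replacement character. A rough sketch of how file paths carrying it could be flagged and renamed while building the new manifest follows. The function names and the underscore substitution are hypothetical; the issue does not describe the actual Wasabi error or the intended rename policy.

```python
from urllib.parse import unquote

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, percent-encoded in UTF-8 as %EF%BF%BD


def needs_rename(file_id):
    """True if the decoded file path contains the replacement character."""
    return REPLACEMENT_CHAR in unquote(file_id)


def propose_rename(file_id, substitute="_"):
    """Return a decoded path with each replacement character substituted."""
    return unquote(file_id).replace(REPLACEMENT_CHAR, substitute)


# Hypothetical file path, for illustration only.
path = "producer/caf%EF%BF%BD.txt"
if needs_rename(path):
    print(path, "->", propose_rename(path))
```

Note that `unquote` also maps undecodable byte sequences to U+FFFD by default, so this check flags those paths as well.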

dloy commented 3 years ago

An additional mrt_manifest.xml could be added as producer/mrt_manifest.xml. This would be used to preserve the provenance. Example:

<?xml version="1.1"?>
<objectInfo xmlns="http://uc3.cdlib.org/ontology/mrt/manifest">
  <object id="ark:/99999/fk4b285m05">
    <current>2</current>
    <fileCount>24</fileCount>
    <totalSize>20177</totalSize>
    <actualCount>17</actualCount>
    <actualSize>16129</actualSize>
    <versionCount>2</versionCount>
    <lastAddVersion>2017-12-05T10:28:33-08:00</lastAddVersion>
  </object>
  <versions>
    <version id="1">
      <manifest count="12" size="10001" created="2017-12-05T10:22:38-08:00">
        <file id="system/mrt-mom.txt">
          <digestType>SHA-256</digestType>
          <digest>257d329ba496cdebe4bbf8371b435f94ec28948cb5d234a43998f36458dac2e3</digest>
          <size>122</size>
          <creationDate>2017-12-05T10:22:35-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-mom.txt</key>
        </file>
        <file id="producer/mrt-oaidc.xml">
          <digestType>SHA-256</digestType>
          <digest>003a621cf692ed088745f848088e3af544a80915296a52e16c76644106376fcc</digest>
          <size>790</size>
          <creationDate>2017-12-05T10:22:22-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-oaidc.xml</key>
        </file>
        <file id="producer/mrt-dataone-manifest.txt">
          <digestType>SHA-256</digestType>
          <digest>508a7457fc88e5731d293c943c4cefbc4e997d34c6c947f525b6214e4c60d787</digest>
          <size>309</size>
          <creationDate>2017-12-05T10:22:20-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-dataone-manifest.txt</key>
        </file>
        <file id="producer/mrt-embargo.txt">
          <digestType>SHA-256</digestType>
          <digest>7c633a2213319bb53b7b14e24606b48513e6232258318d97f4e42eb880ddc169</digest>
          <size>20</size>
          <creationDate>2017-12-05T10:22:23-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-embargo.txt</key>
        </file>
        <file id="system/mrt-dc.xml">
          <digestType>SHA-256</digestType>
          <digest>f40dd72e54b7e93c389895de1c13922fe5ac2ac226d7159a9704ae6f19a67929</digest>
          <size>149</size>
          <creationDate>2017-12-05T10:22:31-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-dc.xml</key>
        </file>
        <file id="system/mrt-ingest.txt">
          <digestType>SHA-256</digestType>
          <digest>cf4191dc3328ab9f8a6f5b8fe3974313d0042fe4edfd9f83ceafd690acde50c2</digest>
          <size>1590</size>
          <creationDate>2017-12-05T10:22:33-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-ingest.txt</key>
        </file>
        <file id="system/mrt-owner.txt">
          <digestType>SHA-256</digestType>
          <digest>f6aa2758b34f0e5752358edbf8e733304a1bd223c99dd5d5acbb506ed65cff3a</digest>
          <size>19</size>
          <creationDate>2017-12-05T10:22:28-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-owner.txt</key>
        </file>
        <file id="producer/mrt-datacite.xml">
          <digestType>SHA-256</digestType>
          <digest>55146cf9bec71dc61bad08f592c6ff41567dc74058a1451c8b4c1da3f5246f65</digest>
          <size>1202</size>
          <creationDate>2017-12-05T10:22:25-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-datacite.xml</key>
        </file>
        <file id="system/mrt-object-map.ttl">
          <digestType>SHA-256</digestType>
          <digest>c4e10e8f3ae02c66561453b9308be0c711f9e210456a25f9bc18a2a074a311f0</digest>
          <size>3409</size>
          <creationDate>2017-12-05T10:22:36-08:00</creationDate>
          <mimeType>plain/turtle</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-object-map.ttl</key>
        </file>
        <file id="producer/stash-wrapper.xml">
          <digestType>SHA-256</digestType>
          <digest>c32a56ab89611475c8ce7a535cce00d958eb7096492e15ac1e15ce27ae18c87f</digest>
          <size>2253</size>
          <creationDate>2017-12-05T10:22:18-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/stash-wrapper.xml</key>
        </file>
        <file id="system/mrt-erc.txt">
          <digestType>SHA-256</digestType>
          <digest>6fd86c6c7f12b29d3d4eb26ff7fdc756cceee113c074f8e877d4293dbe83c0cc</digest>
          <size>118</size>
          <creationDate>2017-12-05T10:22:30-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-erc.txt</key>
        </file>
        <file id="system/mrt-membership.txt">
          <digestType>SHA-256</digestType>
          <digest>59eeb11f506f126194b5059e2674673c87a6f9456217b4be630bd0ab6069f8c4</digest>
          <size>20</size>
          <creationDate>2017-12-05T10:22:27-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-membership.txt</key>
        </file>
      </manifest>
    </version>
    <version id="2">
      <manifest count="12" size="10176" created="2017-12-05T10:28:33-08:00">
        <file id="producer/mrt-oaidc.xml">
          <digestType>SHA-256</digestType>
          <digest>94c5edf213256f706b7b338d33a400a727e7b0401adee8f6204470e2c6fd478d</digest>
          <size>834</size>
          <creationDate>2017-12-05T10:28:26-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|2|producer/mrt-oaidc.xml</key>
        </file>
        <file id="system/mrt-mom.txt">
          <digestType>SHA-256</digestType>
          <digest>257d329ba496cdebe4bbf8371b435f94ec28948cb5d234a43998f36458dac2e3</digest>
          <size>122</size>
          <creationDate>2017-12-05T10:28:33-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-mom.txt</key>
        </file>
        <file id="producer/mrt-dataone-manifest.txt">
          <digestType>SHA-256</digestType>
          <digest>508a7457fc88e5731d293c943c4cefbc4e997d34c6c947f525b6214e4c60d787</digest>
          <size>309</size>
          <creationDate>2017-12-05T10:28:26-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-dataone-manifest.txt</key>
        </file>
        <file id="producer/mrt-embargo.txt">
          <digestType>SHA-256</digestType>
          <digest>7c633a2213319bb53b7b14e24606b48513e6232258318d97f4e42eb880ddc169</digest>
          <size>20</size>
          <creationDate>2017-12-05T10:28:28-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-embargo.txt</key>
        </file>
        <file id="system/mrt-dc.xml">
          <digestType>SHA-256</digestType>
          <digest>f40dd72e54b7e93c389895de1c13922fe5ac2ac226d7159a9704ae6f19a67929</digest>
          <size>149</size>
          <creationDate>2017-12-05T10:28:31-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-dc.xml</key>
        </file>
        <file id="system/mrt-ingest.txt">
          <digestType>SHA-256</digestType>
          <digest>9e1f85a41d2e9c5c616fd1ab26e23515994a6289c4f6e666860feddd85deeccc</digest>
          <size>1588</size>
          <creationDate>2017-12-05T10:28:31-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|2|system/mrt-ingest.txt</key>
        </file>
        <file id="system/mrt-owner.txt">
          <digestType>SHA-256</digestType>
          <digest>f6aa2758b34f0e5752358edbf8e733304a1bd223c99dd5d5acbb506ed65cff3a</digest>
          <size>19</size>
          <creationDate>2017-12-05T10:28:30-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-owner.txt</key>
        </file>
        <file id="producer/mrt-datacite.xml">
          <digestType>SHA-256</digestType>
          <digest>d9fc2714291120d3a6bb2159d14b7b88edc0632a1e987fdb0c969e8c8c84d830</digest>
          <size>1258</size>
          <creationDate>2017-12-05T10:28:28-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|2|producer/mrt-datacite.xml</key>
        </file>
        <file id="system/mrt-object-map.ttl">
          <digestType>SHA-256</digestType>
          <digest>c4e10e8f3ae02c66561453b9308be0c711f9e210456a25f9bc18a2a074a311f0</digest>
          <size>3409</size>
          <creationDate>2017-12-05T10:28:33-08:00</creationDate>
          <mimeType>plain/turtle</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-object-map.ttl</key>
        </file>
        <file id="producer/stash-wrapper.xml">
          <digestType>SHA-256</digestType>
          <digest>d3f579266d3bc49f84e71a16b2a081e0710e69e02bc32229de6e5a2caa35176f</digest>
          <size>2319</size>
          <creationDate>2017-12-05T10:28:25-08:00</creationDate>
          <mimeType>application/xml</mimeType>
          <key>ark:/99999/fk4b285m05|2|producer/stash-wrapper.xml</key>
        </file>
        <file id="system/mrt-erc.txt">
          <digestType>SHA-256</digestType>
          <digest>a313d436225bb53b2f714986723157c18627f51ed65f4122d7569f2244e133ba</digest>
          <size>129</size>
          <creationDate>2017-12-05T10:28:30-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|2|system/mrt-erc.txt</key>
        </file>
        <file id="system/mrt-membership.txt">
          <digestType>SHA-256</digestType>
          <digest>59eeb11f506f126194b5059e2674673c87a6f9456217b4be630bd0ab6069f8c4</digest>
          <size>20</size>
          <creationDate>2017-12-05T10:28:30-08:00</creationDate>
          <mimeType>text/plain</mimeType>
          <key>ark:/99999/fk4b285m05|1|system/mrt-membership.txt</key>
        </file>
      </manifest>
    </version>
  </versions>
</objectInfo>
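
A small sketch of how such a file could be checked after migration. It assumes that fileCount and totalSize cover every file entry across all versions, and that actualCount and actualSize cover only distinct `key` values (content physically stored once and referenced by later versions). That interpretation is inferred from the example, not confirmed here. `SAMPLE` is a trimmed stand-in with the same structure.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://uc3.cdlib.org/ontology/mrt/manifest"}

# Trimmed stand-in for the manifest above (two files per version).
SAMPLE = """\
<objectInfo xmlns="http://uc3.cdlib.org/ontology/mrt/manifest">
  <versions>
    <version id="1">
      <manifest>
        <file id="system/mrt-mom.txt"><size>122</size>
          <key>ark:/99999/fk4b285m05|1|system/mrt-mom.txt</key></file>
        <file id="producer/mrt-oaidc.xml"><size>790</size>
          <key>ark:/99999/fk4b285m05|1|producer/mrt-oaidc.xml</key></file>
      </manifest>
    </version>
    <version id="2">
      <manifest>
        <file id="producer/mrt-oaidc.xml"><size>834</size>
          <key>ark:/99999/fk4b285m05|2|producer/mrt-oaidc.xml</key></file>
        <file id="system/mrt-mom.txt"><size>122</size>
          <key>ark:/99999/fk4b285m05|1|system/mrt-mom.txt</key></file>
      </manifest>
    </version>
  </versions>
</objectInfo>
"""


def summarize(xml_text):
    """Recompute the object-level totals from the per-version file entries."""
    root = ET.fromstring(xml_text)
    file_count = 0
    total_size = 0
    stored = {}  # distinct storage key -> size
    for f in root.iterfind("m:versions/m:version/m:manifest/m:file", NS):
        size = int(f.findtext("m:size", namespaces=NS))
        key = f.findtext("m:key", namespaces=NS).strip()
        file_count += 1
        total_size += size
        stored[key] = size
    return {
        "fileCount": file_count,
        "totalSize": total_size,
        "actualCount": len(stored),
        "actualSize": sum(stored.values()),
    }


print(summarize(SAMPLE))
```

Under the same reading, the header values in the full example (fileCount 24, totalSize 20177, actualCount 17, actualSize 16129) match its file entries: five files in version 2 carry a `|2|` key and the other seven point back to version 1.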

elopatin-uc3 commented 3 years ago

@dloy I've moved this onto the project board, and moved #338 to the futures board. John, Marisa, Catherine and I are on the same page when it comes to the proposed migration strategy.

elopatin-uc3 commented 3 years ago

Just to follow up from a discussion with John, Catherine and Marisa: everyone is on board with this approach, including the addition of the manifest.xml file. The team has settled on the name mrt_provenance.xml for this file, as the name cannot be tailored to a specific migration.

dloy commented 3 years ago

dloy 3:43 PM Eric, the six ucop Dash objects from a UC “Pay it Forward” project back in 2016 are not included in Scott's list. Are they something that should be in Dryad, or should they be migrated?

elopatin 3:46 PM That’s a good question. Let me check in with Daniella.

elopatin 4:03 PM Daniella’s take is that everything except for the two test objects should be migrated. In other words, everything except ark:/13030/m5qz7jtk and ark:/13030/m54z0fxr.
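
If the migration list is driven by a script, the exclusion could be expressed as a simple filter. This is only a sketch: the function name is hypothetical, and the sample input reuses the example ARK from the manifest above rather than a real ucop object.

```python
# The two test objects named above; everything else is migrated.
EXCLUDED_TEST_ARKS = {"ark:/13030/m5qz7jtk", "ark:/13030/m54z0fxr"}


def select_for_migration(arks):
    """Keep every ARK except the two excluded test objects, preserving order."""
    return [ark for ark in arks if ark not in EXCLUDED_TEST_ARKS]


candidates = [
    "ark:/13030/m5qz7jtk",
    "ark:/99999/fk4b285m05",  # illustrative ARK taken from the manifest example
    "ark:/13030/m54z0fxr",
]
print(select_for_migration(candidates))
```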