avalonmediasystem / avalon

Avalon Media System – Samvera Application
http://www.avalonmediasystem.org/
Apache License 2.0

Create process for migrating remaining media objects' section lists #5856

Closed: cjcolvar closed 2 months ago

cjcolvar commented 3 months ago

Description

#5778 implements an in-place lazy migration of media objects' ordered_aggregation modelling to a new property, section_list. The migration is done in memory and persists when an object is saved. This will migrate some of a repository's items, but many won't get edited and thus won't get migrated within any set period of time. At some point we'll want to ensure that the migration has been run on all media objects, which will require a process to force the migration on all remaining un-migrated media objects. The number of media objects to migrate could be significant, and processing them may take a long time (e.g. days) for large repositories. This process should therefore be runnable as a Sidekiq background job or as a rake task run through nohup.
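
A minimal sketch of what such a forced pass could look like, written in plain Ruby with a stand-in object so it runs outside Avalon. `StubMediaObject`, `migrated?`, and `force_migration` are hypothetical names for illustration, not Avalon's API; the only behavior taken from the description above is that the lazy migration happens in memory and persists on save.

```ruby
# Stand-in for a media object; hypothetical, not Avalon's MediaObject class.
StubMediaObject = Struct.new(:id, :ordered_master_file_ids, :section_ids, :saved) do
  # Assumption: an object counts as migrated once section_ids mirrors the old ordering.
  def migrated?
    section_ids == ordered_master_file_ids
  end

  # Mimics the lazy migration: copy the old ordering in memory, then persist.
  def save
    self.section_ids = ordered_master_file_ids.dup
    self.saved = true
  end
end

# Save every un-migrated object, in batches, so progress could be logged
# per batch during a long-running rake task or background job.
def force_migration(objects, batch_size: 100)
  migrated = 0
  objects.each_slice(batch_size) do |batch|
    batch.reject(&:migrated?).each do |mo|
      mo.save
      migrated += 1
    end
  end
  migrated
end

objects = [
  StubMediaObject.new("a1", ["f1", "f2"], [], false),
  StubMediaObject.new("b2", ["f3"], ["f3"], false)
]
puts "migrated #{force_migration(objects)} object(s)"
```

Skipping objects that already look migrated keeps a re-run cheap, which matters if a multi-day job is interrupted and restarted.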

Prior work: #5749, #5778, #5849, #5850

Done Looks Like

elynema commented 3 months ago

@cjcolvar Looks like there is a rake task for this. Any suggestions on how we can effectively test this?

cjcolvar commented 3 months ago

I ran the rake task on avalon-dev and will again on avalon-staging when we deploy there. So maybe we can just mark this ticket as done?

elynema commented 3 months ago

@cjcolvar I would like to test this somewhat intentionally as part of running it on avalon-staging. For transcript migration, this looked like counting transcripts before and after the migration. Can we do something similar here?

cjcolvar commented 3 months ago

Yes, I could do a report of objects (as they appear in fedora) before and after the migration. Maybe something like: <media_object_id>: { ordered_master_file_ids: [], section_ids: [] }. Before migration there should be values in ordered_master_file_ids and none in section_ids. After migration there should be values in both and they should match.

Here's the one-liner to print that out:

MediaObject.all.to_a.each {|mo| puts "#{mo.id}: ordered_master_file_ids: #{mo.try(:ordered_master_file_ids) || []} section_ids: #{mo.try(:section_ids) || []}" }
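
If the output of that one-liner is saved to a file, the before/after check could be automated with something like the sketch below. It only assumes the line format printed above; `parse_report` and `mismatches` are hypothetical helper names, and the sample lines are illustrative.

```ruby
# Matches one line of the report printed by the one-liner above.
LINE = /\A(?<id>\S+): ordered_master_file_ids: (?<ordered>\[.*?\]) section_ids: (?<sections>\[.*\])\z/

# Parse report lines into { media_object_id => { ordered: [...], sections: [...] } }.
# Lines that do not match the expected format are skipped.
def parse_report(lines)
  lines.each_with_object({}) do |line, report|
    m = LINE.match(line.strip) or next
    report[m[:id]] = {
      ordered: m[:ordered].scan(/"([^"]+)"/).flatten,
      sections: m[:sections].scan(/"([^"]+)"/).flatten
    }
  end
end

# Return the media object ids whose two lists differ.
def mismatches(report)
  report.reject { |_id, ids| ids[:ordered] == ids[:sections] }.keys
end

sample = [
  '3b5918592: ordered_master_file_ids: [] section_ids: ["8g84mm259"]',
  'zg64tk959: ordered_master_file_ids: [] section_ids: []'
]
puts mismatches(parse_report(sample)).inspect
```

Run against a post-migration dump, an empty result would mean every object has matching lists; run against a pre-migration dump, the same parse gives a baseline count to compare with.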

joncameron commented 2 months ago

I ran Chris's one-liner: post_migration.txt. It looks like there are just a handful where the ids don't match, for example at the end of the listing:

3b5918592: ordered_master_file_ids: [] section_ids: ["8g84mm259"]
sj139195q: ordered_master_file_ids: [] section_ids: ["jh343s28d", "mk61rg94h"]
zg64tk959: ordered_master_file_ids: [] section_ids: []
47429916m: ordered_master_file_ids: [] section_ids: ["kd17cs85f", "t148fh15c"]
q237hr97c: ordered_master_file_ids: [] section_ids: ["6q182k15b"]

@cjcolvar Is this due to them not having ordered_master_file_ids populated in the first place?

cjcolvar commented 2 months ago

Sorry, I forgot to run this before running the migration so we don't have anything to compare against. :disappointed:

@joncameron I think you're right, those are probably new objects created after the merge of 7.8 RC 1, so they never got anything populated in their ordered_master_files.

elynema commented 2 months ago

The migration script should now do some numerical comparison that we can run on mco-staging.

cjcolvar commented 2 months ago

To be clear, the migration script doesn't do any comparison, but I have added commands to the upgrade instructions that dump out the information before and after migration for comparison: https://samvera.atlassian.net/wiki/spaces/AVALON/pages/2580086785/Upgrading+Avalon+7.7+to+Avalon+7.8#MediaObject-Section-List-Migration-(optional)

joncameron commented 2 months ago

Calling this done then, especially since Chris has added info to the docs for other migrators.

elynema commented 2 months ago

@cjcolvar We're still waiting for the migration to finish on mco-staging to double-check the numbers, correct?