Closed cjcolvar closed 2 days ago
This could be a data issue on avalon-dev rather than a widespread issue, but it is worth spot checking.
I checked avalon-staging, ijccr-staging, archivo-staging, ijccr, and archivo, and the section list migration ran fine on all of them. On avalon-dev, it looks like over half of the items did not get migrated. I think this was either because the migration was never run or because an early version of the migration, which didn't skip validations, was run. I manually ran saves (skipping validations) on all of the MediaObjects that didn't have section_list triples (effectively running the migration again) and everything looks good. I think this points to the migration being fine and avalon-dev having bad data. I could run this check on MCO-staging or MCO, but it would probably take a day and might need some changes to ensure it can handle the scale.
Here is what I ran:
ids = ActiveFedora::SolrService.instance.conn.get("select", params: {rows: 1000, q: "has_model_ssim:MediaObject", fl: ['id']})["response"]["docs"].pluck("id")
conn = ActiveFedora.fedora.connection.http
ids_missing_section_list = ids.select {|id| !conn.get(MediaObject.id_to_uri(id)).body.include? "section_list" }
@elynema @joncameron Do you think I should try running this check on MCO too?
@cjcolvar My preference is yes, but I'm curious about Jon's opinion.
Yes for me too. Seems worth the extra overhead just in case bad data is present in MCO environments.
I ran the following script on mco-staging and MCO. mco-staging didn't report any ids as Not Found or as missing the section list. MCO reported 92 ids as missing the section list and none as Not Found, but upon further investigation all of those items had been deleted since the section_list migration.
This is the code I ran. Note that it hasn't been fixed to deal with the misidentification: deleted objects are reported as having a missing section list instead of being reported as not found/deleted.
#!/usr/bin/env ruby
conn = ActiveFedora.fedora.connection.http
out = File.open("log/section_list_dump.fedora.txt", "a+")
last_id = out.readlines&.last&.split(':')&.first
puts "Resuming from #{last_id}" if last_id.present?
# Read from log/section_list_dump.post_migration.txt
File.readlines("log/section_list_dump.post_migration.txt").each do |line|
  id = line.split(':')&.first
  next unless id.present? && id.length == 9
  next if last_id.present? && id <= last_id
  # Make curl request to fedora and capture section_list triple
  response = conn.get(MediaObject.id_to_uri(id), {}, {"Accept" => "application/ld+json", "Prefer" => "return=minimal"})
  if response.status == 404
    out.puts "#{id}: Not Found"
    next
  end
  section_list = JSON.parse(response.body).dig(0, "http://avalonmediasystem.org/rdf/vocab/media_object#section_list", 0, "@value") rescue nil
  unless section_list.present?
    out.puts "#{id}: Section list missing from fedora"
    next
  end
  out.puts "#{id}: section_ids: #{section_list}"
end
out.close
puts "Completed"
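A possible way to address the misidentification noted above is sketched below; it was not run as part of this check. It pulls the per-response decision into a standalone helper so it can be tried without a Fedora connection. The helper name `section_list_log_line`, the "Deleted" and "Unexpected status" messages, and the assumption that Fedora answers 410 Gone (a tombstone) for deleted objects are all illustrative and should be confirmed against the actual responses for the 92 ids.

```ruby
require "json"

# Predicate URI copied from the script above.
SECTION_LIST_PREDICATE = "http://avalonmediasystem.org/rdf/vocab/media_object#section_list"

# Hypothetical helper: builds the log line for one id from the HTTP status
# and body of the Fedora response.
# Assumption: a deleted object answers 410 Gone (tombstone) rather than 404.
def section_list_log_line(id, status, body)
  return "#{id}: Not Found" if status == 404
  return "#{id}: Deleted" if status == 410
  return "#{id}: Unexpected status #{status}" unless status == 200

  # Only a 200 response is parsed; anything unparseable counts as missing.
  section_list = begin
    JSON.parse(body).dig(0, SECTION_LIST_PREDICATE, 0, "@value")
  rescue JSON::ParserError, TypeError
    nil
  end

  if section_list.to_s.empty?
    "#{id}: Section list missing from fedora"
  else
    "#{id}: section_ids: #{section_list}"
  end
end
```

Inside the loop this would replace the status check and parsing with something like `out.puts section_list_log_line(id, response.status, response.body)`, so that only a 200 response without the triple is logged as a missing section list.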
Description
When looking at the Fedora 6 migrated OCFL data on disk on avalon-dev as part of #5978, I noticed that the section list triple was missing from the item I spot checked. It appears that the item had been indexed with the section list but the section list hadn't been persisted to fedora. However, I spot checked a couple of items on MCO and they both had the section list in fedora. We should quickly investigate why section lists are missing on avalon-dev and whether it is a problem with the section list migration.
Done Looks Like