geobtaa / geoblacklight_admin

MIT License

Rake - solr:reindex #102

Open ewlarson opened 1 week ago

ewlarson commented 1 week ago

Migrate/backport this rake task from GEOMG. Testing against a production pg_dump, though, I'm seeing this error:

rake aborted!9 Document:18E16E87-3A44-40F4-B9C7-4879F48E3C9F: |======================================================================                 | 30.83/s 86602/106455 81%  ETA: 00:10:44
NoMethodError: undefined method `dct_references_uri_key' for an instance of Kithe::Asset (NoMethodError)

ewlarson commented 1 week ago

Adding a Rails-ish reindexing task with a rescue to try and capture whatever is amiss here.
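
For readers following along, here is a rough sketch of what a batch reindex with a per-document rescue could look like. It is not the task from this repo: `reindex_in_batches`, its parameters, and the block that stands in for the Solr update call are all hypothetical, and the documents are plain ID strings rather than models. The point is only that one failing record gets logged and the run continues.

```ruby
# Sketch only: `reindex_in_batches` and the block-based indexer are
# hypothetical stand-ins, not the rake task in this repository.
def reindex_in_batches(documents, batch_size: 1000, out: $stdout)
  total = 0
  failures = []

  documents.each_slice(batch_size) do |batch|
    batch.each do |doc|
      begin
        # In a real app this is where the document would be sent to Solr.
        yield doc
      rescue StandardError => e
        # Record the failure and keep going instead of aborting the run.
        failures << [doc, e]
        out.puts "Error updating index for document: #{doc}"
        out.puts e.message
      end
    end

    total += batch.size
    out.puts "Processed #{batch.size} documents in this batch, total processed: #{total}"
  end

  failures
end
```

Returning the failures makes it easy to inspect the offending records afterwards instead of scrolling back through the log.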

ewlarson commented 1 week ago

Seeing just 1 document error...

Processed 1000 documents in this batch, total processed: 82000
Processed 1000 documents in this batch, total processed: 83000
Processed 1000 documents in this batch, total processed: 84000
Processed 1000 documents in this batch, total processed: 85000
Processed 1000 documents in this batch, total processed: 86000
Error updating index for document: 0745f15d-b3e9-4a3d-aee7-4dfc47ff2a6e
undefined method `dct_references_uri_key' for an instance of Kithe::Asset
Processed 1000 documents in this batch, total processed: 87000
Processed 1000 documents in this batch, total processed: 88000
Processed 1000 documents in this batch, total processed: 89000
Processed 1000 documents in this batch, total processed: 90000
Processed 1000 documents in this batch, total processed: 91000
Processed 1000 documents in this batch, total processed: 92000
Processed 1000 documents in this batch, total processed: 93000
Processed 1000 documents in this batch, total processed: 94000

From rails console...

irb(main):004> d = Document.find_by_friendlier_id("0745f15d-b3e9-4a3d-aee7-4dfc47ff2a6e")
  Document Load (2.2ms)  SELECT "kithe_models".* FROM "kithe_models" WHERE "kithe_models"."type" = $1 AND "kithe_models"."friendlier_id" = $2 LIMIT $3  [["type", "Document"], ["friendlier_id", "0745f15d-b3e9-4a3d-aee7-4dfc47ff2a6e"], ["LIMIT", 1]]
=> 
#<Document:0x000000013afb21c0
...
irb(main):005> d
=> 
#<Document:0x000000013afb21c0
 id: "d06ba0b9-53e6-4c3e-a5ad-518c0d01f558",
 title: "Moral statistics [France] {1833}",
 type: "Document",
 position: nil,
 json_attributes: "[FILTERED]",
 created_at: Thu, 29 Feb 2024 08:44:17.000000000 CST -06:00,
 updated_at: Fri, 01 Mar 2024 17:18:40.720658000 CST -06:00,
 parent_id: nil,
 friendlier_id: "0745f15d-b3e9-4a3d-aee7-4dfc47ff2a6e",
 file_data: nil,
 representative_id: "461ee342-dcf9-432e-b977-0f7dcce15085",
 leaf_representative_id: "461ee342-dcf9-432e-b977-0f7dcce15085",
 kithe_model_type: "work",
 import_id: 112,
 publication_state: "published",
 dct_title_s: "Moral statistics [France] {1833}",
 dct_alternative_sm: ["Guerry"],
 dct_description_sm: ["Moral statistics of France (Guerry, 1833)"],
 dct_language_sm: ["eng"],
 gbl_displayNote_sm: [],
 dct_creator_sm: [],
 dct_publisher_sm: [],
 schema_provider_s: "GeoDa Data and Lab",
 gbl_resourceClass_sm: ["Datasets"],
 gbl_resourceType_sm: [],
 dct_subject_sm: [],
 dcat_theme_sm: [],
 dcat_keyword_sm: [],
 dct_temporal_sm: ["1833"],
 dct_issued_s: "",
 gbl_indexYear_im: [1833],
 gbl_dateRange_drsim: ["1833-1833"],
 dct_spatial_sm: ["France"],
 locn_geometry: "POLYGON((-5.45 51.31, 9.83 51.31, 9.83 41.26, -5.45 41.26, -5.45 51.31))",
 dcat_bbox: "-5.45,41.26,9.83,51.31",
 dcat_centroid: "46.285,2.19",
 gbl_georeferenced_b: nil,
 dct_relation_sm: [],
 pcdm_memberOf_sm: ["b0153110-e455-4ced-9114-9b13250a7093"],
 dct_isPartOf_sm: ["12d-05"],
 dct_source_sm: [],
 dct_isVersionOf_sm: [],
 dct_replaces_sm: [],
 dct_isReplacedBy_sm: [],
 dct_rights_sm: [],
 dct_rightsHolder_sm: [],
 dct_license_sm: [],
 dct_accessRights_s: "Public",
 dct_format_s: "Shapefile",
 gbl_fileSize_s: "",
 b1g_creatorID_sm: [],
 b1g_geonames_sm: [],
 gbl_wxsIdentifier_s: "",
 geomg_id_s: "0745f15d-b3e9-4a3d-aee7-4dfc47ff2a6e",
 dct_identifier_sm: [],
 gbl_suppressed_b: nil,
 date_created_dtsi: Thu, 29 Feb 2024 08:44:17.000000000 CST -06:00,
 date_modified_dtsi: nil,
 b1g_language_sm: [],
 b1g_image_ss: "",
 b1g_code_s: "12d-05",
 b1g_dct_accrualMethod_s: "Manual",
 b1g_dct_accrualPeriodicity_s: "",
 b1g_dateAccessioned_sm: ["2024-02-29"],
 b1g_dateRetired_s: "",
 b1g_status_s: "",
 b1g_publication_state_s: "published",
 b1g_child_record_b: nil,
 b1g_dct_mediator_sm: [],
 b1g_access_s: "",
 dct_references_s:
  [#<Document::Reference:0x000000013dd99b60
    @attributes={"value"=>"https://geo.btaa.org/uploads/asset/461ee342-dcf9-432e-b977-0f7dcce15085/d7fed7dd22c9dbcba0fd8a296c79ae02.html", "category"=>"documentation_download"}>,
   #<Document::Reference:0x000000013dd999a8 @attributes={"value"=>"https://geodacenter.github.io/data-and-lab/data/guerry.zip", "category"=>"download"}>,
irb(main):006> d.save

Document#references > seeded: {"http://lccn.loc.gov/sh85035852"=>["https://geo.btaa.org/uploads/asset/461ee342-dcf9-432e-b977-0f7dcce15085/d7fed7dd22c9dbcba0fd8a296c79ae02.html"], "http://schema.org/downloadUrl"=>["https://geodacenter.github.io/data-and-lab/data/guerry.zip"], "http://schema.org/url"=>["https://geodacenter.github.io/data-and-lab/Guerry/"]}
Document#dct_downloads > init: ["https://geodacenter.github.io/data-and-lab/data/guerry.zip"]

Document#multiple_downloads > aardvark: [{:label=>"Original Shapefile", :url=>"https://geodacenter.github.io/data-and-lab/data/guerry.zip"}]

  TRANSACTION (0.4ms)  BEGIN
  DocumentDownload Load (10.7ms)  SELECT "document_downloads".* FROM "document_downloads" WHERE "document_downloads"."friendlier_id" = $1  [["friendlier_id", "0745f15d-b3e9-4a3d-aee7-4dfc47ff2a6e"]]
Document#dct_downloads > document_downloads: [{:label=>"Original Shapefile", :url=>"https://geodacenter.github.io/data-and-lab/data/guerry.zip"}]

  Kithe::Asset Load (0.7ms)  SELECT "kithe_models"."id", "kithe_models"."title", "kithe_models"."type", "kithe_models"."position", "kithe_models"."json_attributes", "kithe_models"."created_at", "kithe_models"."updated_at", "kithe_models"."parent_id", "kithe_models"."friendlier_id", "kithe_models"."file_data", "kithe_models"."kithe_model_type", "kithe_models"."import_id", "kithe_models"."publication_state" FROM "kithe_models" WHERE "kithe_models"."type" IN ($1, $2) AND "kithe_models"."parent_id" = $3  [["type", "Kithe::Asset"], ["type", "Asset"], ["parent_id", "d06ba0b9-53e6-4c3e-a5ad-518c0d01f558"]]
  Kithe::Model Load (0.8ms)  SELECT "kithe_models".* FROM "kithe_models" WHERE "kithe_models"."id" = $1  [["id", "d06ba0b9-53e6-4c3e-a5ad-518c0d01f558"]]
  TRANSACTION (0.2ms)  ROLLBACK
/Users/ewlarson/.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/activemodel-7.0.8.6/lib/active_model/attribute_methods.rb:450:in `method_missing': undefined method `dct_references_uri_key' for an instance of Kithe::Asset (NoMethodError)
irb(main):007> d.dct_references_s
=> 
[#<Document::Reference:0x000000013dd99b60
  @attributes={"value"=>"https://geo.btaa.org/uploads/asset/461ee342-dcf9-432e-b977-0f7dcce15085/d7fed7dd22c9dbcba0fd8a296c79ae02.html", "category"=>"documentation_download"}>,
 #<Document::Reference:0x000000013dd999a8 @attributes={"value"=>"https://geodacenter.github.io/data-and-lab/data/guerry.zip", "category"=>"download"}>,
 #<Document::Reference:0x000000013dd997f0 @attributes={"value"=>"https://geodacenter.github.io/data-and-lab/Guerry/", "category"=>"documentation_external"}>]
irb(main):008> d.assets
/Users/ewlarson/.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/activemodel-7.0.8.6/lib/active_model/attribute_methods.rb:450:in `method_missing': undefined method `assets' for an instance of Document (NoMethodError)
Did you mean?  asset!
               asset?
irb(main):009> d.document_assets
  Kithe::Asset Load (1.1ms)  SELECT "kithe_models"."id", "kithe_models"."title", "kithe_models"."type", "kithe_models"."position", "kithe_models"."json_attributes", "kithe_models"."created_at", "kithe_models"."updated_at", "kithe_models"."parent_id", "kithe_models"."friendlier_id", "kithe_models"."file_data", "kithe_models"."kithe_model_type", "kithe_models"."import_id", "kithe_models"."publication_state" FROM "kithe_models" WHERE "kithe_models"."type" IN ($1, $2) AND "kithe_models"."parent_id" = $3  [["type", "Kithe::Asset"], ["type", "Asset"], ["parent_id", "d06ba0b9-53e6-4c3e-a5ad-518c0d01f558"]]
  Kithe::Model Load (0.5ms)  SELECT "kithe_models".* FROM "kithe_models" WHERE "kithe_models"."id" = $1  [["id", "d06ba0b9-53e6-4c3e-a5ad-518c0d01f558"]]
=> 
[#<Kithe::Asset:0x000000013c698498
  id: "461ee342-dcf9-432e-b977-0f7dcce15085",
  title: "Guerry_documentation.html",
  type: "Kithe::Asset",
  position: 1,
  json_attributes: nil,
  created_at: Fri, 01 Mar 2024 13:56:27.378069000 CST -06:00,
  updated_at: Fri, 01 Mar 2024 13:56:27.480131000 CST -06:00,
  parent_id: "d06ba0b9-53e6-4c3e-a5ad-518c0d01f558",
  friendlier_id: "hd9mhb9ky",
  file_data:
   {"id"=>"asset/461ee342-dcf9-432e-b977-0f7dcce15085/d7fed7dd22c9dbcba0fd8a296c79ae02.html",
    "storage"=>"store",
    "metadata"=>{"size"=>17577, "width"=>nil, "height"=>nil, "filename"=>"Guerry_documentation.html", "mime_type"=>"text/html"}},
  kithe_model_type: "asset",
  import_id: nil,
  publication_state: "draft">]

ewlarson commented 1 week ago

Okay... so it turns out we had one unexpected model type in the database, perhaps from before our DocumentAssets work was fully baked.

{"count"=>35534, "type"=>"Asset"}
{"count"=>106455, "type"=>"Document"}
{"count"=>1, "type"=>"Kithe::Asset"}
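
The query that produced these counts isn't shown here, but given rows in that shape, a small check like the following could flag any type that isn't expected. This is a sketch: `EXPECTED_TYPES` and `unexpected_types` are illustrative names, and the row data is copied from the output above.

```ruby
# Sketch only: EXPECTED_TYPES and `unexpected_types` are illustrative names,
# not part of the application.
EXPECTED_TYPES = %w[Document Asset].freeze

# Returns the rows whose "type" is not in the expected list.
def unexpected_types(type_counts, expected: EXPECTED_TYPES)
  type_counts.reject { |row| expected.include?(row["type"]) }
end

# Rows copied from the counts above.
counts = [
  {"count" => 35534, "type" => "Asset"},
  {"count" => 106455, "type" => "Document"},
  {"count" => 1, "type" => "Kithe::Asset"}
]

unexpected_types(counts).each do |row|
  puts "Unexpected type #{row["type"]}: #{row["count"]} row(s)"
end
```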

In our database we should only have Documents and Assets. Kithe::Asset is technically the superclass of our Asset model.

Removing the Kithe::Asset from the database resolves this issue.
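
To illustrate why a single row with the superclass type can raise this kind of NoMethodError, here is a plain-Ruby sketch with no Rails involved. `BaseAsset` and `AppAsset` are stand-ins for Kithe::Asset and the app's Asset subclass, `instantiate` only loosely imitates how a stored type string selects a class, and it assumes (not confirmed in this thread) that `dct_references_uri_key` is defined on the subclass only.

```ruby
# Sketch only: BaseAsset / AppAsset are stand-ins for Kithe::Asset and the
# app's Asset subclass. Where dct_references_uri_key really lives is an
# assumption made for illustration.
class BaseAsset; end

class AppAsset < BaseAsset
  # Stand-in for a method that exists only on the subclass.
  def dct_references_uri_key
    "download"
  end
end

# Loosely imitates picking a class from a stored type string.
def instantiate(type_name)
  Object.const_get(type_name).new
end

def responds_to_uri_key?(type_name)
  instantiate(type_name).respond_to?(:dct_references_uri_key)
end

puts responds_to_uri_key?("AppAsset")
puts responds_to_uri_key?("BaseAsset")
```

A record stored with the superclass type comes back as a superclass instance, so any code that assumes the subclass's methods fails on that one record only.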