ebi-ait / hca-ebi-dev-team

Repository for hca ebi dev team agile management. See zenhub board

Issue in dcp1 updates for project 577c946d-6de5-4b55-a854-cd3fde40bff2 #433

Closed aaclan-ebi closed 2 years ago

aaclan-ebi commented 3 years ago

slack: https://embl-ebi-ait.slack.com/archives/C9XD6L0AD/p1625058217091100

A DCP1 project with UUID 577c946d-6de5-4b55-a854-cd3fde40bff2 failed to import because it has no data files.

As agreed with Data Import team

Tasks:

aaclan-ebi commented 3 years ago

Looking at Andrew's original message on slack, it looks like there's another issue we need to resolve.

The exporter is populating provenance.schema_major_version and provenance.schema_minor_version in the metadata JSON files. However, the cell suspension uses an older version of the schema, which doesn't contain those fields.

```json
{
  "describedBy": "https://schema.humancellatlas.org/type/biomaterial/13.1.0/cell_suspension",
  "schema_type": "biomaterial",
  "biomaterial_core": {
    "biomaterial_id": "M_C57BL/6_pancreas_cells_batch2",
    "biomaterial_description": "Mouse islets were isolated (and pooled) from five C57BL/6 and ICR mice by perfusion of the common bile duct with 0.8 mM Collagenase P (Roche), digestion of the pancreata with 0.8 mM Collagenase P (Roche) and purification of the islets by Histopaque gradient (Sigma) centrifugation.",
    "ncbi_taxon_id": [
      10090
    ]
  },
  "genus_species": [
    {
      "text": "Mus musculus",
      "ontology": "NCBITaxon:10090",
      "ontology_label": "Mus musculus"
    }
  ],
  "selected_cell_types": [
    {
      "text": "pancreatic PP cell",
      "ontology": "CL:0002275",
      "ontology_label": "pancreatic PP cell"
    }
  ],
  "estimated_cell_count": 334,
  "provenance": {
    "document_id": "02341f59-b12e-4039-83c6-b563919f7845",
    "submission_date": "2019-07-04T13:38:38.022Z",
    "update_date": "2019-07-04T13:38:44.588Z",
    "schema_major_version": 13,
    "schema_minor_version": 1
  }
}
```
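For reference, the version a document claims can be read straight from its describedBy URL. The snippet below is only a sketch: the helper name `schema_version` is made up for illustration, and it assumes the URL always ends in `/<major>.<minor>.<patch>/<entity>`, as it does in the example above. It is not the exporter's actual code.

```python
import re
from typing import Tuple

# Assumption: describedBy URLs end in /<major>.<minor>.<patch>/<entity>,
# as in the cell_suspension example above.
VERSION_IN_URL = re.compile(r"/(\d+)\.(\d+)\.(\d+)/[^/]+$")


def schema_version(described_by: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) parsed from a describedBy URL (hypothetical helper)."""
    match = VERSION_IN_URL.search(described_by)
    if match is None:
        raise ValueError(f"No schema version found in {described_by!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

For the document above this gives major 13 and minor 1, the same values that appear in the provenance block.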

Options for the solution:

  1. Update the exporter so that it does not populate these provenance fields for metadata JSON files that use an older schema version, and clean up the already exported metadata to remove those fields.

  2. Update the schema version of the metadata JSON files.

cc @amnonkhen @clairerye

aaclan-ebi commented 3 years ago

I believe we need to fix the provenance issue first before asking the Data Import team to reimport this project.

ESapenaVentura commented 3 years ago

I would advocate for option 1. Option 2 seems more on the migration side (which we'll have to prioritise eventually, but it's a big task). If we are not implementing migrations fully, I think it's more sustainable for the exporter to detect when the fields do not exist in the schema.
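A rough sketch of what option 1 could look like, assuming the exporter can obtain the set of provenance property names that the document's schema version defines. The function name `strip_unsupported_provenance` and the `allowed_fields` parameter are hypothetical and not part of the exporter today; how that set is obtained (for example by loading the referenced schema) is left out.

```python
import copy
from typing import Set

# The two fields discussed in this issue.
VERSION_FIELDS = ("schema_major_version", "schema_minor_version")


def strip_unsupported_provenance(document: dict, allowed_fields: Set[str]) -> dict:
    """Return a copy of `document` without version fields the schema does not define.

    `allowed_fields` is assumed to hold the provenance property names defined by
    the schema version the document points to (hypothetical input).
    """
    cleaned = copy.deepcopy(document)  # leave the caller's document untouched
    provenance = cleaned.get("provenance", {})
    for field in VERSION_FIELDS:
        if field not in allowed_fields:
            provenance.pop(field, None)  # no error if the field is already absent
    return cleaned
```

The same function could be run over the already exported files for the clean-up part: passing a set without the two version fields drops them, while a schema that does define them leaves the document unchanged.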

ofanobilbao commented 3 years ago

Moving this task to the Stalled column after discussing with Claire on Friday. I believe there is a ticket in Dev that will address the issue, and this specific task is blocked until that dev ticket is delivered.

ofanobilbao commented 2 years ago

ebi-ait/dcp-ingest-central#376 DCP1 project updates to terra

Wkt8 commented 2 years ago

@aaclan-ebi and @jacobwindsor is this effectively done?

jacobwindsor commented 2 years ago

Tested on staging and works well

image.png

Promoting to prod

Wkt8 commented 2 years ago

Is there an SOP for updating dcp1 projects somewhere? @jacobwindsor

jacobwindsor commented 2 years ago

There is not, but give me a sec and I'll publish my script

jacobwindsor commented 2 years ago

Here you go https://github.com/ebi-ait/hca-ebi-dev-team/pull/467