ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

Generic user_friendly name in generated spreadsheets #977

Open arschat opened 1 year ago

arschat commented 1 year ago

Describe the bug

When I download a generated spreadsheet from DCP, the user_friendly names for all biomaterials and protocols are the generic Biomaterial ID and Protocol ID instead of the specific names, such as DONOR ORGANISM ID and COLLECTION PROTOCOL ID, that appear in the template we use. There can then be multiple columns with the same name in one tab when a biomaterial is an input to an entity or when multiple protocols are applied to one entity.

To Reproduce

  1. Download a spreadsheet from DCP, e.g. GSE67833_neural_stem_cells (reproduced in all ~100 spreadsheets I downloaded).
  2. In each tab, look at the columns that link to a biomaterial or a protocol: the user_friendly name is the generic Biomaterial ID or Protocol ID.
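A minimal sketch of how the header row of one tab could be checked for this symptom. It assumes the header row has already been read into a list of strings; the function name `audit_headers` and the constant `GENERIC_HEADERS` are hypothetical and are not part of any DCP or ingest code.

```python
from collections import Counter

# Generic names reported in this issue (compared case-insensitively).
GENERIC_HEADERS = {"BIOMATERIAL ID", "PROTOCOL ID"}


def audit_headers(headers):
    """Return (generic, duplicated) header names found in one tab's header row."""
    # Normalise so that "Protocol ID" and "PROTOCOL ID" count as the same header.
    normalised = [h.strip().upper() for h in headers if h and h.strip()]
    counts = Counter(normalised)
    generic = sorted(h for h in counts if h in GENERIC_HEADERS)
    duplicated = sorted(h for h, n in counts.items() if n > 1)
    return generic, duplicated


if __name__ == "__main__":
    row = ["Biomaterial ID", "Protocol ID", "Protocol ID", "Biomaterial name"]
    print(audit_headers(row))
```

A tab is affected if either returned list is non-empty.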

Expected behaviour

The generated spreadsheet should use a user_friendly name that specifies the type of entity instead of the generic Biomaterial ID or Protocol ID. For biomaterials: Donor Organism ID, Specimen from Organism ID, Cell suspension ID, Cell line ID, Organoid ID, Imaged specimen ID. For protocols: Aggregate generation protocol, Collection protocol, Differentiation protocol, Dissociation protocol, Enrichment protocol, iPSC induction protocol, Treatment protocol, Imaging preparation protocol, Imaging protocol, Library preparation protocol, Sequencing protocol, Analysis protocol.
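To illustrate the expected naming, here is a rough sketch of a lookup from an entity's concrete type to a type-specific ID header. The snake_case keys, the `FRIENDLY_NAMES` table and the `friendly_id_header` function are assumptions made for illustration only; this is not how the flattener is actually implemented, and only a few of the types listed above are shown.

```python
# Hypothetical lookup: concrete entity type -> label used in the header.
FRIENDLY_NAMES = {
    "donor_organism": "Donor Organism",
    "specimen_from_organism": "Specimen from Organism",
    "cell_suspension": "Cell suspension",
    "collection_protocol": "Collection protocol",
    "ipsc_induction_protocol": "iPSC induction protocol",
}


def friendly_id_header(concrete_type):
    """Build a type-specific ID header instead of a generic one."""
    label = FRIENDLY_NAMES.get(concrete_type)
    if label is None:
        # Fallback for types missing from the table: derive a label from the type name.
        label = concrete_type.replace("_", " ").capitalize()
    return f"{label} ID"


if __name__ == "__main__":
    for entity_type in ("donor_organism", "collection_protocol", "organoid"):
        print(friendly_id_header(entity_type))
```

The explicit table is there because labels such as "iPSC induction protocol" cannot be derived from the type name by simple capitalisation.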

Screenshots / Logs

What is shown in the downloaded spreadsheet from DCP: Private Zenhub Image

What is shown in the template that we use: Private Zenhub Image

Links

No links.

Environment

Browser

ESapenaVentura commented 1 year ago

I looked into this. It is caused by the way the flattener collects the header information from the schemas.

I am preparing a PR to fix it.