ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Duplications due to pluralization of "Selected Cell Type" #679

Open theathorn opened 2 years ago

theathorn commented 2 years ago

The "Selected Cell Type" facet (cell_suspension.selected_cell_types.ontology_label/text) in the Data Browser "Tissue Type" dropdown contains pluralized forms of many values, resulting in duplicate entries:

Acceptance criteria for the task:

ESapenaVentura commented 2 years ago

Needs triaging

willrockout commented 2 years ago

@ESapenaVentura doesn't ingest check the label against the ID? Why does the plural version pass if the looked-up term isn't plural?

ESapenaVentura commented 2 years ago

@willrockout I don't think ingest checks label <--> ontology ID, but I will check with the developers. In any case, the .text is definitely not checked, because we agreed that it is a free-text string where the contributor can enter the metadata as they please.

@theathorn I thought the browser only took the ontology label field when available? (And ignored the .text) Was it not available for these datasets?
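
For illustration only, here is a minimal sketch of what a label <--> ontology ID consistency check could look like. It is not how ingest works (the thread leaves that open). The `CANONICAL_LABELS` table and the `label_matches_id` helper are hypothetical names introduced for this example, and the lookup is a small hard-coded dictionary rather than a real ontology service.

```python
from typing import Dict

# Hypothetical local lookup of ontology ID -> canonical label.
# Illustrative entries only; a real check would query an ontology source.
CANONICAL_LABELS: Dict[str, str] = {
    "CL:0000084": "T cell",
    "CL:0000540": "neuron",
}


def label_matches_id(ontology_id: str, ontology_label: str,
                     lookup: Dict[str, str] = CANONICAL_LABELS) -> bool:
    """Return True only if the label equals the canonical label for the ID.

    Unknown IDs and mismatching labels (for example a pluralized label)
    both return False.
    """
    expected = lookup.get(ontology_id)
    return expected is not None and ontology_label == expected


if __name__ == "__main__":
    print(label_matches_id("CL:0000084", "T cell"))   # canonical label
    print(label_matches_id("CL:0000084", "T cells"))  # pluralized label
```

Under this assumption, a pluralized ontology_label would be rejected at validation time, while the free-text .text field would remain unchecked as agreed.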

theathorn commented 2 years ago

@ESapenaVentura Azul first checks for the cell_type_ontology.ontology_label and, if not present, falls back to the cell_type_ontology.text field. So either the ontology_labels are inconsistent or are not present.
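
The fallback described above can be sketched as follows. This is not Azul's code: `facet_value` and `plural_duplicates` are hypothetical helpers, the records are made-up examples, and stripping a trailing "s" is a deliberately crude heuristic for spotting candidate singular/plural pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def facet_value(cell_type: dict) -> Optional[str]:
    """Prefer ontology_label; fall back to text when the label is missing or empty."""
    return cell_type.get("ontology_label") or cell_type.get("text")


def plural_duplicates(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group values that differ only by case or a single trailing 's'."""
    groups = defaultdict(set)
    for value in values:
        key = value.strip().lower()
        if key.endswith("s"):
            key = key[:-1]  # crude singularization, for illustration only
        groups[key].add(value)
    # Keep only the keys that collected more than one distinct spelling.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    # Made-up records: the second one has no ontology_label, so text is used.
    records = [
        {"ontology_label": "T cell", "text": "T cells"},
        {"text": "T cells"},
        {"ontology_label": "neuron", "text": "neurons"},
    ]
    values = [facet_value(r) for r in records]
    print(plural_duplicates(values))
```

With these example records, the record lacking an ontology_label contributes its free-text plural, which is one way the facet could end up with both forms of the same cell type.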

ofanobilbao commented 1 year ago

This is not really stalled. It has simply not been prioritised yet. Moving to the Backlog.