ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Medication ontology - graph restriction update #1331

Open arschat opened 4 days ago

arschat commented 4 days ago

The Lung Tier 2 "medication in the last month" field should take values from the Drug Ontology (DRON), as stated in its description:

> Please indicate the last known therapy, as drug categories from the Drug Ontology (DRON), administered to the patient within the last month prior to sample collection. If this information is not shareable due to data privacy restrictions, please indicate "not shareable".

For that reason, we updated the `medical_history` schema (v7.0.0) to use DRON and set its graph restriction to the children of `material entity`.
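A graph restriction accepts a term only if it sits below one of the listed classes. The sketch below mirrors the `graph_restriction` layout used by HCA metadata schemas (`ontologies`, `classes`, `relations`, `direct`, `include_self`), but the concrete values (`obo:dron`, `BFO:0000040` for material entity) are our assumption, not copied from the v7.0.0 schema. Instead of calling an ontology service, it walks a precomputed, hypothetical map of direct parents, which is enough to show why a term outside the subgraph is rejected.

```python
from typing import Dict, List, Set

# Assumed shape of the medication graph restriction; check the published
# medical_history schema before relying on these exact values.
MEDICATION_RESTRICTION = {
    "ontologies": ["obo:dron"],
    "classes": ["BFO:0000040"],  # material entity
    "relations": ["rdfs:subClassOf"],
    "direct": False,
    "include_self": False,
}


def satisfies_restriction(term: str, parents: Dict[str, List[str]], restriction: dict) -> bool:
    """Return True if `term` falls under one of the restriction's classes.

    `parents` maps each term ID to its direct subClassOf parents; it stands in
    for what an ontology lookup service would return.
    """
    allowed = set(restriction["classes"])
    if restriction.get("include_self") and term in allowed:
        return True
    direct_parents = parents.get(term, [])
    if restriction.get("direct"):
        # Only immediate children of an allowed class qualify.
        return any(p in allowed for p in direct_parents)
    # Otherwise walk up through all ancestors, guarding against cycles.
    seen: Set[str] = set()
    stack = list(direct_parents)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in allowed:
            return True
        stack.extend(parents.get(current, []))
    return False
```

With `include_self` set to false, the class itself (here material entity) is not an accepted value; only its descendants are.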

However, for projects such as #1316, we had to add medications (e.g. oral contraceptive progestin) that did not match any DRON term in the subgraph we added.

We therefore decided to check the ontology mappings of the medications we have previously wrangled:

  1. Pull medication data from ingest
  2. Wrangle the data to clean the values
  3. Use ZOOMA to get matches across all ontologies in OLS
  4. Decide which ontology is the most comprehensive one to include
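Steps 2 and 3 could look roughly like the sketch below. It assumes free-text entries separate drugs with commas or semicolons (adjust to the separators actually found in ingest), and it builds a ZOOMA `annotate` request whose `filter` value (`required:[none]` to skip curated datasources and go to OLS, optionally `ontologies:[...]` to narrow the search) follows the ZOOMA documentation as we understand it. The endpoint URL and filter syntax should be verified against the current ZOOMA docs.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import List, Optional

# ZOOMA annotate endpoint (assumed; confirm against the ZOOMA documentation).
ZOOMA_ANNOTATE = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def clean_medication(raw: str) -> List[str]:
    """Split one free-text entry into de-duplicated, lower-cased drug names."""
    cleaned: List[str] = []
    for part in re.split(r"[;,]", raw):
        # Collapse internal whitespace, drop trailing dots, normalise case.
        name = " ".join(part.split()).strip(" .").lower()
        if name and name not in cleaned:
            cleaned.append(name)
    return cleaned


def zooma_query_url(value: str, ontologies: Optional[List[str]] = None) -> str:
    """Build a ZOOMA query that searches OLS only, optionally in given ontologies."""
    filters = ["required:[none]"]
    if ontologies:
        filters.append("ontologies:[{}]".format(",".join(ontologies)))
    params = urllib.parse.urlencode({"propertyValue": value, "filter": ",".join(filters)})
    return f"{ZOOMA_ANNOTATE}?{params}"


def zooma_annotate(value: str, ontologies: Optional[List[str]] = None) -> list:
    """Call ZOOMA (network access required) and return its JSON annotations."""
    with urllib.request.urlopen(zooma_query_url(value, ontologies)) as resp:
        return json.load(resp)
```

Calling `zooma_annotate(name)` with no ontology list searches everything ZOOMA can reach through OLS; passing e.g. `["dron", "chebi", "ncit"]` restricts it to the candidates being compared. Each returned annotation should carry the matched term IRIs (`semanticTags`) and a `confidence` level, which is what a spreadsheet like the one below would be built from.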

Note: since Lung is asking for DRON, I believe we should keep it in the graph restriction regardless. However, we could also add another ontology to the graph restriction and recommend using one of the two.

```python
from hca_ingest.api.ingestapi import IngestApi

INGEST_URL = "https://api.ingest.archive.data.humancellatlas.org"
token = "YOUR_INGEST_TOKEN"  # placeholder: supply your own ingest API token

# Match every biomaterial that has any value in the medication field
query = [{
    "field": "content.medical_history.medication",
    "operator": "REGEX",
    "value": ".*"
}]

api = IngestApi(url=f"{INGEST_URL}/")
api.set_token(f"Bearer {token}")

# size is set large enough to return all matching donors in one page
response = api.post(f"{INGEST_URL}/biomaterials/query?operator=AND&size=535", json=query)

med = []
for donor in response.json()['_embedded']['biomaterials']:
    med.append(donor['content']['medical_history']['medication'])

# Unique medication values
print(set(med))
```
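`set(med)` only works if every medication value is a string. Once donors are wrangled against the v7.0.0 schema, the field may instead hold an ontology object (we assume the usual HCA `text` / `ontology` / `ontology_label` keys) or a list of them, and `set()` would then raise `TypeError`. A more defensive collection step, sketched under that assumption, flattens each value to strings and counts how often each one occurs:

```python
from collections import Counter
from typing import Any, Iterable, List


def medication_strings(value: Any) -> List[str]:
    """Flatten one medication value (str, ontology dict, or list) to strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        # Prefer the submitter's original text, fall back to the ontology label.
        text = value.get("text") or value.get("ontology_label")
        return [text] if text else []
    if isinstance(value, list):
        out: List[str] = []
        for item in value:
            out.extend(medication_strings(item))
        return out
    return [str(value)]


def count_medications(biomaterials: Iterable[dict]) -> Counter:
    """Count medication strings across biomaterial documents returned by ingest."""
    counts: Counter = Counter()
    for doc in biomaterials:
        history = doc.get("content", {}).get("medical_history", {})
        counts.update(medication_strings(history.get("medication")))
    return counts
```

It can be fed the same page as above, e.g. `count_medications(response.json()['_embedded']['biomaterials'])`; the counts also show which drugs matter most when comparing ontologies.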
arschat commented 4 days ago

@idazucchi worked on this and generated this spreadsheet: medication_ontology.xlsx

The most important tab is "new mapping". I've checked three possible ontologies: DRON, CHEBI and NCIT. I've highlighted in orange the drugs where I'm not sure of the match, and in yellow the matches that are good but don't fall in the Pharmacologic Substance branch of NCIT. Terms in purple could also be described under treatment.

  1. DRON - all of the best matches are terms imported from CHEBI, so I dropped it after the first 30 drugs
  2. CHEBI - the matches are good, but it is structure-based, so it lacks broader terms such as Antiretroviral Therapy
  3. NCIT - seems like the best match so far; the only exceptions are some terms that are not in the Pharmacologic Substance branch. There are also a couple that could fall under treatment, such as high-dose intravenous immunoglobulin

Before we switch ontologies, we should make sure that we can at least map all the terms in the sheet. It's not the full list of drugs present in ingest, but it is at least a varied selection.
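The "map every term before switching" check can be made explicit with a small helper. This is a hypothetical sketch: it assumes the sheet has been turned into a dict from drug name to the term chosen in each ontology (`None` where there is no good match), and reports per-ontology coverage plus the drugs still left unmapped. The structure and the ontology keys are illustrative, not the spreadsheet's actual columns.

```python
from typing import Dict, List, Optional

# Hypothetical layout: drug name -> {ontology -> matched term ID, or None}
Mapping = Dict[str, Dict[str, Optional[str]]]


def coverage(mapping: Mapping, ontology: str) -> float:
    """Fraction of drugs that have a match in `ontology`."""
    if not mapping:
        return 0.0
    hits = sum(1 for matches in mapping.values() if matches.get(ontology))
    return hits / len(mapping)


def unmapped(mapping: Mapping, ontology: str) -> List[str]:
    """Drugs still lacking a match in `ontology`, sorted for review."""
    return sorted(drug for drug, matches in mapping.items() if not matches.get(ontology))
```

A switch would be safe for a given ontology only when `unmapped(mapping, ontology)` comes back empty; `coverage` gives a quick side-by-side comparison of DRON, CHEBI and NCIT in the meantime.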