BIH-CEI / ERKER2Phenopackets

A pipeline of ERKER data into the phenopackets data structure.
MIT License

Review new phenopacket structure #136

Closed frehburg closed 1 year ago

frehburg commented 1 year ago

Dear @ielis,

Could you please check the new structure of our phenopackets? We now put the disease as an OntologyClass into Diagnosis instead of using Phenopacket>Disease.

You can find the code here: ERKER2Phenopackets/src/MC4R/MapMC4R.py

If there is something else we should change, please leave us a comment here.

Cheers,

Filip and Adam

frehburg commented 1 year ago

P.S.: We validated the phenopackets using your phenopacket-tools. You can do the same in our project by calling validate on the command line. We are aware of the following errors and will address them.

2023-09-22 15:01:31 | ERROR    | ERKER2Phenopackets.src.utils.PhenopacketValidation:_validate_phenopacket:92 - BaseValidator required 'phenotypicFeatures[0].type.label' is missing but it is required
2023-09-22 15:01:31 | ERROR    | ERKER2Phenopackets.src.utils.PhenopacketValidation:_validate_phenopacket:92 - BaseValidator required 'interpretations[0].id' is missing but it is required
2023-09-22 15:01:31 | ERROR    | ERKER2Phenopackets.src.utils.PhenopacketValidation:_validate_phenopacket:92 - BaseValidator required 'interpretations[0].diagnosis.disease' is missing but it is required
2023-09-22 15:01:31 | ERROR    | ERKER2Phenopackets.src.utils.PhenopacketValidation:_validate_phenopacket:92 - MetaDataValidator Ontology Not In MetaData No ontology corresponding to ID 'NCBITaxon:9606' found in MetaData
2023-09-22 15:01:31 | ERROR    | ERKER2Phenopackets.src.utils.PhenopacketValidation:_validate_phenopacket:92 - MetaDataValidator Ontology Not In MetaData No ontology corresponding to ID 'GENO:0000135' found in MetaData
2023-09-22 15:01:31 | ERROR    | ERKER2Phenopackets.src.utils.PhenopacketValidation:_validate_phenopacket:92 - MetaDataValidator Ontology Not In MetaData No ontology corresponding to ID 'ORPHA:71529' found in MetaData

ielis commented 1 year ago

Hi @frehburg, the phenopackets in ERKER2Phenopackets/data/out/phenopackets/2023-09-22-1343 indeed have the errors that you mention above.

Missing resource for an ontology

Regarding the missing resources for certain ontologies (the last 3 errors): it should be relatively straightforward to fix these by adding the corresponding entries to the resources list.
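As a rough sketch of what that could look like, the snippet below writes one resource entry per missing prefix as a plain dictionary in the JSON layout of the Phenopacket Schema `Resource` element. The helper `missing_prefixes` is hypothetical and only reports which CURIE prefixes still lack a resource. All `version` values and the ORDO `url` are placeholders, and the names and IRI prefixes are assumptions to check against each ontology's documentation.

```python
# Resource entries for the three prefixes reported by the MetaDataValidator.
# `version` values (and the ORDO `url`) are placeholders, not real releases.
RESOURCES = [
    {
        "id": "ncbitaxon",
        "name": "NCBI organismal classification",
        "url": "http://purl.obolibrary.org/obo/ncbitaxon.owl",
        "version": "PLACEHOLDER",
        "namespacePrefix": "NCBITaxon",
        "iriPrefix": "http://purl.obolibrary.org/obo/NCBITaxon_",
    },
    {
        "id": "geno",
        "name": "Genotype Ontology",
        "url": "http://purl.obolibrary.org/obo/geno.owl",
        "version": "PLACEHOLDER",
        "namespacePrefix": "GENO",
        "iriPrefix": "http://purl.obolibrary.org/obo/GENO_",
    },
    {
        "id": "ordo",
        "name": "Orphanet Rare Disease Ontology",
        "url": "PLACEHOLDER",
        "version": "PLACEHOLDER",
        "namespacePrefix": "ORPHA",
        "iriPrefix": "http://www.orpha.net/ORDO/Orphanet_",
    },
]


def missing_prefixes(term_ids, resources):
    """Return the sorted CURIE prefixes that have no matching resource entry."""
    known = {resource["namespacePrefix"] for resource in resources}
    used = {term_id.split(":", 1)[0] for term_id in term_ids}
    return sorted(used - known)


used_ids = ["NCBITaxon:9606", "GENO:0000135", "ORPHA:71529"]
print(missing_prefixes(used_ids, RESOURCES))      # all three prefixes are covered
print(missing_prefixes(used_ids, RESOURCES[:1]))  # GENO and ORPHA are still missing
```

Running a check like this before validation points at the same gaps the MetaDataValidator reports, without a full validation round trip.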

Missing phenotypic feature label

Regarding the missing phenotypic feature label: it is unclear to me where you're getting the term IDs from. From what I can follow in the code, the label comes from row[label_col] and can be None. However, as you know, the label is a required field of OntologyClass, so you have to provide it. One way to get the label is to query JAX's ontology API at ontology.jax.org. For instance, the /api/hp/terms/{id} endpoint returns JSON like the following:

curl -X 'GET' \
  'https://ontology.jax.org/api/hp/terms/HP%3A0001250' \
  -H 'accept: application/json'

returns

{
  "id": "HP:0001250",
  "name": "Seizure",
  "definition": "A seizure is an intermittent abnormality of nervous system physiology characterised by a transient occurrence of signs and/or symptoms due to abnormal excessive or synchronous neuronal activity in the brain.",
  "comment": "A type of electrographic seizure has been proposed in neonates which does not have a clinical correlate, it is electrographic only. The term epilepsy is not used to describe recurrent febrile seizures. Epilepsy presumably reflects an abnormally reduced seizure threshold.",
  "synonyms": [
    "Epileptic seizure",
    "Seizures",
    "Epilepsy"
  ],
  ... even more content in here ...
}

where name is the term's label.
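The same lookup can be sketched in Python with only the standard library. The function names `term_url`, `label_from_payload`, and `fetch_label` are hypothetical, and the sketch assumes the endpoint and response layout shown above. Only `fetch_label` needs network access, so it is defined but not called here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://ontology.jax.org/api/hp/terms/"


def term_url(term_id: str) -> str:
    """Build the request URL, percent-encoding the colon in the term ID."""
    return BASE_URL + quote(term_id, safe="")


def label_from_payload(payload: dict):
    """Return the term label, reported under `name`, or None if it is absent."""
    return payload.get("name")


def fetch_label(term_id: str, timeout: float = 10.0):
    """Query the API and return the label. Requires network access."""
    request = Request(term_url(term_id), headers={"accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return label_from_payload(json.load(response))


# Offline check of the two pure helpers, using a fragment of the response above.
sample = json.loads('{"id": "HP:0001250", "name": "Seizure"}')
print(term_url("HP:0001250"))
print(label_from_payload(sample))
```

Calling `fetch_label('HP:0001250')` would then perform the actual request. Caching the results per term ID keeps the number of requests low when many rows share the same terms.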

Alternatively, you can use hpo-toolkit to get the label without network access for each lookup, using the get_term method:

import hpotk

# Choose an HPO version and stick to it throughout the analysis
hpo_url = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-09-01/hp.json'
hpo = hpotk.load_minimal_ontology(hpo_url)
print(f'Loaded HPO v{hpo.version}')

term = hpo.get_term('HP:0001250')
if term is not None:
    # There is a term for `HP:0001250`, so we can access the term's `name` property.
    print(term.name)  # prints 'Seizure'

Regarding the other issues, I think you should be able to put some meaningful values there. However, please let me know if you run into any trouble.
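For the two interpretation errors, here is a minimal sketch of the fields in question, written as a plain dictionary in the Phenopacket Schema JSON layout. The helper `make_interpretation`, the ID `interpretation-1`, and the `SOLVED` progress status are illustrative assumptions, and the disease label should be verified against Orphanet before use.

```python
import json


def make_interpretation(interpretation_id: str, disease_id: str, disease_label: str) -> dict:
    """Build a minimal interpretation with the fields the validator reported as missing."""
    return {
        "id": interpretation_id,       # addresses `interpretations[0].id`
        "progressStatus": "SOLVED",    # assumption: pick the status that fits the case
        "diagnosis": {
            # addresses `interpretations[0].diagnosis.disease` (an OntologyClass)
            "disease": {"id": disease_id, "label": disease_label},
        },
    }


interpretation = make_interpretation(
    "interpretation-1",  # hypothetical ID, it only needs to be a meaningful identifier
    "ORPHA:71529",
    "Obesity due to melanocortin 4 receptor deficiency",  # label to verify against Orphanet
)
print(json.dumps(interpretation, indent=2))
```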

frehburg commented 1 year ago

Dear @ielis, thank you for your detailed reply. I have been at a conference all week, which is why things have been moving slowly. I will look into your points on Monday. They sound fixable.

Thank you!

Filip