frostyfan109 / tranql

A Translator Query Language
https://researchsoftwareinstitute.github.io/data-translator/apps/tranql
MIT License

Workflow 5 v3 problem #92

Open frostyfan109 opened 5 years ago

frostyfan109 commented 5 years ago

When testing the following query:

SELECT population_of_individual_organisms->chemical_substance->gene->biological_process_or_activity<-phenotypic_feature
  FROM "/schema"
 WHERE icees.table = 'patient'
   AND icees.year = 2010
   AND icees.cohort_features.AgeStudyStart = '0-2'
   AND icees.feature.EstResidentialDensity < 1
   AND icees.maximum_p_value = 1
   AND chemical_substance !=~ '^(SCTID.*|rxcui.*|CAS.*|SMILES.*|umlscui.*)$'

The last condition, which is meant to exclude chemical substances from certain ontologies (SCTID, rxcui, CAS, SMILES, umlscui), does not seem to be working. I found a node named Beclomethasone with type ["chemical_substance", "drug"] whose id is SMILES:C[C@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(Cl)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO, which the pattern should have excluded.
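For reference, the `!=~` condition should drop any node whose CURIE matches the pattern. A minimal Python sketch of that intended behaviour, using `re.match` (the pattern is already anchored with `^...$`), confirms that the SMILES id above, as well as an SCTID CURIE, ought to be excluded. The helper `is_excluded` is hypothetical and not part of tranql.

```python
import re

# Exclusion pattern copied verbatim from the query's `!=~` condition.
EXCLUDE = re.compile(r"^(SCTID.*|rxcui.*|CAS.*|SMILES.*|umlscui.*)$")


def is_excluded(curie: str) -> bool:
    """Hypothetical helper: True if the CURIE matches the exclusion pattern."""
    return EXCLUDE.match(curie) is not None


beclomethasone = (
    "SMILES:C[C@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)"
    "[C@@]3(Cl)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO"
)
print(is_excluded(beclomethasone))     # True: should have been dropped
print(is_excluded("SCTID:278694008"))  # True
print(is_excluded("MONDO:0004766"))    # False: kept
```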

frostyfan109 commented 5 years ago

Make sure to check out this line

values = self.jsonkit.select (f"$.knowledge_map.[*].[*].node_bindings.{name}", response)

in tranql_ast.py. That should be where the regex condition filters out nodes that don't satisfy it.
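To make the intent of that line concrete without depending on jsonkit, here is a plain-Python sketch of the same idea: collect every value bound to a concept name across the answers in `knowledge_map`, then drop the ones matching a `!=~` pattern. The response shape (a list of answers, each with a `node_bindings` dict whose values are a CURIE or a list of CURIEs) and both helper names are assumptions for illustration; the real path in tranql_ast.py has an extra `[*]` level that this sketch does not model.

```python
import re
from typing import Any, Dict, List


def select_bindings(response: Dict[str, Any], name: str) -> List[str]:
    """Collect every CURIE bound to `name` across all answers (hypothetical shape)."""
    values: List[str] = []
    for answer in response.get("knowledge_map", []):
        binding = answer.get("node_bindings", {}).get(name)
        if binding is None:
            continue
        # A binding may be a single CURIE or a list of CURIEs.
        if isinstance(binding, list):
            values.extend(binding)
        else:
            values.append(binding)
    return values


def filter_excluded(values: List[str], pattern: str) -> List[str]:
    """Keep only the values that do NOT match a `!=~` pattern."""
    regex = re.compile(pattern)
    return [v for v in values if not regex.match(v)]


response = {
    "knowledge_map": [
        {"node_bindings": {"chemical_substance": "SCTID:278694008"}},
        {"node_bindings": {"chemical_substance": ["MESH:D052638", "umlscui:C1510837"]}},
    ]
}
pattern = r"^(SCTID.*|rxcui.*|CAS.*|SMILES.*|umlscui.*)$"
values = select_bindings(response, "chemical_substance")
print(filter_excluded(values, pattern))  # ['MESH:D052638']
```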

frostyfan109 commented 5 years ago

Huh. For some reason there's no SMILES node anymore. However, there are still other nodes such as an SCTID node:

{"name":"MaxDailyPM2.5Exposure","type":["chemical_substance"],"id":"SCTID:278694008","reasoner":["robokop","icees"],"equivalent_identifiers":["ENVO:01000060","MESH:D052638","umlscui:C1510837","SCTID:278694008","NCIT:C29886"],"omnicorp_article_count":0}
frostyfan109 commented 5 years ago

Maybe it's because we now turn the type attribute into an array. Could the check be running after that conversion happens in the merge_results method?
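If that hypothesis is right, any check written as `node["type"] == "chemical_substance"` silently stops matching once `type` becomes an array, because a list never compares equal to a string. A minimal Python sketch of a type check that tolerates both shapes; `node_types` and `has_type` are hypothetical helpers, not functions from tranql.

```python
from typing import Any, Dict, List


def node_types(node: Dict[str, Any]) -> List[str]:
    """Return the node's types as a list, whether "type" is a string or an array."""
    t = node.get("type")
    if t is None:
        return []
    return [t] if isinstance(t, str) else list(t)


def has_type(node: Dict[str, Any], wanted: str) -> bool:
    """Type check that works before and after `type` becomes an array."""
    return wanted in node_types(node)


old_style = {"id": "SCTID:278694008", "type": "chemical_substance"}
new_style = {"id": "SCTID:278694008", "type": ["chemical_substance"]}

print(new_style["type"] == "chemical_substance")  # False: a list never equals a str
print(has_type(old_style, "chemical_substance"))  # True
print(has_type(new_style, "chemical_substance"))  # True
```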

frostyfan109 commented 5 years ago

~Now that I look at the code, I can't find any instances where exclude_patterns and include_patterns are actually handled.~ Concept::set_nodes should be filtering these nodes.
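As a rough picture of what that filtering would look like, here is a stripped-down Python sketch of a concept that applies `exclude_patterns` and `include_patterns` inside `set_nodes`. This class is a hypothetical stand-in, not tranql's actual `Concept`; in particular, reading the CURIE from each node's `id` key is an assumption.

```python
import re
from typing import Any, Dict, List


class Concept:
    """Hypothetical, stripped-down stand-in for tranql's Concept."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.nodes: List[Dict[str, Any]] = []
        self.include_patterns: List[str] = []
        self.exclude_patterns: List[str] = []

    def _allowed(self, curie: str) -> bool:
        # Any exclude match removes the node.
        if any(re.match(p, curie) for p in self.exclude_patterns):
            return False
        # If include patterns exist, at least one must match.
        if self.include_patterns:
            return any(re.match(p, curie) for p in self.include_patterns)
        return True

    def set_nodes(self, nodes: List[Dict[str, Any]]) -> None:
        # Assumption: each node carries its CURIE under "id", as in the
        # knowledge-graph node shown earlier in this thread.
        self.nodes = [n for n in nodes if self._allowed(n["id"])]


chem = Concept("chemical_substance")
chem.exclude_patterns.append(r"^(SCTID.*|rxcui.*|CAS.*|SMILES.*|umlscui.*)$")
chem.set_nodes([{"id": "SCTID:278694008"}, {"id": "MESH:D052638"}])
print(chem.nodes)  # [{'id': 'MESH:D052638'}]
```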

frostyfan109 commented 5 years ago

I think nodes are being improperly filtered because the check looks at the nodes' ids, not their curies. I'm pretty sure this only works when the nodes are set from a reasoner's results. ~However, this may not be the case, because it works if you do something like chemical_substance="{curie}"~ This is because the equals sign is not handled the same way.

Edit: this is not the problem.

[
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0004766"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0004979"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0005405"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0008834"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0008835"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0010940"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0011805"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0012067"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0012379"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0012577"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0012607"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0012666"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0012771"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0013180"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0022742"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0004765"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0004784"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0001491"
  },
  {
    "id": "disease",
    "type": "disease",
    "curie": "MONDO:0025556"
  }
]
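Even though the edit above rules this out as the cause, the listing shows why the distinction matters: in these bindings `id` holds the concept name (`disease`) and the identifier lives in `curie`, so a pattern tested against `id` can never remove anything. A small Python sketch using two of the bindings above and a hypothetical exclusion pattern:

```python
import re

# Two of the bindings listed above: "id" is the concept name, "curie" the identifier.
bindings = [
    {"id": "disease", "type": "disease", "curie": "MONDO:0004766"},
    {"id": "disease", "type": "disease", "curie": "MONDO:0004979"},
]

# Hypothetical exclusion, chosen only to illustrate the difference.
exclude = re.compile(r"^MONDO:0004766$")

kept_by_id = [b for b in bindings if not exclude.match(b["id"])]
kept_by_curie = [b for b in bindings if not exclude.match(b["curie"])]

print(len(kept_by_id))     # 2: "disease" never matches, so nothing is removed
print(len(kept_by_curie))  # 1
```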
frostyfan109 commented 5 years ago

I believe what may be happening is that the filter does work, but a subsequent reasoner then returns results containing nodes whose curies should have been filtered out. I'm not sure whether we actually want to prevent this.
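If we do want to prevent this, one option is to apply the exclusion once more after all reasoners' results are merged, so a later reasoner cannot reintroduce a filtered CURIE. A minimal sketch, assuming for simplicity that each reasoner contributes a flat list of CURIEs (real responses are full knowledge graphs); `merge_and_filter` is a hypothetical name and the reasoner keys are only labels.

```python
import re
from typing import Dict, List

EXCLUDE = re.compile(r"^(SCTID.*|rxcui.*|CAS.*|SMILES.*|umlscui.*)$")


def merge_and_filter(results_per_reasoner: Dict[str, List[str]]) -> List[str]:
    """Merge CURIE lists from several reasoners, then filter once at the end.

    Duplicates are dropped and first-appearance order is kept.
    """
    merged: List[str] = []
    seen = set()
    for curies in results_per_reasoner.values():
        for curie in curies:
            if curie not in seen:
                seen.add(curie)
                merged.append(curie)
    # Filtering after the merge catches CURIEs from every reasoner.
    return [c for c in merged if not EXCLUDE.match(c)]


results = {
    "icees": ["MESH:D052638"],
    "robokop": ["SCTID:278694008", "MESH:D052638"],
}
print(merge_and_filter(results))  # ['MESH:D052638']
```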

frostyfan109 commented 5 years ago

Well, that wasn't it. It seems to have been a problem with how the knowledge_maps were merged. However, these nodes still aren't filtered out of the final query result, which is a problem; they just won't be used as source nodes.

Reference: check out Prednisone (Rxcui:198148) in Workflow 5.

frostyfan109 commented 5 years ago

This should still be updated so that these nodes are filtered out.
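One way to do that is a final pass over the merged knowledge graph that removes excluded nodes together with every edge touching them. A minimal Python sketch; the `nodes`/`edges` layout and the edge keys `source_id`/`target_id` are assumptions about the response format, `prune_knowledge_graph` is a hypothetical name, and answers in `knowledge_map` that bind a removed node would still need separate handling.

```python
import re
from typing import Any, Dict

EXCLUDE = re.compile(r"^(SCTID.*|rxcui.*|CAS.*|SMILES.*|umlscui.*)$")


def prune_knowledge_graph(kg: Dict[str, Any]) -> Dict[str, Any]:
    """Drop excluded nodes and any edge that references one of them."""
    removed = {n["id"] for n in kg["nodes"] if EXCLUDE.match(n["id"])}
    return {
        "nodes": [n for n in kg["nodes"] if n["id"] not in removed],
        "edges": [
            e for e in kg["edges"]
            if e["source_id"] not in removed and e["target_id"] not in removed
        ],
    }


kg = {
    "nodes": [{"id": "SCTID:278694008"}, {"id": "MESH:D052638"}, {"id": "MONDO:0004766"}],
    "edges": [
        {"source_id": "SCTID:278694008", "target_id": "MONDO:0004766"},
        {"source_id": "MESH:D052638", "target_id": "MONDO:0004766"},
    ],
}
pruned = prune_knowledge_graph(kg)
print([n["id"] for n in pruned["nodes"]])  # ['MESH:D052638', 'MONDO:0004766']
print(len(pruned["edges"]))                # 1
```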