NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Ranking and Chemical Categories (UI) Makes it Appear the Number of Results Returned for "what drugs may treat Lung Cancer" is Lower than Expected #455

Open TranslatorIssueCreator opened 1 year ago

TranslatorIssueCreator commented 1 year ago

Hi! I was looking at drugs that treat lung cancer and I got only 7 FDA approved drugs - that doesn’t sound right! Can you please look into this and get back to me? My email is sramal3@emory.edu. Thank you!

Type: Other Comment

URL: http://transltr-bma-ui-dev.ncats.io/

ARS PK: e8a19b5a-ab85-4baa-abdc-f08f2d0cd7c6

Steps to reproduce:

Screenshots:

Failed to upload screenshot.

jh111 commented 1 year ago

If helpful for troubleshooting here are some examples of expected answers: https://www.cancer.gov/about-cancer/treatment/drugs/lung.

Genomewide commented 1 year ago

I am not sure why this happened. Here is some information I have gathered.

I thought the term might sit too high in the hierarchy and therefore have low information content. I ran a parent (lung neoplasm), a child (lung carcinoma), and lung cancer itself through the Node Normalizer, and the information content was nearly the same for all three.

The apparent difference was in which ARA responded.

lung cancer: https://ui.ci.transltr.io/results?l=Lung%20Cancer&i=MONDO:0008903&t=0&q=c5c73cee-a23b-451f-9a7a-f72bab195551

lung neoplasm: https://ui.ci.transltr.io/results?l=Lung%20Neoplasm&i=MONDO:0021117&t=0&q=fde67312-18a7-4400-bbb9-f8fd46658665

lung carcinoma: https://ui.ci.transltr.io/results?l=Lung%20Carcinoma&i=MONDO:0005138&t=0&q=329b16c2-1677-45c8-a5b0-0d865b3c8c48

Not sure why this would be.

From mondo ontology: Screenshot 2023-08-10 at 5 15 49 PM

Node normal data:

{
  "MONDO:0008903": {
    "id": { "identifier": "MONDO:0008903", "label": "lung cancer" },
    "equivalent_identifiers": [
      { "identifier": "MONDO:0008903", "label": "lung cancer" },
      { "identifier": "DOID:1324", "label": "lung cancer" },
      { "identifier": "OMIM:211980" },
      { "identifier": "UMLS:C0024624", "label": "Malignant neoplasm of upper lobe, bronchus or lung" },
      { "identifier": "UMLS:C0153491", "label": "Malignant neoplasm of middle lobe, bronchus or lung" },
      { "identifier": "UMLS:C0153492", "label": "Malignant neoplasm of lower lobe, bronchus or lung" },
      { "identifier": "UMLS:C0153493", "label": "Malignant neoplasm of other parts of bronchus or lung" },
      { "identifier": "UMLS:C0242379", "label": "Malignant neoplasm of lung" },
      { "identifier": "UMLS:C1968897", "label": "LUNG CANCER, PROTECTION AGAINST" },
      { "identifier": "MEDDRA:10007096" },
      { "identifier": "MEDDRA:10025044" },
      { "identifier": "MEDDRA:10025056" },
      { "identifier": "MEDDRA:10058467" },
      { "identifier": "NCIT:C7377", "label": "Malignant Lung Neoplasm" },
      { "identifier": "SNOMEDCT:269464000" },
      { "identifier": "SNOMEDCT:363358000" },
      { "identifier": "ICD10:C34.1" },
      { "identifier": "ICD10:C34.2" },
      { "identifier": "ICD10:C34.3" },
      { "identifier": "ICD9:162.3" },
      { "identifier": "ICD9:162.4" },
      { "identifier": "ICD9:162.5" },
      { "identifier": "ICD9:162.8" }
    ],
    "type": [
      "biolink:Disease",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:ThingWithTaxon"
    ],
    "information_content": 59.1
  },
  "MONDO:0021117": {
    "id": { "identifier": "MONDO:0021117", "label": "lung neoplasm" },
    "equivalent_identifiers": [
      { "identifier": "MONDO:0021117", "label": "lung neoplasm" },
      { "identifier": "UMLS:C0024121", "label": "Lung Neoplasms" },
      { "identifier": "MESH:D008175", "label": "Lung Neoplasms" },
      { "identifier": "MEDDRA:10049790" },
      { "identifier": "MEDDRA:10062042" },
      { "identifier": "NCIT:C3200", "label": "Lung Neoplasm" },
      { "identifier": "SNOMEDCT:126713003" },
      { "identifier": "HP:0100526", "label": "Neoplasm of the lung" }
    ],
    "type": [
      "biolink:Disease",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:ThingWithTaxon"
    ],
    "information_content": 58.4
  },
  "MONDO:0005138": {
    "id": { "identifier": "MONDO:0005138", "label": "lung carcinoma" },
    "equivalent_identifiers": [
      { "identifier": "MONDO:0005138", "label": "lung carcinoma" },
      { "identifier": "DOID:3905", "label": "lung carcinoma" },
      { "identifier": "EFO:0001071", "label": "lung carcinoma" },
      { "identifier": "UMLS:C0684249", "label": "Carcinoma of lung" },
      { "identifier": "MEDDRA:10007420" },
      { "identifier": "MEDDRA:10007433" },
      { "identifier": "MEDDRA:10025064" },
      { "identifier": "MEDDRA:10037344" },
      { "identifier": "NCIT:C4878", "label": "Lung Carcinoma" }
    ],
    "type": [
      "biolink:Disease",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:ThingWithTaxon"
    ],
    "information_content": 59.8
  }
}
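For anyone repeating the information content comparison, here is a minimal sketch of pulling the values out of a response shaped like the one above. The embedded JSON is a trimmed copy of the quoted data; the function name `information_content_by_label` is made up for illustration and is not part of any Translator tooling.

```python
import json

# Trimmed copy of the Node Normalizer response quoted in this comment;
# only the fields used below are kept.
response_text = """
{
  "MONDO:0008903": {"id": {"identifier": "MONDO:0008903", "label": "lung cancer"},
                    "information_content": 59.1},
  "MONDO:0021117": {"id": {"identifier": "MONDO:0021117", "label": "lung neoplasm"},
                    "information_content": 58.4},
  "MONDO:0005138": {"id": {"identifier": "MONDO:0005138", "label": "lung carcinoma"},
                    "information_content": 59.8}
}
"""


def information_content_by_label(response):
    """Map each node's label to its information_content (hypothetical helper)."""
    return {
        node["id"]["label"]: node.get("information_content")
        for node in response.values()
        if node is not None  # skip CURIEs that came back with no entry
    }


ic = information_content_by_label(json.loads(response_text))
spread = max(ic.values()) - min(ic.values())

for label, value in sorted(ic.items(), key=lambda kv: kv[1]):
    print(f"{label}: {value}")
print(f"spread: {spread:.1f}")
```

The three values differ by less than 1.5, which fits the observation that information content does not explain the difference in results.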

Tagging Will to see what the difference is here.

sierra-moxon commented 9 months ago

from TAQA:

sstemann commented 3 months ago

This could be a case of missing FDA Approval annotations, as I'm seeing in Test now, or it could be because of the scoring. Between ARAX and BTE, many of the drugs listed on the site Jenn shared are in their top results, but so are some odd ones interspersed among them, which may also be contributing to so few being shown. I think we should retest once something has been done to improve the FDA Approval annotations in Test.


By ARA:
ARAX - of the top 10 scored ARAX results, 9 are on the page from Jenn; they are all ranked 13+ by Sugeno
ARAGORN - of the top 10 scored ARAGORN results, 1 is on the page from Jenn; six of its top 10 are in the Sugeno rank top 10
BTE - of the top 10 scored BTE results, 10 are on the page from Jenn; only 2 are in the Sugeno rank top 10, the others are ranked 15+
Improve - of the top 10 scored Improve results, none are on the page from Jenn; one has a Sugeno rank of 6 (Sugeno score of 1)
Unsecret - of the top 10 scored Unsecret results, none are on the page from Jenn; one has a Sugeno rank of 6 (Sugeno score of 1)

So I believe the expected treatments are being returned, but the scoring is burying them. It's not clear why the following are getting a confidence score of 1 and therefore being boosted.

These 8 are in the top Sugeno Rank and have confidence 1, Sugeno 1:
betacarotene CHEBI:17579
retinol CHEBI:12777
HYDROXYCAMPTOTHECIN CHEBI:81395
BATABULIN SODIUM PUBCHEM.COMPOUND:23669770
mitomycin C CHEBI:27504
Vitamin E CHEBI:18145
radon atom CHEBI:33314
SELENIOUS ACID CHEBI:26642
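The per-ARA comparison above (how many of an ARA's top 10 are on the expected list, and where those land in the Sugeno ranking) could be scripted roughly as follows. This is only a sketch: the field names `name`, `score`, and `sugeno_rank`, the function `top_n_overlap`, and the placeholder drugs are all assumptions for illustration, not the actual result schema or data.

```python
def top_n_overlap(results, expected, n=10):
    """Return the top-n results by score (descending) that appear in `expected`."""
    top = sorted(results, key=lambda r: r["score"], reverse=True)[:n]
    return [r for r in top if r["name"].lower() in expected]


# Placeholder data only; real names would come from the expected-answer page
# and from one ARA's results.
expected = {"drug_a", "drug_b", "drug_c"}
results = [
    {"name": "drug_a", "score": 0.91, "sugeno_rank": 14},
    {"name": "drug_x", "score": 0.88, "sugeno_rank": 3},
    {"name": "drug_b", "score": 0.75, "sugeno_rank": 21},
    {"name": "drug_y", "score": 0.40, "sugeno_rank": 1},
]

hits = top_n_overlap(results, expected, n=3)

# Expected answers that the ARA scored highly but that fall outside the
# Sugeno top 10, i.e. the "buried" ones.
buried = [r["name"] for r in hits if r["sugeno_rank"] > 10]

print(f"{len(hits)} of top 3 are expected; buried by Sugeno rank: {buried}")
```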

Test URL: https://ui.test.transltr.io/main/results?l=Lung%20Cancer&i=MONDO:0008903&t=0&r=0&q=8bc0a760-6402-4069-bdae-9896f04ee180
Test PK: 8bc0a760-6402-4069-bdae-9896f04ee180
lung cancer_test_2024_8-7_19_12_8bc0a760-6402-4069-bdae-9896f04ee180.xlsx

Prod URL: https://ui.transltr.io/main/results?l=Lung%20Cancer&i=MONDO:0008903&t=0&r=0&q=87bab695-79fe-46d1-a17f-18b34e48777c
Prod PK: 87bab695-79fe-46d1-a17f-18b34e48777c

sierra-moxon commented 2 months ago

from TAQA:

sierra-moxon commented 2 months ago

from TAQA: @sharatisrani - could you take a look at this one (in particular the ranking issues) and clarify if it is something that might be fixed before the end of this phase?

sharatisrani commented 2 months ago

@sierra-moxon et al - indeed, much will change in scoring within a few weeks. Confidence scoring will be dramatically changed so there are not a lot of 1's. The ranks of today, which frankly are not real because ties don't have the same rank, will get much more meaningful. Beyond that, novelty will have several more factors, maybe one or two will be in Guppy (tbd by @Rosinaweber today). FDA approved drugs will be low novelty, so if the novelty toggle is turned off, those drugs will rise right to the top.
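To illustrate the point about ties, here is a small sketch of a tie-aware ranking in which equal scores share a rank (standard competition ranking: 1, 1, 1, 4, ...). It is not the Translator ranking code, and `competition_ranks` is a name made up for the example.

```python
def competition_ranks(scores):
    """Assign 1-based ranks, highest score first; equal scores share a rank."""
    ordered = sorted(scores, reverse=True)
    first_position = {}
    for position, score in enumerate(ordered, start=1):
        # Record only the first (best) position at which each score appears.
        first_position.setdefault(score, position)
    return [first_position[s] for s in scores]


# Three results tied at confidence 1.0 all get rank 1; the next gets rank 4.
ranks = competition_ranks([1.0, 1.0, 0.8, 1.0, 0.5])
print(ranks)
```

With many results tied at a score of 1, a ranking like this makes it visible that they are indistinguishable, instead of suggesting an order among them.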