glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Biomarker search #1221

Closed ReneRanzinger closed 6 months ago

ReneRanzinger commented 7 months ago

Implement the biomarker search API

Dependencies:

Blocker for:

rykahsay commented 6 months ago

Please try the APIs at https://api.tst.glygen.org/

ReneRanzinger commented 6 months ago

No changes in the field order, according to @DaniallMasood.

sujeetvkulkarni commented 6 months ago

@ReneRanzinger What about typeaheads? Will all fields with text input support typeaheads? If so, we will need a separate ticket for @rykahsay.

ReneRanzinger commented 6 months ago

All advanced fields except for the dropdown, yes.

sujeetvkulkarni commented 6 months ago

@rykahsay Can you please give me the list of biomarker typeaheads for the fields below?

sujeetvkulkarni commented 6 months ago

Also, the current biomarker search init API does not list Exposure Agent in the simple search category.

API: https://api.tst.glygen.org/biomarker/search_init/

"simple_search_category": [
    {
        "id": "any",
        "display": "Any"
    },
    {
        "id": "biomarker",
        "display": "Biomarker"
    },
    {
        "id": "condition",
        "display": "Condition"
    }
]

The Exposure Agent ID option also needs to be supported by the https://api.tst.glygen.org/biomarker/search/ API.
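
A client-side check for this could read the category IDs out of the search_init response. The sketch below is only an illustration: the sample payload is the fragment quoted above (wrapped in braces so it parses), and the `exposure_agent` ID is a guess at what the missing entry might be called, not a value confirmed by the API.

```python
import json

# Fragment of the search_init response quoted above, wrapped in braces so it parses.
SEARCH_INIT_FRAGMENT = """
{
  "simple_search_category": [
    {"id": "any", "display": "Any"},
    {"id": "biomarker", "display": "Biomarker"},
    {"id": "condition", "display": "Condition"}
  ]
}
"""


def simple_search_category_ids(search_init: dict) -> list:
    """Return the IDs listed under simple_search_category, in order."""
    return [entry["id"] for entry in search_init.get("simple_search_category", [])]


category_ids = simple_search_category_ids(json.loads(SEARCH_INIT_FRAGMENT))
# "exposure_agent" is a hypothetical ID used only for this check.
has_exposure_agent = "exposure_agent" in category_ids
print(category_ids, has_exposure_agent)
```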

rykahsay commented 6 months ago

Use the /typeahead/typeahead/ API with the following query examples:

{
    "field": "biomarker_id",
    "limit": 10,
    "value": "AA4686"
}

{
    "field": "biomarker",
    "limit": 10,
    "value": "increased IFNG"
}

{
    "field": "condition_id",
    "limit": 10,
    "value": "10283"
}

{
    "field": "condition_name",
    "limit": 10,
    "value": "prostate"
}

{
    "field": "biomarker_entity_id",
    "limit": 10,
    "value": "P05231"
}

{
    "field": "biomarker_entity_type",
    "limit": 10,
    "value": "protein"
}

{
    "field": "publication_id",
    "limit": 10,
    "value": "3247979"
}

{
    "field": "best_biomarker_role",
    "limit": 10,
    "value": "risk"
}

{
    "field": "specimen_id",
    "limit": 10,
    "value": "0000178"
}

{
    "field": "specimen_name",
    "limit": 10,
    "value": "blood"
}
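
For reference, a request like the ones above could be built from a client roughly as follows. This is a sketch, not the GlyGen client code: it assumes the typeahead endpoint is served under the test host mentioned earlier in this thread and accepts a JSON body via POST. `build_typeahead_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumed base URL: the test server mentioned earlier in this thread.
BASE_URL = "https://api.tst.glygen.org"


def build_typeahead_request(field: str, value: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request for /typeahead/typeahead/."""
    payload = {"field": field, "limit": limit, "value": value}
    return urllib.request.Request(
        url=f"{BASE_URL}/typeahead/typeahead/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_typeahead_request("condition_name", "prostate")
print(request.full_url)
print(request.data.decode("utf-8"))
```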

rykahsay commented 6 months ago

On the lack of exposure agent, here is what @seankim658 said:

"For the exposure agent: in our data model, condition and exposure agent are mutually exclusive, so a biomarker can be related to either a specific condition or an exposure agent, but not both. Our JSON schema only allows for one or the other. Since the GlyGen data doesn't have any exposure agent related biomarkers, nothing is captured in that field. @DaniallMasood can explain any of this further if you have more questions; I only understand it from the data model perspective and not the scientific reasoning behind it."

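The one-or-the-other rule described in the quote can be illustrated with a small check. The key names `condition` and `exposure_agent` are assumptions made for this example, and the sample records are made up; the actual schema may name or nest these fields differently.

```python
def relation_kind(record: dict) -> str:
    """Return which of the two mutually exclusive relations a record carries.

    Raises ValueError when the record has both or neither.
    """
    has_condition = record.get("condition") is not None
    has_exposure_agent = record.get("exposure_agent") is not None
    if has_condition == has_exposure_agent:
        raise ValueError("expected exactly one of 'condition' or 'exposure_agent'")
    return "condition" if has_condition else "exposure_agent"


# Made-up record for illustration only.
print(relation_kind({"condition": {"name": "example condition"}}))
```
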
sujeetvkulkarni commented 6 months ago

@rykahsay Thanks. Can you please also give the typeaheads for the fields below?

LOINC code
Biomarker Entity

rykahsay commented 6 months ago

$ http POST :8082/typeahead/typeahead/ < tests/examples/typeahead/loinc_code.json
$ cat tests/examples/typeahead/loinc_code.json
{
    "field":"specimen_loinc_code",
    "limit":10,
    "value":"9041"
}

"Biomarker Entity" is a field that has a complex value; the targeted field in advanced search is "biomarker_entity_name". Given below are the fields you use in advanced search and the paths they target.

f_map = {
    "biomarker_id": "biomarker_id",
    "biomarker_canonical_id": "biomarker_canonical_id",
    "biomarker": "biomarker_component.biomarker",
    "biomarker_entity_name": "biomarker_component.assessed_biomarker_entity.recommended_name",
    "biomarker_entity_id": "biomarker_component.assessed_biomarker_entity_id",
    "biomarker_entity_type": "biomarker_component.assessed_entity_type",
    "specimen_name": "biomarker_component.specimen.name",
    "specimen_id": "biomarker_component.specimen.id",
    "specimen_loinc_code": "biomarker_component.specimen.loinc_code",
    "best_biomarker_role": "best_biomarker_role.role",
    "condition_id": "condition.recommended_name.id",
    "condition_name": "condition.recommended_name.name",
    "publication_id": "publication.reference.id"
}
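
To make the mapping concrete, here is a sketch of how a dotted path from f_map could be resolved against a nested record. It is not the server implementation: `resolve_path` and the sample record are hypothetical, and the sketch assumes that any list met along the path (for example, several components) is searched element by element.

```python
def resolve_path(record, path: str) -> list:
    """Collect every value found at a dotted path, descending into lists."""
    current = [record]
    for key in path.split("."):
        next_level = []
        for node in current:
            # Treat a list as a set of candidates, each searched for the key.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    # Flatten a trailing list value into the result.
    values = []
    for node in current:
        values.extend(node if isinstance(node, list) else [node])
    return values


# Made-up record shaped like the paths in f_map; not real GlyGen data.
sample = {
    "biomarker_component": [
        {"specimen": [{"name": "blood", "loinc_code": "0000-0"}]},
        {"specimen": [{"name": "serum"}]},
    ]
}
print(resolve_path(sample, "biomarker_component.specimen.name"))
```

A path that does not exist in the record simply yields an empty list, which keeps the caller free of key-error handling.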

sujeetvkulkarni commented 6 months ago

@rykahsay Thanks. What is the typeahead for Biomarker Entity (biomarker_entity_name)? Also, the publication_id typeahead is returning an empty response for biomarker.

And biomarker/search also fails with publication_id input.

https://api.tst.glygen.org/biomarker/search?query={"publication_id":"32234467"}

rykahsay commented 6 months ago

Yes, the typeahead field for Biomarker Entity is biomarker_entity_name.

Please try the search again (https://api.tst.glygen.org/biomarker/search?query={"publication_id":"32234467"})

sujeetvkulkarni commented 6 months ago

Now the publication search is working.

But the biomarker_entity_name and publication_id typeaheads are not working.

https://api.tst.glygen.org/typeahead/typeahead?query={"field":"biomarker_entity_name","value":"i","limit":100} returns empty response ([]).

and

https://api.tst.glygen.org/typeahead/typeahead?query={"field":"publication_id","value":"3","limit":100} returns 500 INTERNAL SERVER ERROR
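
Independent of the server-side errors reported here, it is safer to URL-encode the JSON passed in the query parameter when calling these endpoints with GET, since braces and double quotes are not valid in a raw URL. A sketch using only the Python standard library (the helper name `build_query_url` is made up for this example, and the base URL is the test host used in this thread):

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.tst.glygen.org"


def build_query_url(endpoint: str, query: dict) -> str:
    """Return a GET URL whose `query` parameter holds URL-encoded compact JSON."""
    encoded = urlencode({"query": json.dumps(query, separators=(",", ":"))})
    return f"{BASE_URL}/{endpoint.strip('/')}?{encoded}"


url = build_query_url(
    "typeahead/typeahead",
    {"field": "publication_id", "value": "3", "limit": 100},
)
print(url)

# Decoding the parameter again recovers the original JSON.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
print(decoded)
```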

rykahsay commented 6 months ago

Try now:

$ http POST :8082/typeahead/typeahead/ < tests/examples/typeahead/q.1.json
HTTP/1.1 200 OK
Connection: close
Content-Length: 3227
Content-Type: application/json
Date: Thu, 02 May 2024 11:04:30 GMT
Server: gunicorn

[
    "26S proteasome non-ATPase regulatory subunit 11",
    "Acid sphingomyelinase",
    "Albumin",
    "Alkylation repair homologue 5",
    "Alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferase 3",
    "Angiopoietin-2 protein",
    "Angiotensin-converting enzyme 2",
    "Anterior gradient protein 2 homolog",
    "Aquaporin-1",
    "Aspartate aminotransferase",
    "Breast cancer circulating SAA1, CRP panel",
    "C-reactive protein",
    "CD40 ligand",
    "CYP17 A2 allele polymorphism",
    "Cadherin-17",
    "Carcinoembryonic antigen",
    "Cathepsin B",
    "Chitinase-3-like protein 1",
    "Cholinesterase",
    "Circulating CD44v6 variant and STn O-glycoform panel",
    "Cystatin C",
    "Des-gamma carboxy-prothrombin",
    "Dickkopf-1",
    "Dublin-Boston score",
    "E-cadherin",
    "Earlier stage ovarian cancer IL6, WFDC2, MUC16, CDH1 panel",
    "Endothelin-1",
    "Epidermal growth factor receptor gene",
    "Epidermal growth factor receptor protein",
    "Fatty acid-binding protein, adipocyte",
    "Fibrinogen to Albumin ratio",
    "Fibroblast growth factor receptor 2 gene amplification",
    "Fractalkine",
    "Galectin-3",
    "Glutathione peroxidase 3",
    "Glyco-typer 50 N-glycan liver disease panel",
    "Glycosyltransferase 8 domain containing 1",
    "Glypican 6",
    "Granulocyte colony-stimulating factor",
    "Haptoglobin",
    "Heat shock protein beta-1",
    "Intercellular adhesion molecule 1",
    "Interferon gamma",
    "Interferon gamma+  signature panel",
    "Interleukin-1 receptor antagonist",
    "Interleukin-10",
    "Interleukin-4",
    "Interleukin-6",
    "Interleukin-8",
    "Intermediate conductance calcium-activated potassium channel protein 4",
    "KI67 antigen",
    "Kinesin family member 20A",
    "LCR to NLR ratio",
    "Leukocyte immunoglobulin like receptor B4",
    "Low-density lipoprotein receptor-related protein 1B gene mutation",
    "Lymphocyte to C-reactive protein ratio",
    "Macrophage inflammatory protein 1-alpha",
    "Macrophage inhibitory cytokine 1",
    "Matrix metalloproteinase-8",
    "Mitochondrial 2-oxoglutarate/malate carrier protein",
    "Monocyte chemoattractant protein 1",
    "Monocyte chemotactic protein-3",
    "Mucin-1",
    "Mucin-16",
    "Myofibrillogenesis regulator 1",
    "Neurofilament light chain",
    "Nidogen-2",
    "Nidogen-2 gene promoter methylation",
    "Olfactomedin-4",
    "Ovarian cancer OPN, MUC16, LEP, PRL, IGF2, MIF expression panel",
    "P-selectin",
    "PIN2/TERF1 interacting telomerase inhibitor 1",
    "Peroxiredoxin-1",
    "Proteasome subunit beta type 9",
    "Protein S100A9",
    "Protein arginine methyltransferase-1",
    "Protein quaking",
    "RING finger protein 126",
    "Receptor tyrosine-protein kinase erbB-2",
    "Selenoprotein P",
    "Serine/arginine-related protein 53",
    "Serum amyloid A protein",
    "Signal transducer and activator of transcription 3",
    "Soluble urokinase plasminogen activator receptor",
    "Squamous cell carcinoma antigen",
    "Sterile alpha motif domain-containing protein 5",
    "Sterol regulatory element-binding protein 1",
    "Stromal cell-derived factor 1",
    "Thrombomodulin",
    "Thyrotroph embryonic factor",
    "Transcription factor GATA-4",
    "Transcription factor GATA-4 gene promoter methylation",
    "Transforming growth factor-beta-induced protein",
    "Transgelin",
    "Tristetraprolin",
    "Tumor necrosis factor alpha",
    "Type 1 diabetes regulatory T cell gene panel",
    "Vascular cell adhesion molecule-1",
    "WAP four-disulfide core domain protein 2",
    "Yes-associated protein 1"
]
$ http POST :8082/typeahead/typeahead/ < tests/examples/typeahead/q.2.json
HTTP/1.1 200 OK
Connection: close
Content-Length: 1230
Content-Type: application/json
Date: Thu, 02 May 2024 11:03:47 GMT
Server: gunicorn

[
    "10.1101/2020.05.31.20118315",
    "10.33218/001c.13525",
    "10829039",
    "10914713",
    "12717392",
    "14760083",
    "15371879",
    "16033098",
    "16135921",
    "16534867",
    "18953438",
    "19362088",
    "19470939",
    "19536090",
    "20375178",
    "20376881",
    "20398667",
    "21431281",
    "21577321",
    "21673904",
    "21683503",
    "23174934",
    "23645542",
    "23986888",
    "24213139",
    "25143907",
    "25382443",
    "25553114",
    "25813380",
    "26033551",
    "26936883",
    "27439769",
    "27648369",
    "27767378",
    "28939487",
    "28953854",
    "29357836",
    "29716923",
    "29958235",
    "30100196",
    "30160019",
    "30187949",
    "30311656",
    "30386613",
    "30431376",
    "30532555",
    "30711999",
    "30994045",
    "31199813",
    "31254127",
    "31552812",
    "31559796",
    "31655611",
    "31884802",
    "31887541",
    "31898157",
    "31985880",
    "31986264",
    "32054505",
    "32162865",
    "32195055",
    "32222466",
    "32224151",
    "32254065",
    "32257537",
    "32292113",
    "32333071",
    "32347972",
    "32360286",
    "32369209",
    "32458111",
    "32467984",
    "32470148",
    "32479790",
    "32492406",
    "32504736",
    "32511562",
    "32540459",
    "32555317",
    "32561706",
    "32576222",
    "32582936",
    "32619411",
    "32655735",
    "32663515",
    "32708526",
    "32757722",
    "32871287",
    "32916258",
    "32934717",
    "32937590",
    "32989935",
    "33023150",
    "33130378",
    "33284889",
    "33737684",
    "33859753",
    "34683137",
    "34804823",
    "9581823"
]

sujeetvkulkarni commented 6 months ago

611480d2de8822f239a7e2be8a0ee6178c1e593d