Mapping new biomarker details api response to old data model and old GlyGen interface.

sujeetvkulkarni commented 2 weeks ago

tst api response: https://api.tst.glygen.org/biomarker/detail/AA4686-11?query={"paginated_tables":[{"table_id":"publication","offset":1,"limit":200,"sort":"date","order":"desc"}]}

Production api response: https://api.glygen.org/biomarker/detail/A0001?query={"paginated_tables":[{"table_id":"publication","offset":1,"limit":200,"sort":"date","order":"desc"}]}
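
For anyone reproducing these two requests, here is a minimal sketch of how the `query` parameter could be built and URL-encoded instead of pasting raw JSON into the address bar. The helper name `biomarker_detail_url` and its default arguments are illustrative only; the path and the `paginated_tables` fields are copied from the URLs above.

```python
import json
from urllib.parse import quote, urlencode


def biomarker_detail_url(
    base_url: str,
    biomarker_id: str,
    table_id: str = "publication",
    offset: int = 1,
    limit: int = 200,
    sort: str = "date",
    order: str = "desc",
) -> str:
    """Build a biomarker detail URL with a URL-encoded JSON `query` parameter."""
    query = {
        "paginated_tables": [
            {
                "table_id": table_id,
                "offset": offset,
                "limit": limit,
                "sort": sort,
                "order": order,
            }
        ]
    }
    # Compact separators keep the JSON identical to the hand-written URLs above.
    encoded = urlencode({"query": json.dumps(query, separators=(",", ":"))})
    return f"{base_url}/biomarker/detail/{quote(biomarker_id)}?{encoded}"


tst_url = biomarker_detail_url("https://api.tst.glygen.org", "AA4686-11")
prod_url = biomarker_detail_url("https://api.glygen.org", "A0001")
```
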

Left is the tst API response and right is the production response. [Screenshot 2024-04-30 at 1 57 28 PM]

Old GlyGen interface: https://www.glygen.org/biomarker/A0001

[Screenshot 2024-04-30 at 2 00 09 PM]

A few questions:

We have five sections:

1. General: The new data model doesn't have anything like assessed_biomarker_entity at the top level of the API response object.
2. Biomarker Description: In the old data model, data from "instances":[] is mapped to Biomarker Description. The new data model doesn't have anything like "instances":[], so do we need to map data from biomarker_component:[] here? It looks like not all fields are there and the structure is different.
3. Components: In the old data model, components->protein and components->glycan are mapped to the Protein and Glycan tables in the Components section. The new data model doesn't have anything like "components":{}, so do we need to map data from biomarker_component:[] here?
4. Cross References: crossref:[] in both the old and new data models.
5. Publications: citation:[] array in the new data model; publication:[] array in the old data model.

seankim658 commented 2 weeks ago

Adding @DaniallMasood here, he would know better for these questions and how the information should be structured but chipping in what I can:

1. In the new data model, each biomarker entry represents a single biomarker/disease connection, whereas in the old model you could have multiple diseases in the same biomarker entry. So for the top-level General section, you could include the IDs, disease (condition) information, and best biomarker roles. The assessed_biomarker_entity sits inside the biomarker_component array because the new model is designed to capture panel/multi-component biomarkers more accurately. So assessed_biomarker_entity won't be in the General section.
2. The biomarker_component array represents the different potential components of a biomarker, whereas the instances (by our new definition) represent separate biomarkers. If there is only one biomarker_component entry, the biomarker is a single biomarker. If it has multiple components, it is a panel or multi-component biomarker, meaning it can be measured across a combination of assessed entity types. Daniall would know what (if anything) the Biomarker Description section needs to be renamed to.
3. I have no idea on this one, @DaniallMasood let us know.
4. The publication array was renamed to citation because we now accept different types of citations that are not necessarily publications, for example patents and FDA certifications.
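
To make the discussion above concrete, here is a rough sketch of one way the frontend could regroup a new-model response into the old interface's sections. This is not an agreed mapping: whether Components should be fed from `biomarker_component` is still an open question in this thread, and the top-level key names `biomarker_id`, `condition`, and `best_biomarker_role` are assumptions that need checking against the real response. Only `biomarker_component`, `assessed_entity_type`, `crossref`, and `citation` come from the comments and payloads in this issue.

```python
from typing import Any


def to_sections(detail: dict[str, Any]) -> dict[str, Any]:
    """Group a new-model biomarker detail response into old-interface sections."""
    components = detail.get("biomarker_component", [])

    def of_type(entity_type: str) -> list[dict[str, Any]]:
        # Split components by assessed entity type for the old Protein/Glycan tables.
        return [c for c in components if c.get("assessed_entity_type") == entity_type]

    return {
        "general": {
            # Assumed key names; the thread only says "IDs, disease (condition)
            # information, and best biomarker roles".
            "biomarker_id": detail.get("biomarker_id"),
            "condition": detail.get("condition"),
            "best_biomarker_role": detail.get("best_biomarker_role", []),
        },
        "components": {
            "protein": of_type("protein"),
            "glycan": of_type("glycan"),
        },
        "cross_references": detail.get("crossref", []),
        # `citation` replaces the old `publication` array.
        "publications": detail.get("citation", []),
        # More than one component means a panel / multi-component biomarker.
        "is_panel": len(components) > 1,
    }
```

Components whose `assessed_entity_type` is neither protein nor glycan are silently dropped in this sketch, so a real mapping would need a decision on how to display them.
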

seankim658 commented 2 weeks ago

Structure of the exposure agent object:

"exposure_agent": {
        "id": "",
        "recommended_name": {
            "id": "",
            "name": "",
            "description": "",
            "resource": "",
            "url": ""
        }
    },
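
For client code that consumes this object, a typed sketch of the same structure might look like the following. The field names are taken from the template above; the helper `has_exposure_agent` and the rule that empty strings mean "no exposure agent" are assumptions, since only the empty template is shown here.

```python
from typing import TypedDict


class RecommendedName(TypedDict):
    id: str
    name: str
    description: str
    resource: str
    url: str


class ExposureAgent(TypedDict):
    id: str
    recommended_name: RecommendedName


def has_exposure_agent(agent: ExposureAgent) -> bool:
    """Return True when the object carries an id or a recommended name.

    Assumption: an entry with only empty strings should be treated as absent.
    """
    recommended = agent.get("recommended_name", {})
    return bool(agent.get("id") or recommended.get("name"))
```
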

rykahsay commented 2 weeks ago

Following our discussion yesterday, "evidence_list" and "tags" are now aggregated per article ID (see the example in /biomarker/detail/AA4695-1).
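
As an illustration of what "aggregated per article ID" could mean, here is a small sketch that merges several evidence rows for the same article into one entry. It is not the backend code: the function name `aggregate_evidence` and the flat input shape are hypothetical, and only the `id`, `database`, `tags`, and `evidence_list` keys come from the API.

```python
from typing import Any


def aggregate_evidence(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge rows that share (database, id) into one entry per article."""
    grouped: dict[tuple[str, str], dict[str, Any]] = {}
    for row in rows:
        key = (row["database"], row["id"])
        entry = grouped.setdefault(
            key,
            {"id": row["id"], "database": row["database"], "tags": [], "evidence_list": []},
        )
        # Append tags and evidence in first-seen order, skipping duplicates.
        for tag in row.get("tags", []):
            if tag not in entry["tags"]:
                entry["tags"].append(tag)
        for evidence in row.get("evidence_list", []):
            if evidence not in entry["evidence_list"]:
                entry["evidence_list"].append(evidence)
    # Dicts preserve insertion order, so articles keep their first-seen order.
    return list(grouped.values())
```
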

sujeetvkulkarni commented 1 week ago

@DaniallMasood It looks like the "specimen": [] array below has no specimen name (for example, "blood"), which was there in the old API. In the Component table we wanted to display the specimen in the format blood (UBERON: 0000178). With the current information we can only display UBERON: 0000178.

API: https://api.tst.glygen.org/biomarker/detail/AA4686-11

[
  {
    "biomarker": "increased IL6 level",
    "assessed_biomarker_entity_id": "P05231-1",
    "assessed_entity_type": "protein",
    "assessed_biomarker_entity": {
      "recommended_name": "Interleukin-6",
      "synonyms": [
        {
          "synonym": "IL-6"
        },
        {
          "synonym": "B-cell stimulatory factor 2"
        },
        {
          "synonym": "BSF-2"
        },
        {
          "synonym": "CTL differentiation factor"
        },
        {
          "synonym": "CDF"
        },
        {
          "synonym": "Hybridoma growth factor"
        },
        {
          "synonym": "Interferon beta-2"
        },
        {
          "synonym": "IFN-beta-2"
        }
      ]
    },
    "specimen": [
      {
        "namespace": "UBERON",
        "id": "0000178",
        "url": "http://purl.obolibrary.org/obo/UBERON_0000178",
        "loinc_code": "26881-3"
      },
      {
        "namespace": "UBERON",
        "id": "0001977",
        "url": "http://purl.obolibrary.org/obo/UBERON_0001977",
        "loinc_code": "26881-3"
      }
    ],
    "evidence_source": [
      {
        "id": "GLY_000625",
        "database": "GlyGen",
        "url": "https://data.glygen.org/GLY_000625"
      },
      {
        "id": "32479790",
        "database": "PubMed",
        "url": "https://glygen.org/publication/PubMed/32479790",
        "tags": [
          "biomarker",
          " assessed_biomarker_entity",
          " assessed_biomarker_entity_id",
          " assessed_entity_type"
        ],
        "evidence_list": [
          {
            "evidence": "IL-6 plays multifaceted roles in regulation of vascular leakage, complement activation, and coagulation pathways, which ultimately causes poor outcomes for acute respiratory distress syndrome, multiple organ dysfunction syndrome, and SARS."
          }
        ]
      },
      {
        "id": "32369209",
        "database": "PubMed",
        "url": "https://glygen.org/publication/PubMed/32369209",
        "tags": [
          "biomarker",
          " assessed_biomarker_entity",
          " assessed_biomarker_entity_id",
          " assessed_entity_type"
        ],
        "evidence_list": [
          {
            "evidence": "Low lymphocytes, increased IL-6, CRP, PCT, D dimer, and LDH, these finds were similar to previous studies. The increase of these inflammatory indexes indicates that the infected patients were in inflammatory state, which may be closely related to the inflammatory storm. The increase of the cancer biomarkers in patients with COVID-19, especially in severe and critical patients, suggests that inflammation is closely related to the development of COVID-19."
          }
        ]
      }
    ]
  }
]
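
A minimal sketch of the display logic in question, assuming the specimen name is added under a key called `name` (that key is a guess; the objects above have only `namespace`, `id`, `url`, and `loinc_code`). It falls back to the bare ontology term when no name is present.

```python
from typing import Any


def specimen_label(specimen: dict[str, Any]) -> str:
    """Format a specimen as 'blood (UBERON: 0000178)', or 'UBERON: 0000178' without a name."""
    term = f"{specimen['namespace']}: {specimen['id']}"
    name = specimen.get("name")  # assumed key for the specimen name
    return f"{name} ({term})" if name else term


without_name = specimen_label({"namespace": "UBERON", "id": "0000178"})
with_name = specimen_label({"namespace": "UBERON", "id": "0000178", "name": "blood"})
```
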

sujeetvkulkarni commented 1 week ago

Assigning it to @rykahsay too so that he can add the specimen name to the "specimen": [] objects.

rykahsay commented 1 week ago

Fixed, please check

sujeetvkulkarni commented 5 days ago

done.

glygener / glygen-issues

Mapping new biomarker details api response to old data model and old GlyGen interface. #1282