Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0
21 stars, 12 forks

feat(openchallenges): enable users to search challenges using EDAM operation ID and preferred label #2555

tschaffter closed 3 months ago

tschaffter commented 4 months ago

Closes #2550

Changelog

Preview

Search by EDAM concept ID

http://localhost:8000/challenge?searchTerms=http://edamontology.org/operation_3207


Search by EDAM concept ID - ES tokenizes on slash by default 🙏

I knew ES tokenizes on whitespace by default, but not that it also tokenizes on slashes. I wanted to let users search with only the last part of the class ID (e.g. "operation_3207"), so it's nice that this already works out of the box.

http://localhost:8000/challenge?searchTerms=operation_3207
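
To see why the short form matches, the split can be approximated in plain Python. This is only a rough stand-in for the default ES analyzer (a regex over word characters and dot-joined runs, plus lowercasing), not the real Unicode segmentation that ES applies; the name `approx_standard_tokens` is made up for this sketch.

```python
import re

# Rough stand-in for the default ES analyzer: runs of word characters,
# optionally joined by single dots, then lowercased.
# Assumption: this only mimics ES for simple URLs like the EDAM class IDs.
TOKEN_RE = re.compile(r"\w+(?:\.\w+)*")


def approx_standard_tokens(text):
    """Return (token, start_offset, end_offset) tuples for `text`."""
    return [
        (match.group(0).lower(), match.start(), match.end())
        for match in TOKEN_RE.finditer(text)
    ]


for token in approx_standard_tokens("http://edamontology.org/operation_3207"):
    print(token)
```

Under this approximation, "operation_3207" comes out as its own token, which is why searching for it alone can hit the full class ID.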


BTW, this is how we can visualize how ES tokenizes the property operation.class_id:

http://localhost:9200/openchallenges-challenge-000001/_termvectors/4?fields=operation.class_id

{
  "_index": "openchallenges-challenge-000001",
  "_type": "_doc",
  "_id": "4",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "operation.class_id": {
      "field_statistics": {
        "sum_doc_freq": 12,
        "doc_count": 4,
        "sum_ttf": 12
      },
      "terms": {
        "edamontology.org": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 7,
              "end_offset": 23
            }
          ]
        },
        "http": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4
            }
          ]
        },
        "operation_3207": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 24,
              "end_offset": 38
            }
          ]
        }
      }
    }
  }
}
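
The response lists terms alphabetically, so the original token order is easier to read after sorting by position. A minimal sketch, assuming the response has already been parsed into a dict; `SAMPLE` is a trimmed copy of the response above and `tokens_in_order` is a hypothetical helper, not part of this PR.

```python
# Trimmed copy of the _termvectors response above (statistics omitted).
SAMPLE = {
    "term_vectors": {
        "operation.class_id": {
            "terms": {
                "edamontology.org": {
                    "term_freq": 1,
                    "tokens": [{"position": 1, "start_offset": 7, "end_offset": 23}],
                },
                "http": {
                    "term_freq": 1,
                    "tokens": [{"position": 0, "start_offset": 0, "end_offset": 4}],
                },
                "operation_3207": {
                    "term_freq": 1,
                    "tokens": [{"position": 2, "start_offset": 24, "end_offset": 38}],
                },
            }
        }
    }
}


def tokens_in_order(response, field):
    """Return (position, term, start_offset, end_offset) rows sorted by position."""
    terms = response["term_vectors"][field]["terms"]
    rows = [
        (tok["position"], term, tok["start_offset"], tok["end_offset"])
        for term, info in terms.items()
        for tok in info["tokens"]
    ]
    return sorted(rows)


for row in tokens_in_order(SAMPLE, "operation.class_id"):
    print(row)
```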

Search by EDAM concept preferred name

http://localhost:8000/challenge?searchTerms=Gene%20methylation%20analysis
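
For anyone scripting these previews, the search URL can be built with the standard library so that spaces in a preferred label are percent-encoded. A small sketch, assuming the local dev URL used in this PR; `challenge_search_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Local dev URL used in the previews of this PR.
BASE_URL = "http://localhost:8000/challenge"


def challenge_search_url(search_terms):
    """Build a challenge search URL with percent-encoded search terms."""
    # quote_via=quote encodes spaces as %20 (urlencode defaults to "+").
    query = urlencode({"searchTerms": search_terms}, quote_via=quote)
    return f"{BASE_URL}?{query}"


print(challenge_search_url("Gene methylation analysis"))
print(challenge_search_url("http://edamontology.org/operation_3207"))
```

Note that this also encodes the ":" and "/" of a full class ID, unlike the URL pasted earlier; both forms decode to the same `searchTerms` value.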


An example of a challenge with a non-null operation value

http://localhost:9200/openchallenges-challenge-000001/_search?q=(name:%22Drug%20Sensitivity%20and%20Drug%20Synergy%20Prediction%22)

{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 20.742569,
    "hits": [
      {
        "_index": "openchallenges-challenge-000001",
        "_type": "_doc",
        "_id": "4",
        "_score": 20.742569,
        "_source": {
          "contributions": [
            {
              "organization_id": 1,
              "role": "sponsor"
            },
            {
              "organization_id": 52,
              "role": "data_contributor"
            },
            {
              "organization_id": 131,
              "role": "sponsor"
            },
            {
              "organization_id": 150,
              "role": "data_contributor"
            }
          ],
          "created_at": "2023-11-01T22:08:36.000000000Z",
          "description": "Development of new cancer therapeutics currently requires a long and protracted process of experimentation and testing. Human cancer cell lines represent a good model to help identify associations between molecular subtypes, pathways, and drug response. In recent years there have been several efforts to generate genomic profiles of collections of cell lines and to determine their response to panels of candidate therapeutic compounds. These data provide the basis for the development of in silico models of sensitivity based either on the unperturbed genetic potential of a cancer cell, or by using perturbation data to incorporate knowledge of actual cell response. Making predictions from either of these data profiles will be beneficial in identifying single and combinatorial chemotherapeutic response in patients. To that end, the present challenge seeks computational methods, derived from the molecular profiling of cell lines both in a static state and in response to perturbation of ...",
          "doi": "",
          "end_date": "2012-10-01",
          "headline": "Predicting drug sensitivity in human cell lines",
          "input_data_types": {
            "name": "metabolomic",
            "slug": "metabolomic"
          },
          "name": "Drug Sensitivity and Drug Synergy Prediction",
          "operation": {
            "class_id": "http://edamontology.org/operation_3207",
            "preferred_label": "Gene methylation analysis"
          },
          "platform": {
            "name": "Synapse",
            "slug": "synapse"
          },
          "starred_count": 0,
          "start_date": "2012-06-01",
          "status": "completed",
          "submission_types": {
            "name": "prediction_file"
          },
          "_entity_type": "ChallengeEntity"
        }
      }
    ]
  }
}
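
To poke at the two new fields directly in ES, a `multi_match` query over `operation.class_id` and `operation.preferred_label` is one option. This is only a sketch for manual exploration and an assumption on my part, not how the challenge service builds its query; `operation_search_body` is a hypothetical helper.

```python
import json


def operation_search_body(search_terms):
    """Build an ES request body matching the terms against both operation fields."""
    return {
        "query": {
            "multi_match": {
                "query": search_terms,
                # Field names taken from the indexed document shown above.
                "fields": ["operation.class_id", "operation.preferred_label"],
            }
        }
    }


# The JSON printed here could be sent as the body of a request to
# /openchallenges-challenge-000001/_search.
print(json.dumps(operation_search_body("operation_3207"), indent=2))
```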

Indexed EDAM concepts in ES

http://localhost:9200/openchallenges-edam-concept-000001/_search?q=*:*&size=2

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3472,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "openchallenges-edam-concept-000001",
        "_type": "_doc",
        "_id": "51",
        "_score": 1,
        "_source": {
          "class_id": "http://edamontology.org/data_0885",
          "preferred_label": "Structure database search results",
          "_entity_type": "EdamOperationEntity"
        }
      },
      {
        "_index": "openchallenges-edam-concept-000001",
        "_type": "_doc",
        "_id": "60",
        "_score": 1,
        "_source": {
          "class_id": "http://edamontology.org/data_0894",
          "preferred_label": "Amino acid annotation",
          "_entity_type": "EdamOperationEntity"
        }
      }
    ]
  }
}

sonarcloud[bot] commented 4 months ago

Quality Gate passed for 'openchallenges-challenge-service'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
