Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

feat(openchallenges): filter EDAM concepts by sections #2640

Closed tschaffter closed 2 months ago

tschaffter commented 2 months ago

Closes #2633

Description

Update the challenge service so that users can query the EDAM concepts that belong to one or more EDAM sections (e.g. operation, data, format).

Two solutions were considered to index concept section values in Elasticsearch:

  1. Dynamically generate the section value from the class ID.
    • E.g. http://edamontology.org/format_1478 => format
    • E.g. http://edamontology.org/data_1 => data
    • There are two outlier concepts that get their section value set to null:
      • http://www.geneontology.org/formats/oboInOwl#ObsoleteClass => null
      • http://www.w3.org/2002/07/owl#DeprecatedClass => null
  2. Generate concept section values as described above but store these values in a new column of the EDAM concept table.

I went with Solution 1 because it leaves the data model unchanged and avoids storing derived (~duplicated) data in the SQL DB.
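The derivation described in Solution 1 can be sketched as follows. This is a minimal illustration rather than the code from the challenge service: the function name `section_from_class_id` and the regular expression are assumptions, chosen only so that the examples listed above come out the same.

```python
import re
from typing import Optional

# Hypothetical pattern: an EDAM class ID ends with "<section>_<number>".
_EDAM_CLASS_ID = re.compile(r"^http://edamontology\.org/([a-z]+)_\d+$")


def section_from_class_id(class_id: str) -> Optional[str]:
    """Return the section encoded in an EDAM class ID, or None for outliers."""
    match = _EDAM_CLASS_ID.match(class_id)
    return match.group(1) if match else None


print(section_from_class_id("http://edamontology.org/format_1478"))  # format
print(section_from_class_id("http://edamontology.org/data_1"))       # data
print(section_from_class_id("http://www.w3.org/2002/07/owl#DeprecatedClass"))  # None
```

Class IDs that do not match the pattern, such as the two outliers, fall through to `None`, which mirrors the null section value mentioned above.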

cc: @rrchai @gaiaandreoletti

Changelog

Preview

Search with the challenge service

List the EDAM concepts that belong to the "data" or "format" section.

GET {{basePath}}/edamConcepts?sections=data,format
HTTP/1.1 200
{
  "number": 0,
  "size": 100,
  "totalElements": 2221,
  "totalPages": 23,
  "hasNext": true,
  "hasPrevious": false,
  "edamConcepts": [
    {
      "id": 1567,
      "classId": "http://edamontology.org/format_1478",
      "preferredLabel": "PDBML"
    },
    {
      "id": 1557,
      "classId": "http://edamontology.org/format_1436",
      "preferredLabel": "TreeBASE format"
    },
...
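For reference, here is a small sketch of how a client might build the request shown above and sanity-check the paging fields. The helper names `edam_concepts_url` and `total_pages` and the example base path are assumptions; only the `/edamConcepts` path, the `sections` parameter, and the response counts come from this PR.

```python
import math
from typing import Iterable
from urllib.parse import urlencode


def edam_concepts_url(base_path: str, sections: Iterable[str]) -> str:
    """Build the edamConcepts URL with a comma-separated `sections` filter."""
    # Keep the comma unescaped so the URL matches the request shown above.
    query = urlencode({"sections": ",".join(sections)}, safe=",")
    return f"{base_path}/edamConcepts?{query}"


def total_pages(total_elements: int, size: int) -> int:
    """Number of pages needed to list `total_elements` items, `size` per page."""
    return math.ceil(total_elements / size)


# "http://localhost:8080/api/v1" is a placeholder for {{basePath}}.
print(edam_concepts_url("http://localhost:8080/api/v1", ["data", "format"]))
print(total_pages(2221, 100))  # 23, consistent with "totalPages" above
```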

Elasticsearch queries used to explore EDAM concepts during development

Check that the property section is indexed by Elasticsearch:

http://localhost:9200/openchallenges-edam-concept-000001

Check the value of section for a few documents:

http://localhost:9200/openchallenges-edam-concept-000001/_search

Find all the concepts that belong to the "data" section with ES:

http://localhost:9200/openchallenges-edam-concept-000001/_search?q=section:data
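
The query-string search above covers a single section. As a hedged sketch, the same exploration could be extended to several sections with an Elasticsearch `terms` query sent as the body of a `_search` request. The helper names are hypothetical, and the sketch assumes that the `section` field is indexed so that exact values such as "data" match; it is not the query the challenge service issues.

```python
import json
from typing import Iterable
from urllib.parse import urlencode

# Index URL taken from the development queries above.
ES_INDEX_URL = "http://localhost:9200/openchallenges-edam-concept-000001"


def query_string_search_url(section: str) -> str:
    """Build the `_search?q=section:<value>` URL for one section."""
    query = urlencode({"q": f"section:{section}"}, safe=":")
    return f"{ES_INDEX_URL}/_search?{query}"


def terms_query_body(sections: Iterable[str]) -> str:
    """Build a JSON `_search` body matching any of the given sections."""
    return json.dumps({"query": {"terms": {"section": list(sections)}}})


print(query_string_search_url("data"))
print(terms_query_body(["data", "format"]))
```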

sonarcloud[bot] commented 2 months ago

Quality Gate passed for 'openchallenges-app'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarcloud[bot] commented 2 months ago

Quality Gate passed for 'openchallenges-challenge-service'

Issues
0 New issues
3 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

sonarcloud[bot] commented 2 months ago

Quality Gate passed for 'openchallenges-organization-service'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarcloud[bot] commented 2 months ago

Quality Gate passed for 'openchallenges-image-service'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud