BioComputingUP / IDP-KG

Scripts and notebooks for generating and analysing the IDP-KG.
https://biocomputingup.github.io/IDP-KG/
Apache License 2.0

Query to find proteins with high average disorder content #25

Open AlasdairGray opened 2 years ago

AlasdairGray commented 2 years ago

Which proteins have a high average protein disorder content (>95%, >50% or >0%) in MobiDB but are not yet manually annotated in DisProt?

Note that this query requires a new property to be added to the MobiDB markup.
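
The intended logic can be sketched in Python as a threshold filter followed by a set difference. This is only an illustration under assumptions: the function name `unannotated_high_disorder`, the dictionary of disorder content values, and the set of DisProt-annotated accessions are all hypothetical, the accessions are placeholders, and the values are assumed to be fractions between 0 and 1 (so >95% becomes `> 0.95`). The real query would run over the IDP-KG, not over in-memory data.

```python
def unannotated_high_disorder(mobidb_content, disprot_ids, threshold):
    """Return accessions whose disorder content exceeds `threshold`
    and that are absent from the manually annotated set."""
    return sorted(
        acc
        for acc, content in mobidb_content.items()
        if content > threshold and acc not in disprot_ids
    )


# Placeholder data for illustration only; not real MobiDB or DisProt entries.
mobidb_content = {"P00001": 0.97, "P00002": 0.655, "P00003": 0.0, "P00004": 0.99}
disprot_ids = {"P00004"}

# The three thresholds from the question, expressed as fractions.
for threshold in (0.95, 0.50, 0.0):
    print(threshold, unannotated_high_disorder(mobidb_content, disprot_ids, threshold))
```

Note that the comparison is strict, so a protein with a disorder content of exactly 0 is excluded by the >0% threshold.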

AlasdairGray commented 2 years ago

@ivanmicetic do you remember which property you were adding to make this query feasible?

We'll need to make sure that it comes through the conversion process and is stored in the IDP-KG.

ivanmicetic commented 2 years ago

MobiDB's current markup has a property for protein disorder content. I don't remember when it was added, but it was probably after the crawls. The current markup looks like this:

"hasSequenceAnnotation": [
    {
      "@type": "SequenceAnnotation",
      "@id": "https://mobidb.org/O75475#prediction-disorder-content-mobidb_lite",
      "sequenceLocation": {
        "@type": "SequenceRange",
        "rangeStart": 1,
        "rangeEnd": 530
      },
      "additionalProperty": {
        "@type": "PropertyValue",
        "name": "Protein disorder content",
        "propertyID": {
          "@id": "https://disprot.org/assets/data/IDPO_v0.2.owl#IDPO:00499"
        },
        "value": 0.655
      },
      "description": "Protein disorder content predicted by MobiDB-lite"
    },
...
]
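
A minimal Python sketch of how this value could be pulled out of the markup once it has been parsed into a dictionary. Several things here are assumptions rather than part of MobiDB or the IDP-KG pipeline: the helper name `disorder_content` is hypothetical, the enclosing record is assumed to be an object holding the `hasSequenceAnnotation` key, `additionalProperty` is assumed to be a single object as in the fragment above, and `value` is assumed to be a fraction between 0 and 1. The sample record repeats only the fields shown above.

```python
# Property identifier copied from the markup fragment.
IDPO_DISORDER_CONTENT = "https://disprot.org/assets/data/IDPO_v0.2.owl#IDPO:00499"


def disorder_content(record):
    """Return the first 'Protein disorder content' value in the record, or None."""
    for annotation in record.get("hasSequenceAnnotation", []):
        prop = annotation.get("additionalProperty")
        if not isinstance(prop, dict):
            continue  # assumption: a single PropertyValue object, not a list
        prop_id = prop.get("propertyID")
        if isinstance(prop_id, dict) and prop_id.get("@id") == IDPO_DISORDER_CONTENT:
            return prop.get("value")
    return None


# Sample record built from the fields shown in the markup above.
record = {
    "hasSequenceAnnotation": [
        {
            "@type": "SequenceAnnotation",
            "@id": "https://mobidb.org/O75475#prediction-disorder-content-mobidb_lite",
            "additionalProperty": {
                "@type": "PropertyValue",
                "name": "Protein disorder content",
                "propertyID": {"@id": IDPO_DISORDER_CONTENT},
                "value": 0.655,
            },
        }
    ]
}

value = disorder_content(record)
# Compare against the >50% and >95% thresholds, expressed as fractions.
print(value, value > 0.50, value > 0.95)
```

Matching on the `propertyID` rather than on the `name` string keeps the lookup stable if the label wording changes, though either would work for the fragment shown.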