ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

Descriptions missing for some CDD matches #385

Closed kimrutherford closed 4 weeks ago

kimrutherford commented 1 month ago

Hi. Thanks for InterProScan. It's working very well for PomBase.

We've noticed that for some CDD matches, the long description that's available on the InterPro web pages isn't included in the InterProScan JSON output.

An example is cd15474 on the page for myo51 (https://www.ebi.ac.uk/interpro/protein/reviewed/O74805/). In the InterProScan (v5.70-102.0) output for myo51, the name and description are both "Myo5p-like_CBD_fungal":

          "signature": {
            "accession": "cd15474",
            "name": "Myo5p-like_CBD_fungal",
            "description": "Myo5p-like_CBD_fungal",
            "signatureLibraryRelease": {
              "library": "CDD",
              "version": "3.20"
            },
            "entry": null
          }

The InterPro website has a longer description: "cargo binding domain of fungal myosin V -like proteins"

Would it be possible to include the long description from CDD in the JSON output?

Another example is cd00174, where the name and description in the JSON are both "SH3" but the website has the long description:

https://www.ebi.ac.uk/interpro/protein/reviewed/Q09822/
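
For anyone wanting to check which matches in their own output are affected, here is a minimal sketch in Python. It relies only on the field names visible in the fragment above (`accession`, `name`, `description`, `signatureLibraryRelease`). It walks the JSON recursively, so it makes no assumption about the top-level layout of the file. The function name `find_undescribed_signatures` and the `matches` wrapper in the example are hypothetical, made up for illustration.

```python
import json


def find_undescribed_signatures(node, library="CDD"):
    """Return {accession: name} for signatures of `library` whose
    description is empty or merely repeats the short name."""
    hits = {}
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            release = current.get("signatureLibraryRelease")
            # A dict is treated as a signature if it carries an accession
            # and a signatureLibraryRelease block, as in the fragment above.
            if (
                isinstance(release, dict)
                and release.get("library") == library
                and "accession" in current
            ):
                description = current.get("description")
                if not description or description == current.get("name"):
                    hits[current["accession"]] = current.get("name")
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
    return hits


# The signature below is copied from the output quoted in this issue;
# the surrounding "matches" wrapper is an assumption for the example.
example = json.loads("""
{
  "matches": [
    {
      "signature": {
        "accession": "cd15474",
        "name": "Myo5p-like_CBD_fungal",
        "description": "Myo5p-like_CBD_fungal",
        "signatureLibraryRelease": {"library": "CDD", "version": "3.20"},
        "entry": null
      }
    }
  ]
}
""")

print(find_undescribed_signatures(example))
# prints {'cd15474': 'Myo5p-like_CBD_fungal'}
```

To run it on a real file, load the output with `json.load(open(path))` and pass the result to the function; the `library` argument can be changed to check other member databases.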

Thanks in advance.

CC: @ValWood

matthiasblum commented 4 weeks ago

Hi @kimrutherford,

I am afraid that's not something we can easily do in the current implementation of InterProScan. However, we are working on a new implementation, and the long descriptions will be included in the JSON/XML outputs. We hope to release a beta later this year or in January.

kimrutherford commented 4 weeks ago

Thanks for the reply @matthiasblum

We are very happy to help with beta testing of the new version when it is available.