ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

Change in GO term reported in XML files #309

Closed hyphaltip closed 1 year ago

hyphaltip commented 1 year ago

A previously written XML parser for earlier versions of interproscan.sh expected GO terms to be listed with category and name fields, but this seems to have been compacted to just the GO ID. Is this the future direction for how GO terms will be kept in the XML output?

This is causing a parsing bug that we can work around in nextgenusfs/funannotate#830. Previously (e.g. 5.55-88.0) the format was:

 <go-xref category="BIOLOGICAL_PROCESS" db="GO" id="GO:0055085" name="transmembrane transport"/>

But now (version 5.60-92.0) it looks like the example below. Is this the new normal, and will we need to write a GO database lookup in our scripts to produce our usual output, which has GO Term and GO Category in the summary report we generate from the XML?

<go-xref db="GO" id="GO:0005737"/>

The JSON looks like this, which suggests you may still have intended to include these other fields, but something wasn't set up correctly in my interproscan run?

"goXrefs" : [ {
      "name" : null,
      "databaseName" : "GO",
      "category" : null,
      "id" : "GO:0005886"
    }, {
      "name" : null,
      "databaseName" : "GO",
      "category" : null,
      "id" : "GO:0110165"
    },

matthiasblum commented 1 year ago

Hi @hyphaltip,

Are the GO terms associated with an InterPro entry?

<entry ac="IPR015578" desc="Neurotrophin-3" name="Neurotrophin-3" type="FAMILY">
  <go-xref category="MOLECULAR_FUNCTION" db="GO" id="GO:0005165" name="neurotrophin receptor binding"/>
</entry>

Or to a PANTHER match?

<panther-match ac="PTHR11589:SF4" evalue="1.6E-101" graft-point="PTN001358546" name="NEUROTROPHIN-3" protein-class="PC00163" score="350.9">
  ...
  <go-xref db="GO" id="GO:0048468"/>
</panther-match>

We recently started reporting GO terms associated with PANTHER matches, but at the moment only the term ID is reported for them, not the category or name.

However, for GO terms associated with InterPro entries, the category and name should always be reported.
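
For downstream parsers, one way to cope with both forms of `go-xref` is to treat `category` and `name` as optional and fall back to a local lookup when they are absent. The following is only a minimal sketch, not the funannotate fix: `collect_go_xrefs`, `GO_FALLBACK` and the sample document are hypothetical names made up for illustration, the sample pairs IDs with a PANTHER match arbitrarily, and a real fallback table would need to be built from a GO release rather than hard-coded.

```python
import xml.etree.ElementTree as ET

# Hypothetical fallback table (assumption): GO ID -> (category, name).
# The single entry reuses the term quoted earlier in this thread.
GO_FALLBACK = {
    "GO:0055085": ("BIOLOGICAL_PROCESS", "transmembrane transport"),
}

# Made-up sample mixing a full go-xref (InterPro entry) with ID-only
# go-xrefs (PANTHER match); not real InterProScan output.
SAMPLE = """<matches>
  <entry ac="IPR015578" desc="Neurotrophin-3" name="Neurotrophin-3" type="FAMILY">
    <go-xref category="MOLECULAR_FUNCTION" db="GO" id="GO:0005165" name="neurotrophin receptor binding"/>
  </entry>
  <panther-match ac="PTHR11589:SF4">
    <go-xref db="GO" id="GO:0055085"/>
    <go-xref db="GO" id="GO:0048468"/>
  </panther-match>
</matches>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def collect_go_xrefs(xml_text, fallback=GO_FALLBACK):
    """Return (id, category, name) for every go-xref, in document order.

    Missing attributes are filled from `fallback` when possible and left
    as None otherwise, so callers never hit a missing-attribute error.
    """
    results = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "go-xref":
            continue
        go_id = elem.get("id")
        category = elem.get("category")
        name = elem.get("name")
        if category is None or name is None:
            fb_category, fb_name = fallback.get(go_id, (None, None))
            category = category or fb_category
            name = name or fb_name
        results.append((go_id, category, name))
    return results


if __name__ == "__main__":
    for row in collect_go_xrefs(SAMPLE):
        print(row)
```

Terms that are neither annotated in the XML nor present in the fallback table come back with `None` for category and name, so the summary report can decide whether to skip them or print the bare ID.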