PHI-base / PHI5_web_display

PHI5_web_display will allow PHI-Canto data to be displayed

Host section is missing on host genes with no genotype annotations #55

Closed jseager7 closed 1 year ago

jseager7 commented 2 years ago

For host genes that are only involved in Physical Interaction annotations (such as P69B of Solanum lycopersicum, PHIG:267), the Host section is missing from the gene page, presumably because this section is populated from genotype and metagenotype names, but none exist for the gene.

We probably still want to include the Host section in this case, since otherwise there is no way to see the NCBI Taxonomy ID for the host organism.

Note that the strain column would have to be blank, since strains are not curated for Physical Interaction annotations.

Here's a mockup:

[image: mockup of the Host section]

jseager7 commented 2 years ago

@martin2urban @CuzickA Just to confirm, in the Host (or Pathogen) section of the gene page, when it summarises all the strains that have been curated in PHI-base for that gene, are we planning to exclude the publication references? Meaning, should we exclude the Reference column?

I think we already decided to exclude the Reference column from the Pathogen (or Host) section that lists all of the interacting genes. There may be a lot of publications that reference one strain of a particular organism, so this column could fill up with a lot of data that's not very useful.

CuzickA commented 2 years ago

Hi @jseager7 Yes, I think that we decided to remove the reference column from the entries in the pathogen or host sections.

jseager7 commented 2 years ago

Thanks. I've updated the mockup.

jashobanta-mcpl commented 2 years ago

@jseager7: We are unable to match the gene name in the alleles block while traversing, hence the block is missing. Please advise. The JSON blocks are pasted below for your reference.

  "alleles": {
    "D0MVC9:babb18b437174938-3": {
      "allele_type": "wild_type",
      "gene": "Phytophthora infestans D0MVC9",
      "name": "EPI1+",
      "primary_identifier": "D0MVC9:babb18b437174938-3",
      "synonyms": []
    },
    "D0MVC9:babb18b437174938-4": {
      "allele_type": "wild_type",
      "gene": "Phytophthora infestans D0MVC9",
      "name": "EPI1+",
      "primary_identifier": "D0MVC9:babb18b437174938-4",
      "synonyms": []
    }
  }

  "genes": {
    "Phytophthora infestans D0MVC9": {
      "organism": "Phytophthora infestans",
      "uniquename": "D0MVC9"
    },
    "Solanum lycopersicum O04678": {
      "organism": "Solanum lycopersicum",
      "uniquename": "O04678"
    }
  }

jseager7 commented 2 years ago

@jashobanta-mcpl It's much easier to find the organism data for the gene if you use the new JSON export format, since every gene has an organism object containing the organism taxon ID and scientific name. For example:

"O04678": {
    "organism": {
        "full_name": "Solanum lycopersicum",
        "taxon_id": "4081"
    },
    // other properties not shown
},
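
As a rough illustration of the lookup in the new export format, here is a minimal Python sketch. It assumes the gene objects are available as a dictionary keyed by UniProtKB accession, as in the fragment above; where that dictionary sits in the export is not shown here, and the function name `organism_for_gene` is hypothetical.

```python
from typing import Optional, Tuple


def organism_for_gene(genes: dict, gene_id: str) -> Optional[Tuple[str, str]]:
    """Return (scientific name, taxon ID) for a gene, or None if unavailable.

    Assumes each gene object has an "organism" object with "full_name" and
    "taxon_id" properties, as in the fragment shown in this comment.
    """
    organism = genes.get(gene_id, {}).get("organism")
    if not isinstance(organism, dict):
        return None
    full_name = organism.get("full_name")
    taxon_id = organism.get("taxon_id")
    if full_name is None or taxon_id is None:
        return None
    return full_name, taxon_id


# Example data copied from the fragment above (other properties omitted).
example_genes = {
    "O04678": {
        "organism": {
            "full_name": "Solanum lycopersicum",
            "taxon_id": "4081",
        },
    },
}

print(organism_for_gene(example_genes, "O04678"))
# prints ('Solanum lycopersicum', '4081')
```

Because the organism comes from the gene object itself, this works even when the gene has no alleles, genotypes, or metagenotypes.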

However, if that's not an option, then you'll need to look elsewhere in the current export format to get the organism data. See below for an explanation:


There will never be any alleles for the gene if the gene only has annotations of the following types:

All these annotation types are annotated to unmodified versions of the gene (hence no alleles). Fortunately, annotation objects of the above types should always have a gene property that contains the gene ID of the gene.

Using the GO Molecular Function annotation on PHIG:267 as an example:

{
  "checked": "no",
  "creation_date": "2018-12-14",
  "curator": { /* not shown */ },
  "evidence_code": "IDA",
  "extension": [ /* not shown */ ],
  "gene": "Solanum lycopersicum O04678",  // this is the gene ID
  "publication": "PMID:15096512",
  "status": "new",
  "submitter_comment": "",
  "term": "GO:0008236",
  "type": "molecular_function"
},

"Solanum lycopersicum O04678" is the gene ID, which can be looked up in the genes object of the curation session to get the organism name. For example:

"genes": {
  "Solanum lycopersicum O04678": {
    "organism": "Solanum lycopersicum",
    "uniquename": "O04678"
  },
  // other genes not shown
}  

The taxon ID for the organism can then be retrieved by searching for the organism name in the organisms object for the session:

"organisms": {
  "4081": {  // the property name is the taxon ID of the organism
    "full_name": "Solanum lycopersicum"
  },
  // other organisms not shown
}
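
The three lookups above (annotation to gene ID, gene ID to organism name, organism name to taxon ID) could be chained roughly as follows. This is only a sketch in Python: it assumes the curation session has been loaded into a dictionary with `genes` and `organisms` properties as shown above, and the function name `host_organism_from_annotation` is hypothetical.

```python
from typing import Optional


def host_organism_from_annotation(session: dict, annotation: dict) -> Optional[dict]:
    """Resolve the organism name and taxon ID for the gene of an annotation."""
    # Step 1: the annotation's "gene" property holds the gene ID.
    gene_id = annotation.get("gene")
    # Step 2: look the gene ID up in the session's "genes" object.
    gene = session.get("genes", {}).get(gene_id)
    if gene is None:
        return None
    organism_name = gene.get("organism")
    # Step 3: the "organisms" object is keyed by taxon ID, so search its
    # values for a matching scientific name.
    for taxon_id, organism in session.get("organisms", {}).items():
        if organism.get("full_name") == organism_name:
            return {
                "uniquename": gene.get("uniquename"),
                "organism": organism_name,
                "taxon_id": taxon_id,
            }
    return None


# Example data copied from the snippets above (other properties omitted).
session = {
    "genes": {
        "Solanum lycopersicum O04678": {
            "organism": "Solanum lycopersicum",
            "uniquename": "O04678",
        },
    },
    "organisms": {
        "4081": {"full_name": "Solanum lycopersicum"},
    },
}
annotation = {
    "gene": "Solanum lycopersicum O04678",
    "term": "GO:0008236",
    "type": "molecular_function",
}

print(host_organism_from_annotation(session, annotation))
# prints {'uniquename': 'O04678', 'organism': 'Solanum lycopersicum', 'taxon_id': '4081'}
```

The search in step 3 is linear in the number of organisms, which should be fine since a session usually lists only a handful of them.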

If neither of the above approaches works, then the organism name and taxon ID could always be queried directly from UniProtKB:

<organism evidence="10">
  <name type="scientific">Solanum lycopersicum</name>
  <name type="common">Tomato</name>
  <name type="synonym">Lycopersicon esculentum</name>
  <dbReference type="NCBI Taxonomy" id="4081"/>
</organism>

But querying from UniProtKB is not ideal since PHI-base may use scientific names that are different from UniProtKB: for example, Fusarium graminearum instead of Gibberella zeae.
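
If the UniProtKB fallback were used, the organism element could be parsed roughly as in the Python sketch below. It parses only the fragment shown above; a full UniProtKB XML entry declares a default namespace, which this sketch does not handle. The `PHI_BASE_NAMES` mapping and the function name `parse_organism` are hypothetical, and the mapping contains only the one example mentioned here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical override table: UniProtKB scientific name -> name used by PHI-base.
PHI_BASE_NAMES = {
    "Gibberella zeae": "Fusarium graminearum",
}

ORGANISM_XML = """
<organism evidence="10">
  <name type="scientific">Solanum lycopersicum</name>
  <name type="common">Tomato</name>
  <name type="synonym">Lycopersicon esculentum</name>
  <dbReference type="NCBI Taxonomy" id="4081"/>
</organism>
"""


def parse_organism(xml_text: str) -> Optional[dict]:
    """Extract the scientific name and NCBI taxon ID from an <organism> element."""
    root = ET.fromstring(xml_text.strip())
    name_element = root.find("name[@type='scientific']")
    taxon_element = root.find("dbReference[@type='NCBI Taxonomy']")
    if name_element is None or taxon_element is None:
        return None
    uniprot_name = name_element.text
    return {
        # Prefer the PHI-base name where an override is known.
        "full_name": PHI_BASE_NAMES.get(uniprot_name, uniprot_name),
        "taxon_id": taxon_element.get("id"),
    }


print(parse_organism(ORGANISM_XML))
# prints {'full_name': 'Solanum lycopersicum', 'taxon_id': '4081'}
```

Keeping the name overrides in one table makes the mismatch between UniProtKB and PHI-base names explicit, but the table would need to be maintained by hand.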

jseager7 commented 2 years ago

Related: #56

jseager7 commented 1 year ago

The Host section is displayed for PHIG:267 now.

[image: Host section displayed on the PHIG:267 gene page]

The Host strain, Host genotype, and Reference columns probably don't need to be shown in this case, but since those changes overlap with issue #56, I'll close this issue as completed.