jseager7 closed this 10 months ago
Thanks @jseager7. Please let us know the steps to implement it with the current JSON format. We will implement it in the new export format later.
@jashobanta-mcpl See below for a description of how to populate the Pathogen and Host sections with the current export format.
When the organism of the gene page is a pathogen, then the Pathogen section should contain all the strains linked to the genotypes that contain the page's pathogen gene. When the organism of the gene page is a host, then the Host section should contain all the strains linked to the genotypes that contain the page's host gene.
The strains can be found by searching all the genotypes in the export and checking whether or not they contain the page's gene. Alternatively, the strains of the genotypes already linked to the gene page could be searched.
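Below is a minimal Python sketch of the genotype scan. It assumes the export has already been parsed with `json.load`, and that `curation_sessions` is a dict mapping session keys to session objects, each with `genotypes`, `alleles` and `genes` collections. `genotype_contains` and `strains_for_gene` are hypothetical helper names, not part of any existing codebase.

```python
def genotype_contains(session: dict, genotype: dict, uniprot_id: str) -> bool:
    """Follow genotype -> loci -> alleles -> genes and compare UniProtKB accessions."""
    alleles = session.get("alleles", {})
    genes = session.get("genes", {})
    for locus in genotype.get("loci", []):
        for locus_allele in locus:
            allele = alleles.get(locus_allele["id"])
            if allele is None:
                continue  # dangling allele reference; skip it
            gene = genes.get(allele["gene"])
            if gene is not None and gene.get("uniquename") == uniprot_id:
                return True
    return False


def strains_for_gene(curation_sessions: dict, page_gene: str) -> set:
    """Collect organism_strain from every genotype that contains page_gene."""
    strains = set()
    for session in curation_sessions.values():
        for genotype in session.get("genotypes", {}).values():
            if genotype_contains(session, genotype, page_gene):
                strains.add(genotype["organism_strain"])
    return strains
```

The caller decides which column the strains go into. They belong in 'Experimental strain' (Pathogen section) when the page's gene is a pathogen gene, and in 'Host strain' (Host section) when it is a host gene.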
- Set `page_gene` to the UniProtKB accession number of the gene page (e.g. "Q00909" for PHIG:253)
- For each `session` in `curation_sessions`:
  - For each `genotype` in `session.genotypes`:
    - For each `locus` in `genotype.loci`:
      - For each `locus_allele` in `locus`:
        - Set `strain_name` to the value of the `organism_strain` property of the `genotype` object
        - Set `allele_id` to the value of the `id` property of the `locus_allele` object
        - Find `allele_id` in the property names of the `alleles` object (which is in the `session` object)
        - Set `gene_id` to the value of the `gene` property of the matching allele object
        - Find `gene_id` in the property names of the `genes` object (which is in the `session` object)
        - Set `uniprot_id` to the value of the `uniquename` property of the matching gene object (this is the UniProtKB accession number)
        - If `uniprot_id` equals `page_gene`:
          - If the gene page is for a pathogen gene, show `strain_name` in the "Experimental strain" column of the Pathogen section
          - If the gene page is for a host gene, show `strain_name` in the "Host strain" column of the Host section
The other section on the gene page should list the interacting genes. For example, when the organism of the gene page is a pathogen, then the Host section should contain all the host genes that interact with the pathogen gene. For pathogen genes, the "interacting genes" are any host genes in a metagenotype with the pathogen gene, or any host genes in a physical interaction with the pathogen gene (vice versa for host genes).
The interacting genes can be found by searching all annotations, filtering for metagenotype annotations and physical interactions, checking whether the interaction contains the pathogen (or host) gene, then extracting all the host (or pathogen) genes from the interaction.
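The search can be sketched in Python as shown below. The sketch assumes that `annotations` is a list in each session, and that a metagenotype annotation's `metagenotype` property holds a key of the session's `metagenotypes` collection. The function names are hypothetical. For physical interactions it only checks whether the page gene is on either side. The same-species and same-role checks discussed later in this thread are not applied here.

```python
def gene_uniprot(session, gene_id):
    """Map a gene ID such as "Triticum aestivum W5AB81" to its uniquename."""
    return session.get("genes", {}).get(gene_id, {}).get("uniquename")


def genotype_uniprot_ids(session, genotype_id):
    """Return the accessions of all genes in a genotype (loci -> alleles -> genes)."""
    genotype = session.get("genotypes", {}).get(genotype_id, {})
    alleles = session.get("alleles", {})
    ids = []
    for locus in genotype.get("loci", []):
        for locus_allele in locus:
            allele = alleles.get(locus_allele["id"], {})
            accession = gene_uniprot(session, allele.get("gene"))
            if accession:
                ids.append(accession)
    return ids


def interacting_genes(curation_sessions, page_gene):
    found = []
    for session in curation_sessions.values():
        metagenotypes = session.get("metagenotypes", {})
        for annotation in session.get("annotations", []):
            if "metagenotype" in annotation:
                metagenotype = metagenotypes.get(annotation["metagenotype"], {})
                pathogen_id = metagenotype.get("pathogen_genotype")
                host_id = metagenotype.get("host_genotype")
                # Collect genes from whichever side does NOT contain the page gene
                if page_gene in genotype_uniprot_ids(session, pathogen_id):
                    found.extend(genotype_uniprot_ids(session, host_id))
                elif page_gene in genotype_uniprot_ids(session, host_id):
                    found.extend(genotype_uniprot_ids(session, pathogen_id))
            elif annotation.get("type") == "physical_interaction":
                partners = annotation.get("interacting_genes", [])
                gene_id = gene_uniprot(session, annotation.get("gene"))
                partner_id = gene_uniprot(session, partners[0]) if partners else None
                if gene_id == page_gene and partner_id:
                    found.append(partner_id)
                elif partner_id == page_gene and gene_id:
                    found.append(gene_id)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(found))
```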
The interacting genes in the Pathogen or Host gene section should be listed by primary gene name, but since the gene name is not included in the JSON export (only the allele name is included), the primary gene name must be retrieved from UniProtKB, or queried from the PHI-base 5 database if the gene name is already stored there.
The first step is to get the interacting genes for the page's gene:
- Set `interacting_genes` to an empty list
- Set `page_gene` to the UniProtKB accession number of the gene page (e.g. "Q00909" for PHIG:253)
- For each `session` in `curation_sessions`:
  - For each `annotation` in `session.annotations`:
    - If the `annotation` object has a "metagenotype" property:
      - Find `annotation.metagenotype` in the property names of the `metagenotypes` object (which is in the `session` object), and set `metagenotype` to the matching metagenotype object
      - Set `pathogen_genotype` and `host_genotype` to the genotype objects that match `metagenotype.pathogen_genotype` and `metagenotype.host_genotype` in the property names of the `genotypes` object (which is in the `session` object)
      - Search `pathogen_genotype` (loci → alleles → genes) and check if any of the genes match the `page_gene` (based on the UniProtKB ID). If so, set `interacting_genotype` to `host_genotype`
      - Otherwise, search `host_genotype` (loci → alleles → genes) and check if any of the genes match the `page_gene` (based on the UniProtKB ID). If so, set `interacting_genotype` to `pathogen_genotype`
      - If neither genotype contains `page_gene`, skip to the next annotation
      - For each `locus` in `interacting_genotype.loci`:
        - For each `locus_allele` in `locus`:
          - Set `allele_id` to the value of the `id` property of the `locus_allele` object
          - Find `allele_id` in the property names of the `alleles` object (which is in the `session` object)
          - Set `gene_id` to the value of the `gene` property of the matching allele object
          - Find `gene_id` in the property names of the `genes` object (which is in the `session` object)
          - Add the `uniquename` property of the matching gene object to the `interacting_genes` list
    - Otherwise, if `annotation.type` equals "physical_interaction":
      - Set `gene_id` to `annotation.gene`, find `gene_id` in the property names of the `genes` object, and set `uniprot_id` to the value of the `uniquename` property of the matching gene object
      - Set `partner_gene_id` to the first item of `annotation.interacting_genes`, find it in the property names of the `genes` object, and set `partner_uniprot_id` to the value of the `uniquename` property of the matching gene object
      - If `uniprot_id` equals `page_gene`, add `partner_uniprot_id` to the `interacting_genes` list
      - If `partner_uniprot_id` equals `page_gene`, add `uniprot_id` to the `interacting_genes` list
Then the list of interacting genes must be filtered to remove duplicates, and the primary gene names must be retrieved and displayed:
- Set `unique_interacting_genes` to a unique list of values from `interacting_genes` (drop any duplicates)
- For each `uniprot_id` in `unique_interacting_genes`:
  - Look up `uniprot_id` in UniProtKB and get the primary gene name for the accession, or get the gene name from the PHI-base 5 database
The primary gene name of the gene can be retrieved from the XML format of the UniProtKB accession:
<gene>
<name type="primary">TRI5</name> <!-- primary name -->
<name type="ORF">FGRRES_03537</name>
<name type="ORF">FGSG_03537</name>
</gene>
If there is no primary name, then an ORF name should be displayed. If there are no gene names, then the UniProtKB accession number should be displayed.
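The fallback order (primary name, then ORF name, then accession) can be sketched with the standard library's `xml.etree.ElementTree`. Fetching the XML from UniProtKB is left out: the sketch assumes the entry's XML has already been retrieved as a string. Full UniProt entries use an XML namespace, so tags are compared by local name. `display_gene_name` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace, e.g. '{http://uniprot.org/uniprot}name' -> 'name'."""
    return tag.rsplit("}", 1)[-1]


def display_gene_name(uniprot_xml, accession):
    """Return the primary gene name, else the first ORF name, else the accession."""
    root = ET.fromstring(uniprot_xml)
    primary, orf = None, None
    for element in root.iter():
        if _local(element.tag) != "gene":
            continue
        for name in element:
            if _local(name.tag) != "name" or not name.text:
                continue
            kind = name.get("type")
            if kind == "primary" and primary is None:
                primary = name.text.strip()
            elif kind == "ORF" and orf is None:
                orf = name.text.strip()
    return primary or orf or accession
```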
Also, to find out whether a gene is a pathogen or host gene, you have to search each metagenotype in the export to check whether the gene is contained in the `pathogen_genotype` or the `host_genotype`.
You have to dereference the identifiers in the following sequence to get back to the gene:
- `pathogen_genotype` and `host_genotype` dereference to a genotype in the `genotypes` object,
- the `id` property of each locus allele object in a genotype dereferences to an allele in the `alleles` object,
- the `gene` property of an allele object dereferences to a gene in the `genes` object.
Once you have found the gene in one metagenotype, you can stop searching and continue to the next gene.
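Below is a minimal sketch of the role lookup. It follows the dereferencing sequence above for a single session and returns as soon as the gene is found in one metagenotype. The name `gene_role` and the `"pathogen"`/`"host"`/`None` return values are hypothetical choices.

```python
def gene_role(session, uniprot_id):
    """Return "pathogen", "host", or None if the gene is in no metagenotype."""
    genotypes = session.get("genotypes", {})
    alleles = session.get("alleles", {})
    genes = session.get("genes", {})

    def contains(genotype_id):
        # genotype -> locus allele "id" -> allele "gene" -> gene "uniquename"
        genotype = genotypes.get(genotype_id, {})
        for locus in genotype.get("loci", []):
            for locus_allele in locus:
                allele = alleles.get(locus_allele["id"], {})
                gene = genes.get(allele.get("gene"), {})
                if gene.get("uniquename") == uniprot_id:
                    return True
        return False

    for metagenotype in session.get("metagenotypes", {}).values():
        if contains(metagenotype.get("pathogen_genotype")):
            return "pathogen"  # found once, so stop searching
        if contains(metagenotype.get("host_genotype")):
            return "host"
    return None
```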
I should really add the pathogen or host status of the gene to the new export format to make it easier to query.
@jseager7, we are a bit confused about the data capture flow for populating multiple PHIG IDs in the Pathogen and Host blocks. Could you provide a flow with an actual JSON snippet showing the traversal for Q00909 and another host gene? Thanks.
@jashobanta-mcpl I just realised I made a mistake in my original comment, where I stated that the gene in the example was TRI5 of Fusarium graminearum (UniProtKB:Q00909, PHIG:253). The gene is actually RALF of Fusarium graminearum (UniProtKB:A0A0E0SJI5, PHIG:278).
This might be the cause of the confusion, since there are no host genes involved in any interaction with Q00909 (meaning only wild type hosts are involved). In this case, the Host section will presumably be empty and not displayed, but I will have to confirm this with the PHI-base team. It may be that we still want to display the host species names, but without a link to a corresponding PHIG ID.
I'll soon provide instructions (with JSON examples) for the case where both a pathogen gene and a host gene are involved.
@jashobanta-mcpl Please see below for more instructions on how to populate the Pathogen and Host sections.
The pathogen gene examples will use the Tox1 gene from Parastagonospora nodorum (UniProtKB:A9JX75), since that is the only example I can find of a pathogen gene with multiple strains that is also involved in interactions with host genes. Note that for some reason, Tox1 does not appear on the PHI-base 5 website, despite being provided in the latest JSON export.
To populate the Pathogen section for Tox1, we must find all the genotypes that contain Tox1, so that we can find all the strains for Tox1.
We will start by finding all gene objects that contain the UniProtKB accession number for Tox1 (A9JX75) in the `uniquename` property. For example:
"genes": {
"Parastagonospora nodorum A9JX75": {
"organism": "Parastagonospora nodorum",
"uniquename": "A9JX75" // matches UniProtKB accession number
},
}
Then we can find the allele objects by looking up the gene ID (the key of the gene object) in the `alleles` collection of the session object. Shown below is one example:
"A9JX75:bd02fdb6831712ca-36": {
"allele_type": "wild_type",
"gene": "Parastagonospora nodorum A9JX75", // matches gene ID
"name": "Tox1+",
"primary_identifier": "A9JX75:bd02fdb6831712ca-36",
"synonyms": []
},
(Note that there should be 7 allele objects in total containing Tox1.)
Next, we can look up each matching allele ID (the key of the matching allele object, or the `primary_identifier` property) in the `genotypes` collection of the session object. One allele ID may match more than one genotype, as in the example below:
"bd02fdb6831712ca-genotype-22": {
"loci": [
[
{
"expression": "Wild type product level",
"id": "A9JX75:bd02fdb6831712ca-36" // matches allele ID
}
]
],
"organism_strain": "Sn2000",
"organism_taxonid": 13684
},
"bd02fdb6831712ca-genotype-10": {
"comment": "A9JX75_PHANO expr level unknown",
"loci": [
[
{
"expression": "Wild type product level",
"id": "A9JX75:bd02fdb6831712ca-36" // matches allele ID
}
]
],
"organism_strain": "SN15",
"organism_taxonid": 13684
},
(Note that there should be 10 genotype objects in total containing Tox1.)
Finally, the strain names must be extracted from the `organism_strain` property and displayed in the 'Experimental strain' column of the Pathogen section. The 'Pathogen ID' column can also be populated with values from the `organism_taxonid` property. The final list of pathogen strains would be as follows:
Here's what this would look like in the UI:
(Note that the `rowspan` on the table rows is optional, but recommended.)
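The Tox1 walkthrough runs in the opposite direction to the genotype scan: gene → allele → genotype → strain. Below is a minimal sketch of that chain. It assumes the same parsed `curation_sessions` dict as before, and `pathogen_strain_rows` is a hypothetical name. It returns `(organism_taxonid, organism_strain)` pairs. Grouping these pairs by taxon ID is what the optional `rowspan` would render.

```python
def pathogen_strain_rows(curation_sessions, uniprot_id):
    """Return sorted (organism_taxonid, organism_strain) pairs for a pathogen gene."""
    rows = set()
    for session in curation_sessions.values():
        # Step 1: gene IDs (keys of "genes") whose uniquename is the accession
        gene_ids = {
            gene_id for gene_id, gene in session.get("genes", {}).items()
            if gene.get("uniquename") == uniprot_id
        }
        # Step 2: allele IDs (keys of "alleles") whose "gene" property is one of them
        allele_ids = {
            allele_id for allele_id, allele in session.get("alleles", {}).items()
            if allele.get("gene") in gene_ids
        }
        # Step 3: genotypes with a locus allele whose "id" is a matching allele ID
        for genotype in session.get("genotypes", {}).values():
            locus_ids = {la["id"] for locus in genotype.get("loci", []) for la in locus}
            if locus_ids & allele_ids:
                # Step 4: taxon ID for 'Pathogen ID', strain for 'Experimental strain'
                rows.add((genotype["organism_taxonid"], genotype["organism_strain"]))
    return sorted(rows)
```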
For Tox1, the Host section will be populated with all the host genes that are involved in an interaction with the Tox1 gene.
The search process starts with finding all pathogen genotypes that reference Tox1, as shown in the instructions above.
From here, we look up each pathogen genotype ID (the key of the matching genotype object) in the `metagenotypes` collection of the session object. Specifically, we match on the `pathogen_genotype` property.
"metagenotypes": {
"bd02fdb6831712ca-metagenotype-1": {
"host_genotype": "bd02fdb6831712ca-genotype-3",
"pathogen_genotype": "bd02fdb6831712ca-genotype-10", // matches pathogen genotype ID
"type": "pathogen-host"
},
}
(There should be 22 metagenotype objects in total containing Tox1.)
We then extract the host genotype IDs from the `host_genotype` property of each metagenotype, and look up the host genotype IDs in the `genotypes` collection of the session object.
"bd02fdb6831712ca-genotype-3": { // matches host genotype ID
"comment": "SnTox1-sensitive",
"loci": [
[
{
"expression": "Wild type product level",
"id": "W5AB81:bd02fdb6831712ca-14"
}
]
],
"organism_strain": "cv. Chinese Spring",
"organism_taxonid": 4565
},
The `id` property in the `loci` array of the host genotype contains an allele identifier for the host gene. We can look this up in the `alleles` collection:
"W5AB81:bd02fdb6831712ca-14": { // matches host allele ID
"allele_type": "wild_type",
"gene": "Triticum aestivum W5AB81",
"name": "Snn1+",
"primary_identifier": "W5AB81:bd02fdb6831712ca-14",
"synonyms": []
},
The `gene` property in the allele object contains the gene identifier for the host gene. We can look this up in the `genes` collection:
"Triticum aestivum W5AB81": { // matches host gene ID
"organism": "Triticum aestivum",
"uniquename": "W5AB81"
}
Finally, the UniProtKB accession number can be retrieved from the `uniquename` property, and this can be used to map to the PHIG ID for the host gene.
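The host lookup above (metagenotype → host genotype → allele → gene) can be sketched for one session as shown below. The sketch takes the set of pathogen genotype IDs found in the previous step. `host_genes_for_pathogen` is a hypothetical name, and mapping the accessions to PHIG IDs is left to PHI-base.

```python
def host_genes_for_pathogen(session, pathogen_genotype_ids):
    """Return the UniProtKB accessions of host genes paired with the given pathogen genotypes."""
    genotypes = session.get("genotypes", {})
    alleles = session.get("alleles", {})
    genes = session.get("genes", {})
    host_ids = []
    for metagenotype in session.get("metagenotypes", {}).values():
        # Match on the pathogen_genotype property of each metagenotype
        if metagenotype.get("pathogen_genotype") not in pathogen_genotype_ids:
            continue
        host_genotype = genotypes.get(metagenotype.get("host_genotype"), {})
        for locus in host_genotype.get("loci", []):
            for locus_allele in locus:
                allele = alleles.get(locus_allele["id"], {})
                gene = genes.get(allele.get("gene"), {})
                accession = gene.get("uniquename")
                if accession and accession not in host_ids:
                    host_ids.append(accession)
    return host_ids
```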
In the case of Tox1, there is only one host gene involved: Snn1, which has the UniProtKB accession number W5AB81. Here's what this would look like in the UI:
(Note that PHIG:339 is a placeholder, since Snn1 doesn't have any PHIG ID assigned yet.)
Other pathogen genes are involved in interactions with multiple host genes, such as RALF of Fusarium graminearum (PHIG:278), which interacts with the following host genes:
Here's what this would look like in the UI:
For host gene pages, the process is effectively the same as for pathogen gene pages. The only difference is that you match the host genotype IDs against the `host_genotype` property of each metagenotype object, then find all the pathogen genes referenced by the `pathogen_genotype` property of that metagenotype object. I can provide a full example of the Pathogen and Host sections for a host gene, if required.
@jashobanta-mcpl In the last meeting we decided on some further requirements for the Pathogen and Host sections:
Pathogen genes involved in interactions with wild type hosts (that is, host genotypes with no alleles) should have these host species listed in the Host section of the pathogen gene page.
Genes that interact through Physical Interaction annotations should also be included in the Host section (for pathogen genes) or the Pathogen section (for host genes).
See below for instructions.
This logic only applies to pathogen gene pages. During the process of looking up host genotypes in metagenotypes (i.e. the metagenotypes that also contain the pathogen gene), you may find host genotypes that have no alleles.
Here's an example of a metagenotype that involves the TRI5 gene (PHIG:253) of Fusarium graminearum:
"metagenotypes": {
"d7b3170ded99924f-metagenotype-1": {
"host_genotype": "Triticum-aestivum-wild-type-genotypeBobwhite", // wild type host genotype
"pathogen_genotype": "d7b3170ded99924f-genotype-1", // pathogen genotype containing TRI5
"type": "pathogen-host"
}
}
Here is what the wild type host genotype looks like:
"genotypes": {
"Triticum-aestivum-wild-type-genotypeBobwhite": {
"loci": [],
"organism_strain": "cv. Bobwhite",
"organism_taxonid": 4565
}
}
Note that the `loci` array is empty, indicating that there are no alleles.
In these cases, the Host section on the gene page will contain the host species name and the NCBI Taxonomy ID, but the PHIG ID column will be left blank and the Host gene column will have a placeholder of "(no genes)".
The host taxon ID can be retrieved from the `organism_taxonid` property of the genotype object. The species name can be retrieved by looking up the taxon ID in the `organisms` object of the curation session, and getting the `full_name` property:
"organisms": {
"4565": {
"full_name": "Triticum aestivum"
}
}
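Below is a minimal sketch of how Host-section rows could be built for a pathogen gene page, with a `"(no genes)"` row for wild type hosts. It follows the examples above: `organism_taxonid` is an integer, while the keys of `organisms` are strings such as `"4565"`, so the sketch converts before the lookup. Rows are de-duplicated but never merged. A wild type host row and a row for a specified host gene can therefore both appear. `host_rows` and the `(species, taxon_id, host_gene)` tuple layout are hypothetical choices.

```python
NO_GENES = "(no genes)"


def host_rows(session, pathogen_genotype_ids):
    """Return (species, taxon_id, host_gene) rows for the Host section of a pathogen gene page."""
    genotypes = session.get("genotypes", {})
    alleles = session.get("alleles", {})
    genes = session.get("genes", {})
    organisms = session.get("organisms", {})
    rows = []
    for metagenotype in session.get("metagenotypes", {}).values():
        if metagenotype.get("pathogen_genotype") not in pathogen_genotype_ids:
            continue
        host = genotypes.get(metagenotype.get("host_genotype"), {})
        taxon_id = host.get("organism_taxonid")
        # "organisms" is keyed by strings, while organism_taxonid is an integer
        species = organisms.get(str(taxon_id), {}).get("full_name")
        loci = host.get("loci", [])
        if not loci:
            # Wild type host genotype: no alleles, so use the placeholder
            candidates = [(species, taxon_id, NO_GENES)]
        else:
            candidates = []
            for locus in loci:
                for locus_allele in locus:
                    allele = alleles.get(locus_allele["id"], {})
                    gene = genes.get(allele.get("gene"), {})
                    if gene.get("uniquename"):
                        candidates.append((species, taxon_id, gene["uniquename"]))
        # Keep every distinct row: a wild type row never overrides a gene row
        for row in candidates:
            if row not in rows:
                rows.append(row)
    return rows
```

The PHIG ID column stays blank for `"(no genes)"` rows. For the other rows, the accession would be mapped to a PHIG ID and a gene name separately.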
Physical interaction annotations are not metagenotype annotations, so the logic for extracting genes from these interactions is different.
Using EPI1 (PHIG:268) of Phytophthora infestans as an example, the first step is to find all Physical Interaction annotations that contain the UniProtKB accession number for EPI1, which is D0MVC9.
Here is an example annotation:
{
"checked": "no",
"creation_date": "2019-10-16",
"curator": {
"community_curated": false
},
"evidence_code": "Affinity Capture-Western",
"figure": "Figure 5",
"gene": "Phytophthora infestans D0MVC9", // pathogen gene is EPI1
"interacting_genes": [
"Solanum lycopersicum O04678" // host gene is P69B
],
"publication": "PMID:15096512",
"status": "new",
"submitter_comment": "",
"type": "physical_interaction"
}
Physical Interaction annotations can be identified by the `type` property having a value of `physical_interaction`.
The pathogen gene ID can be contained in either the `gene` property, or as the first item in the `interacting_genes` array. To find the host gene ID:
- Check the `gene` property:
  - Look up the gene ID in the `genes` object of the curation session, and get the matching gene object.
  - If the `uniquename` property of the gene object matches the UniProtKB accession number, the host gene ID is the first item in the `interacting_genes` array.
- Otherwise, check the `interacting_genes` array:
  - Get the first item in the `interacting_genes` array.
  - Look up the gene ID in the `genes` object of the curation session, and get the matching gene object.
  - If the `uniquename` property of the gene object matches the UniProtKB accession number, the host gene ID is in the `gene` property of the annotation object.
Once the host gene ID is found (and confirmed to be valid, following the checks below), the following steps can be used to get the information for the Host section:
- Look up the host gene ID in the `genes` object of the curation session, and get the matching gene object.
- Get the UniProtKB accession number from the `uniquename` property of the gene object. This can be used to get the host gene name (from UniProtKB) and the PHIG ID for the host gene.
- Get the host species name from the `organism` property of the gene object.
- Get the host taxon ID by searching the `organisms` object in the curation session for an object with a `full_name` property that matches the host species name. The taxon ID will be the key of the object.
There are two complications with Physical Interaction annotations that must be handled:
Physical Interaction annotations can occur within the same species, so we first need to confirm that the gene ID does not belong to the same species as the gene of the gene page. To do this, we merely need to check that the species name contained in the gene ID is different from the species name of the gene of the gene page.
Physical Interaction annotations can be between two pathogens or two hosts (instead of one pathogen and one host), so an additional check is needed to confirm that the interacting organism is of a different role to the organism of the gene page. See the next section for instructions.
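The partner lookup and the same-species check (the first complication) can be sketched as shown below. The sketch compares the `organism` property of the two gene objects rather than parsing the species name out of the gene ID string. It does not handle the pathogen-versus-host role check (the second complication). `physical_interaction_partner` and `section_row` are hypothetical names.

```python
def physical_interaction_partner(session, annotation, page_uniprot_id):
    """Return the gene ID of the page gene's partner, or None if it should be skipped."""
    if annotation.get("type") != "physical_interaction":
        return None
    genes = session.get("genes", {})
    gene_id = annotation.get("gene")
    partners = annotation.get("interacting_genes", [])
    partner_id = partners[0] if partners else None
    # The page gene can be in "gene" or be the first item of "interacting_genes"
    if genes.get(gene_id, {}).get("uniquename") == page_uniprot_id:
        page_id, other_id = gene_id, partner_id
    elif genes.get(partner_id, {}).get("uniquename") == page_uniprot_id:
        page_id, other_id = partner_id, gene_id
    else:
        return None
    other = genes.get(other_id)
    # Complication 1: ignore interactions within the same species
    if other is None or other.get("organism") == genes[page_id].get("organism"):
        return None
    return other_id


def section_row(session, gene_id):
    """Collect the accession, species name and taxon ID for a partner gene."""
    gene = session["genes"][gene_id]
    species = gene.get("organism")
    # The taxon ID is the key of the "organisms" entry whose full_name matches
    taxon_id = next(
        (key for key, organism in session.get("organisms", {}).items()
         if organism.get("full_name") == species),
        None,
    )
    return {"uniprot_id": gene.get("uniquename"), "species": species, "taxon_id": taxon_id}
```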
Shown below is an example of a Physical Interaction annotation between two pathogens. This type of Physical Interaction should be ignored when populating the list of pathogen genes in the Pathogen section.
{
"checked": "yes",
"creation_date": "2019-08-12",
"curator": {
"community_curated": false
},
"evidence_code": "Two-hybrid",
"figure": "Figure 6C",
"gene": "Saccharomyces cerevisiae P22007", // first pathogen
"interacting_genes": [
"Magnaporthe oryzae L7JC49" // second pathogen
],
"publication": "PMID:31250536",
"status": "new",
"submitter_comment": "RAM1 interacts with RAS1",
"type": "physical_interaction"
}
In the current export format, the only way to confirm whether a gene belongs to a pathogen is to check whether a genotype containing the gene has been annotated with a `pathogen_phenotype` annotation, or whether a genotype containing the gene is in the `pathogen_genotype` property of a `metagenotype`.
With the example above, there are no metagenotypes, so we can only use `pathogen_phenotype` annotations to decide whether the gene is a pathogen gene. The process is as follows.
First, find an allele containing the gene ID "Saccharomyces cerevisiae P22007":
"P22007:ab02789a62331ecf-3": {
"allele_type": "other",
"description": "transformant",
"gene": "Saccharomyces cerevisiae P22007", // gene ID matches
"name": "pYES2-MoRAM1+",
"primary_identifier": "P22007:ab02789a62331ecf-3",
"synonyms": []
}
Next, find a genotype containing the allele ID for this allele:
"ab02789a62331ecf-genotype-7": {
"comment": "complementation MoRAM1+ complements ScRAM1-",
"loci": [
[
{
"id": "P22007:ab02789a62331ecf-1"
}
],
[
{
"expression": "Overexpression",
"id": "P22007:ab02789a62331ecf-3" // allele ID matches
}
]
],
"organism_strain": "Unknown strain",
"organism_taxonid": 4932
}
Next, find a pathogen phenotype annotation that references this genotype ID:
{
"checked": "no",
"conditions": [
"PECO:0000102",
"PECO:0005224",
"PECO:0005247",
"PECO:0000004",
"PECO:0005269"
],
"creation_date": "2019-08-12",
"curator": {
"community_curated": false
},
"evidence_code": "Cell growth assay",
"extension": [],
"figure": "Figure 6a",
"genotype": "ab02789a62331ecf-genotype-7", // genotype ID matches
"publication": "PMID:31250536",
"status": "new",
"submitter_comment": "...",
"term": "PHIPO:0000405",
"type": "pathogen_phenotype" // annotation type matches
}
The same process can be repeated for the interacting gene ID, "Magnaporthe oryzae L7JC49", confirming that both the primary gene and the interacting gene are pathogen genes:
{
"checked": "no",
"conditions": [],
"creation_date": "2019-08-17",
"curator": {
"community_curated": false
},
"evidence_code": "Western blot assay",
"extension": [
{
"rangeDisplayName": "L7JC49_MAGOP",
"rangeType": "Gene",
"rangeValue": "L7JC49",
"relation": "assayed_using"
}
],
"figure": "Figure 6d",
"genotype": "ab02789a62331ecf-genotype-8", // genotype contains Magnaporthe oryzae L7JC49
"publication": "PMID:31250536",
"status": "new",
"submitter_comment": "...",
"term": "PHIPO:0001027",
"type": "pathogen_phenotype" // annotation type matches
}
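Below is a minimal sketch of the two checks, run for one gene ID in one session: first a `pathogen_phenotype` annotation on a genotype containing the gene, then the gene's genotype appearing as the `pathogen_genotype` of a metagenotype. As noted in this thread, the approach is not guaranteed to work in every case. A `False` result only means the gene was not shown to be a pathogen gene; it does not prove that the gene is a host gene. `is_pathogen_gene` is a hypothetical name.

```python
def is_pathogen_gene(session, gene_id):
    """Return True if a genotype containing gene_id is shown to be a pathogen genotype."""
    # Allele IDs whose "gene" property references gene_id
    allele_ids = {
        allele_id for allele_id, allele in session.get("alleles", {}).items()
        if allele.get("gene") == gene_id
    }
    # Genotype IDs with at least one locus allele referencing those alleles
    genotype_ids = {
        genotype_id for genotype_id, genotype in session.get("genotypes", {}).items()
        if any(la["id"] in allele_ids for locus in genotype.get("loci", []) for la in locus)
    }
    if not genotype_ids:
        return False
    # Check 1: a pathogen_phenotype annotation on one of those genotypes
    for annotation in session.get("annotations", []):
        if annotation.get("type") == "pathogen_phenotype" and annotation.get("genotype") in genotype_ids:
            return True
    # Check 2: one of those genotypes is the pathogen side of a metagenotype
    return any(
        metagenotype.get("pathogen_genotype") in genotype_ids
        for metagenotype in session.get("metagenotypes", {}).values()
    )
```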
Since the process described above is very convoluted, and not even guaranteed to work all the time, a much simpler solution would be to extend the PHI-Canto JSON export with an additional property in the `organisms` objects, stating whether an organism is a pathogen or host in each curation session. Here's a mockup:
"organisms": {
"318829": {
"full_name": "Magnaporthe oryzae",
"role": "pathogen"
},
"4513": {
"full_name": "Hordeum vulgare",
"role": "host"
},
"4530": {
"full_name": "Oryza sativa",
"role": "host"
},
"4932": {
"full_name": "Saccharomyces cerevisiae",
"role": "pathogen"
}
}
Alternatively, the list of host and pathogen species that are stored on the PHI-base/data repository could be used to classify the species as pathogen or host.
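With the proposed `role` property in place, the role check reduces to a dictionary scan, and the rule that same-role Physical Interactions are excluded becomes a one-line comparison. This sketch assumes the mockup's `role` values (`"pathogen"` / `"host"`), which do not exist in the current export. `organism_role` and `is_cross_role` are hypothetical names.

```python
def organism_role(session, species_name):
    """Look up the proposed (not yet exported) "role" property by species name."""
    for organism in session.get("organisms", {}).values():
        if organism.get("full_name") == species_name:
            return organism.get("role")
    return None


def is_cross_role(session, page_gene_id, partner_gene_id):
    """True only when both roles are known and differ (pathogen vs host)."""
    genes = session.get("genes", {})
    page_role = organism_role(session, genes.get(page_gene_id, {}).get("organism"))
    partner_role = organism_role(session, genes.get(partner_gene_id, {}).get("organism"))
    return None not in (page_role, partner_role) and page_role != partner_role
```

Under this rule, the RAM1 example (*S. cerevisiae* with *M. oryzae*, both marked as pathogens in the mockup) would be excluded from the Pathogen and Host sections.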
Please let me know which solution would be the easiest for you.
Implemented. Indexing is in progress.
In the last meeting we decided on the following additional requirements:
If the pathogen gene is part of a metagenotype with a wild type host, and also part of a metagenotype with a specified host gene, then the Host section should show both of these cases. The metagenotype with a specified host gene should not override the metagenotype with a wild type host. Specifically, there should be:
one row for the wild type host, where the 'Host gene' column has the text "(no genes)"; and
one row for the host genotype, where the 'Host gene' column has the host gene name (or UniProtKB accession number).
Genes from Physical Interaction annotations should be included in the Pathogen and Host sections.
Specifically, this means that for a pathogen gene, the interacting host gene from a Physical Interaction should be shown in the Host section. For a host gene, the interacting pathogen gene from a Physical Interaction should be shown in the Pathogen section.
Note that Physical Interaction annotations can be between a pathogen and a pathogen or a host and a host: in these cases, the interacting gene should not be included in the Pathogen or Host sections.
It seems like requirement 1) is already implemented in PHIG:278, but this might need further checking.
For requirement 2), it seems that same-role physical interactions are being excluded as expected, but there are some cases where the Pathogen or Host section is not being populated with genes from Physical Interaction annotations.
PHIG:297 is an example of the problem: the Physical Interaction section lists interactions with the RAM1 gene of S. cerevisiae (PHIG:300), but this gene is not included in the Host section of the gene page (in fact, the Host section is not shown at all).
We expected to see the RAM1 gene in the Host section, as in the following mockup:
Hi @jseager7, I've just been looking at the curation session for the Physical interaction annotation above.
I believe there is a curation error. The Physical interaction annotations should be between pathogen proteins within the same pathogen species, so there should be no host:
- Magnaporthe oryzae RAM1 interacting with Magnaporthe oryzae RAS1
- Magnaporthe oryzae RAM1 interacting with Magnaporthe oryzae RAS2
I shall make the changes from ScRAM1 to MoRAM1 in the curation session.
Note to self: I think this curation error was made because both MoRAM1 and ScRAM1 were reported on in figure 6 and the PHI-Canto user interface shows both genes being called the same name 'RAM1'.
Linking ticket to https://github.com/PHI-base/curation/issues/33
@CuzickA Thanks for clarifying this, but even if the annotation is a curation error, the fact remains that the webpage doesn't seem to be displaying this case correctly.
We can still use this incorrect annotation to verify that the logic used to display the host genes in the Host section is correct, since I think the incorrect annotation might be the only example of this case that we have at the moment.
We can resolve the curation error when PHI-base 5 loads the next JSON export.
@jseager7: In PHIG:297, both genes are pathogen genes, hence the Host block is missing.
{
"checked": "yes",
"creation_date": "2019-08-12",
"curator": {
"community_curated": false
},
"evidence_code": "Affinity Capture-Western",
"figure": "Figure 6B",
"gene": "Saccharomyces cerevisiae P22007", // ab02789a62331ecf-genotype-7
"interacting_genes": [
"Magnaporthe oryzae L7JGN0" // ab02789a62331ecf-genotype-8
],
"publication": "PMID:31250536",
"status": "new",
"submitter_comment": "RAM1 interacts with RAS2",
"type": "physical_interaction"
}
Please confirm the implementation.
@jashobanta-mcpl Sorry, that's my mistake. PHI-Canto is classifying both of these species as pathogens, and the interaction described in the publication is not a pathogen-host interaction.
So, in this case, there is indeed no Host block to display.
(Just to note, the gene page for PHIG:297 is still missing a Pathogen block though, which should be displayed.)
@jseager7: It's not picked up because there are two ways of checking whether a gene is a pathogen gene.
In PHIG:297, option 1 is not applicable because there is no metagenotype block for that genotype ID.
Option 2 still needs to be added to the parsing code, and we will implement it. Please confirm both approaches.
In the last meeting we decided it would be simpler to include the pathogen or host role for each species in the JSON export, so I'll work on adding that.
@jashobanta-mcpl
Here's my feedback on the display of the Pathogen and Host sections.
The following text uses PHIG:278 as an example of a pathogen gene.
The Pathogen section on a pathogen gene page is not displayed correctly. The section should display a list of strains for the pathogen gene. It should not have the columns 'Pathogen gene' or 'PHI ID'.
Compare this to the mockup in the original comment:
As a reminder, the pathogen strains need to be collected from the metagenotypes shown on the gene page.
The Host section on a pathogen gene page now looks as expected, the only problem being that PHIG IDs are not hyperlinked to their respective gene pages:
The mockup in the original comment had these IDs hyperlinked to the gene page. In the example above, there should be a link to the gene page for PHIG:281.
The following text uses PHIG:311 as an example of a host gene.
The Pathogen section on the host gene page displays correctly, the only problem being the lack of hyperlinks on the PHIG IDs.
The text "PHIG:312" should be hyperlinked to the gene page for PHIG:312.
Unfortunately, the Host section still isn't displayed correctly on host gene pages, since the host strains are not shown.
The mockup in the original comment had all the host strains for the Cf-4A gene, but these are not shown in the current UI.
I've also noticed that the metagenotype details pop-up is missing the strain and species for the host:
Maybe this is related to the data indexing issue affecting issue #69.
It's fixed.
Hyperlink to PHIG ID is enabled.
@jashobanta-mcpl Thanks, the Pathogen section looks correct on all the host gene pages that I checked.
However, the Host section is still wrong on other host gene pages.
For example, PHIG:350 still has the 'Host gene' and 'PHI ID' columns, when it should have the 'Host strain' column. It also has a pathogen gene from Parastagonospora nodorum included in the Host section.
The image below shows what I expected to see for PHIG:350.
Some more examples of this problem are PHIG:276, PHIG:292, and PHIG:342.
PHIG:292 is a difficult case since there are no annotations (and therefore no strains). It may be better in this case to show a placeholder for no strains in the Host section:
@jseager7: This looks like a cache issue for PHIG:350 and the others. Please check again.
Clearing the cache fixed the issue for all of the above gene pages.
The only change that is now needed is to ensure that host gene pages with no annotations display '(no strains)' in the Host strain column of the Host section.
@jashobanta-mcpl Just as a reminder, we still need a "(no strains)" placeholder in the Host section for host gene pages that have no annotations.
Currently, PHIG:292 appears like this:
which is almost correct, but the "(no strains)" placeholder should be shown in the Host strain column.
The no strains placeholder is fixed for PHIG:292 now, though it only appeared after a cache reload.
(Follow-up from #51)
The PHI-base team has recently reviewed the Pathogen and Host sections of the gene page and identified a number of problems. We've decided to clarify the requirements for these sections.
For pathogen gene pages:
For host gene pages:
We also decided that we shouldn't show the Reference column in either the Pathogen or Host section, because the reference is included with the annotations in other tables, and the reference is not likely to work well when data is being aggregated like this.
Pathogen gene page
Below is a mockup of how the Pathogen and Host sections should appear for a pathogen gene, specifically RALF of Fusarium graminearum (FGRAMPH1_01T16205; PHIG:278).
Note that UniProtKB has no names for the genes in the image above, and we don't export the gene names we have recorded (FER1) in the PHI-Canto JSON export independently of the allele names. So for now, we'll probably just have to display the UniProtKB accession number in the gene column when there is no gene name in UniProt.
Host gene page
Below is a mockup of how the Pathogen and Host sections should appear for a host gene, specifically Cf-4A of Solanum lycopersicum (PHIG:311).
Note that in this case there are two strains listed in the Host section, because the Cf-4A gene has been annotated as part of two strains. The current interface only displays "cv. Moneymaker", which is incorrect. The pathogen gene also has a name in this case because the name exists in UniProtKB.
The mockup above shows row grouping in the Pathogen section so that the host name and taxon ID are not repeated on every row: this would be nice to have, but is not absolutely required.
@Molecular-Connections Since the logic to extract the correct data from the export could be quite difficult, I could include summary lists of strains and interacting genes for each gene in the new JSON export format, so for these sections you would only have to display data that is already in the export.
Alternatively, I could provide instructions (pseudocode) for how to extract the data from the current export format.
Please let me know what you'd prefer.