glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Reorganizing HCV - Hepacivirus #1205

Open ReneRanzinger opened 5 months ago

ReneRanzinger commented 5 months ago

I did some poking around with respect to HCV, GlyGen, and the NCBI Taxonomy:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=63746

It is a "no rank" node two levels under the species Hepacivirus hominis:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3052230

This too is no longer considered a species, and is under the HCV species taxid 3052230 above.

All this suggests our discussion is out of date with respect to the current NCBI Taxonomy organization for HCV. I don't know when the HCV species etc. got reorganized, but presumably it happened since we last looked carefully.

ReneRanzinger commented 5 months ago

Thank you for leading us down this rabbit hole.

I had to learn about these terms first (probably should have done this years ago): https://www.youtube.com/watch?v=G2G2bWUAef0

Proteome: We have two proteomes from two different isolates (both belonging to the same species but different genotypes - 1 for Japanese and 1a for H). Somebody decided that these are the reference genome for ... the genotype? ... and we imported them.

Glycans: We have glycan annotations explicitly for the two isolates (actually, Nathan did not mention any explicit annotation for HCV-Japanese) and for the HCV genus.

Preethi: Can you confirm that the two proteomes are reference proteomes and that there is no reference proteome for the species Hepacivirus hominis (3052230, formerly 11103)?

For the glycan part: I think this is even more of a reason to collapse them under the organism HCV; in the detailed reference table we can link them to the Japanese and H tax IDs.

For the protein part: Not sure yet. Let's wait for Preethi's response. I would prefer to have the species reference proteome and forget about Japanese and H.

ReneRanzinger commented 5 months ago

There are 1464 proteomes (https://www.uniprot.org/proteomes?query=%28taxonomy_id%3A3052230%29) for Hepacivirus hominis (https://www.uniprot.org/taxonomy/3052230).

When HCV was added in 2020, in addition to the reference proteomes UP000000518 (Hepatitis C virus genotype 1a) and UP000008095 (HCV 1B Japanese isolate), a non-reference proteome was also requested. Reason: there is data for P26662 in GlyGen, so the request was to add a proteome that contains the reviewed/Swiss-Prot protein P26662.

Regards,

Preethi
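For anyone who wants to repeat the proteome lookup for another taxid, here is a minimal sketch that only rebuilds the UniProt proteomes listing URL quoted above. The helper name `proteomes_url` is hypothetical; the URL pattern and the `taxonomy_id` query field are taken from the link in this comment.

```python
from urllib.parse import quote


def proteomes_url(taxid: int) -> str:
    """Build the UniProt proteomes listing URL for one NCBI taxid."""
    # Same query syntax as in the link above: (taxonomy_id:<taxid>)
    query = f"(taxonomy_id:{taxid})"
    # safe="" percent-encodes every reserved character, including ":" and "()"
    return "https://www.uniprot.org/proteomes?query=" + quote(query, safe="")


url = proteomes_url(3052230)
print(url)
```

This only constructs the link; it does not contact UniProt or say which of the listed proteomes are reference proteomes.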

ReneRanzinger commented 5 months ago

Raja: Adding two organisms allows for comparison at the sequence level. For viruses, where sequence homology outside their clade might not exist, bringing in one or more additional organisms (e.g. four if we ever bring in Dengue) might be useful.

ReneRanzinger commented 5 months ago

That raises the question of whether this is part of GlyGen's mission. You can do this on UniProt.

katewarner commented 3 months ago

@ReneRanzinger Can you remember what we decided to do about the Organism section of the Glycan detail pages for HCV-H77 glycans (e.g. https://tst.glygen.org/glycan/G02815KT#Organism)? I remember that we are going to collapse all the HCV glycans into the species - Hepacivirus hominis - but I couldn't remember if we were still going to keep details of HCV-H77 in the organism section for the glycans that have been identified in HCV-H77.

Also do we want to be able to filter/advanced search for HCV-H77 glycans?

ReneRanzinger commented 3 months ago

In the glycan organism section, the top table should show the organism as HCV. If you expand "Show More..." it will still show 63746 and its name.
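A minimal sketch of that display rule, not the actual GlyGen frontend or API code. It assumes each organism annotation is a record with `glygen_name`, `name`, and `taxid` fields; the `collapse_organisms` helper and the sample records are hypothetical, and the name string used for taxid 63746 is a placeholder.

```python
from collections import defaultdict


def collapse_organisms(annotations):
    """Group annotations by glygen_name for the top table and keep the
    distinct taxid/name pairs for the expanded "Show More..." view."""
    grouped = defaultdict(list)
    for ann in annotations:
        detail = {"taxid": ann["taxid"], "name": ann["name"]}
        # Skip duplicates so each taxid appears once per organism row
        if detail not in grouped[ann["glygen_name"]]:
            grouped[ann["glygen_name"]].append(detail)
    return [{"organism": org, "details": det} for org, det in grouped.items()]


# Hypothetical sample data; only the taxids and the species name come from this thread
sample = [
    {"glygen_name": "HCV", "taxid": 63746, "name": "HCV-H77 (placeholder name)"},
    {"glygen_name": "HCV", "taxid": 3052230, "name": "Hepacivirus hominis"},
    {"glygen_name": "HCV", "taxid": 63746, "name": "HCV-H77 (placeholder name)"},
]

rows = collapse_organisms(sample)
for row in rows:
    print(row["organism"], [d["taxid"] for d in row["details"]])
```

With this shape the top table has a single HCV row, and the isolate-level taxid is still available when the row is expanded.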

sujeetvkulkarni commented 3 months ago

@ReneRanzinger glygen_name (Organism) is shown in the top table. name (Species), common_name (Common Name) and taxid (Tax ID) are shown when expanded. API: https://api.tst.glygen.org/glycan/detail/G02815KT/?query= {"paginated_tables":[{"table_id":"glycoprotein","offset":1,"limit":20,"sort":"uniprot_canonical_ac","order":"asc"},{"table_id":"expression_tissue","offset":1,"limit":20,"sort":"start_pos","order":"asc"},{"table_id":"expression_cell_line","offset":1,"limit":20,"sort":"start_pos","order":"asc"},{"table_id":"publication","offset":1,"limit":200,"sort":"date","order":"desc"}]}

[Screenshot: 2024-05-24 at 1:03:50 PM]

If the JSON-to-frontend mapping needs to be changed, please let me know. Otherwise @rykahsay may need to make changes in the API response.
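To reproduce the detail request quoted above when checking the organism fields, the following sketch builds the same URL with a percent-encoded `query` parameter. The base URL, table IDs, sort keys, and limits are copied from the request in this comment; the helper names `table` and `detail_url` are hypothetical, and the code only constructs the URL without calling the API.

```python
import json
from urllib.parse import quote, unquote, urlsplit

BASE = "https://api.tst.glygen.org/glycan/detail"


def table(table_id, sort, order="asc", limit=20, offset=1):
    """One entry of the paginated_tables list."""
    return {"table_id": table_id, "offset": offset, "limit": limit,
            "sort": sort, "order": order}


def detail_url(glycan_id):
    """Build the glycan detail URL with the JSON query percent-encoded."""
    query = {"paginated_tables": [
        table("glycoprotein", "uniprot_canonical_ac"),
        table("expression_tissue", "start_pos"),
        table("expression_cell_line", "start_pos"),
        table("publication", "date", order="desc", limit=200),
    ]}
    # Compact separators keep the JSON free of spaces before encoding
    payload = json.dumps(query, separators=(",", ":"))
    return f"{BASE}/{glycan_id}/?query={quote(payload)}"


url = detail_url("G02815KT")
print(url)
```

Decoding the `query` parameter of the result gives back the same JSON object as in the request above, which makes it easy to compare responses before and after any mapping change.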

Also, HCV and HCV-H77 are treated separately, at least at the search level. That will also need correction; otherwise a direct search from the details page will be affected.

API: https://api.tst.glygen.org/glycan/search_init

[Screenshot: 2024-05-24 at 1:15:35 PM]