glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Add sialic acid xref to GlyGen #843

Open kmartinez834 opened 9 months ago

kmartinez834 commented 9 months ago

NCBI has now populated independent pages for each of the sialic acids. Please see https://www.ncbi.nlm.nih.gov/glycans/sialic.html, where links to these pages can be found.

CIDs and SIDs (PubChem compound and substance IDs) have been assigned to these entities, so the information curated by the group can now find its way to PubChem partners and the semantic web. Similar work is currently underway for the SNFG monosaccharides, and Evan’s team has also initiated an SNFG reference collection at: https://pubchem.ncbi.nlm.nih.gov/source/11743

kmartinez834 commented 2 months ago

Files from the links above are now in downloads/snfg/current/.

For 2.6:
--> Add to files4nathan.csv so Nathan can provide an SNFG xref file.
--> Create download script and update download instruction documentation.

edwardsnj commented 5 days ago

@ubhuiyan It is not clear to me that we decided to add these monosaccharides to the glycan structure list for GlyGen.

The second file, sia-table1.tsv, has its CID and SID values included in the Pubchem_substance_compound.csv file, so it appears redundant. Of the 225 CID values in the Pubchem_substance_compound.csv file, 126 are in the GlyTouCan source set and therefore have GlyTouCan accessions associated with them. I have manually curated a similar SNFG mapping file, which may or may not agree with these values, so merging the two is nontrivial.
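
The redundancy check described above could be sketched roughly as follows. This is only an illustration: the column names `CID` and `SID`, the helper names, and the toy rows are assumptions, not the actual layout or contents of sia-table1.tsv or Pubchem_substance_compound.csv.

```python
import csv
import io


def read_pairs(handle, delimiter):
    """Return the set of (CID, SID) pairs from a delimited file.

    Assumes (hypothetically) that the header names the columns CID and SID.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {(row["CID"].strip(), row["SID"].strip()) for row in reader}


def missing_pairs(subset_pairs, superset_pairs):
    """Pairs present in the first set but absent from the second.

    An empty result means the first file adds nothing new.
    """
    return sorted(subset_pairs - superset_pairs)


# Made-up toy rows standing in for the two real files.
sia_tsv = io.StringIO("CID\tSID\n101\t9001\n102\t9002\n")
pubchem_csv = io.StringIO("CID,SID\n101,9001\n102,9002\n103,9003\n")

sia = read_pairs(sia_tsv, "\t")
pubchem = read_pairs(pubchem_csv, ",")
print(missing_pairs(sia, pubchem))
```

With the real files, the `io.StringIO` objects would be replaced by opened file handles; a non-empty result would list the pairs that make the second file non-redundant.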

For the mapped CIDs, do we want multiple SID values associated with each CID (the GlyTouCan and SNFG SIDs are distinct)?
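
To make the question concrete, here is a minimal sketch of what keeping multiple SIDs per CID might look like. The record layout, the source labels, and all ID values are invented for illustration; nothing here reflects a decision about how GlyGen will store these cross-references.

```python
from collections import defaultdict


def sids_by_cid(records):
    """Map each CID to the sorted list of (source, SID) pairs seen for it."""
    grouped = defaultdict(set)
    for cid, sid, source in records:
        grouped[cid].add((source, sid))
    return {cid: sorted(pairs) for cid, pairs in grouped.items()}


# Hypothetical records: (CID, SID, source collection).
records = [
    ("101", "5001", "GlyTouCan"),
    ("101", "9001", "SNFG"),
    ("102", "9002", "SNFG"),
]

mapping = sids_by_cid(records)

# CIDs that would carry more than one SID if both sources are kept.
multi = {cid: pairs for cid, pairs in mapping.items() if len(pairs) > 1}
print(multi)
```

Keeping the source label next to each SID lets a downstream xref file either emit both SIDs or filter to one source, whichever is decided.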

I will add this to the agenda for Wednesday morning's meeting (7/10).
