katewarner closed 3 weeks ago
Sorry, it looks like you have to add more fields to the misc file: it should have the same structure as the glycan_classification.csv file, as shown below. You will need to find out (likely from Nathan) what to put for the xxx values.
"glytoucan_ac","glycan_type","glycan_type_source","glycan_type_source_id","glycan_subtype","glycan_subtype_source","glycan_subtype_source_id"
"G71142DF","O-linked","xxx","xxx","xxx","xxx","xxx"
$ cat unreviewed/glycan_classification.csv | head
"glytoucan_ac","glycan_type","glycan_type_source","glycan_type_source_id","glycan_subtype","glycan_subtype_source","glycan_subtype_source_id"
"G76100HQ","O-linked","Subsumption","G07078LK","Core 1","Subsumption","G07078LK"
"G76100HQ","O-linked","Subsumption","G07078LK","Core 8","Subsumption","G10436HD"
"G05834TJ","N-linked","GlycoMotif","GGM.001024","Core-fucosylated","GlycoMotif","GGM.001024"
"G22196LK","N-linked","GlycoMotif","GGM.001004","Biantennary","GlycoMotif","GGM.001037"
"G22196LK","N-linked","GlycoMotif","GGM.001004","Complex","GlycoMotif","GGM.001004"
"G27946NT","N-linked","GlycoMotif","GGM.001001","","",""
"G57976DO","N-linked","GlycoMotif","GGM.001004","Bisected","GlycoMotif","GGM.001023"
"G57976DO","N-linked","GlycoMotif","GGM.001004","Core-fucosylated","GlycoMotif","GGM.001024"
"G57976DO","N-linked","GlycoMotif","GGM.001004","Biantennary","GlycoMotif","GGM.001037"
@rykahsay I emailed Nathan about completing the glycan classification information for "G71142DF" (I CC'd you on the email), and this was his response:
TBH trying to guess what my infrastructure will ultimately add for that
seems likely to have issues, since you need the motif accession and
whatever I will ultimately call that type of O-glycosylation.
This seems like a bad strategy, in general.
How about just treating this like an exception to the "accession must
have a classification?" rather than "lets pretend it has a specific
classification" option? At least on the call last week I understood this
as a lets have a manual file for exceptions, rather than "dummy up the
annotation".
How would you like me to proceed, since it doesn't look like Nathan can provide classification information for this glycan? Do you want me to provide the information as above but leave the "xxx" fields blank, e.g.
"glytoucan_ac","glycan_type","glycan_type_source","glycan_type_source_id","glycan_subtype","glycan_subtype_source","glycan_subtype_source_id"
"G71142DF","O-linked"
Or do you want the dataset structured differently and/or containing additional information, such as the GlyTouCan ID, the type, and the dataset to which the QC exception is applied? E.g.
"glytoucan_ac","glycan_type","dataset"
"G71142DF","O-linked","human_proteoform_glycosylation_sites_o_gluc.csv"
I have tried to create the dataset files (I also reverted the changes in the misc file; it should be kept as shown below, please don't change it)
$ cat generated/misc/glycan_add_class.csv
"glytoucan_ac","glycan_type"
"G71142DF","O-linked"
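For reference, a minimal sketch of how a QC step might consult this file alongside glycan_classification.csv. This is not the actual QC script: the function names `load_types` and `find_unclassified` are hypothetical, the sample rows are copied from this thread, and the base-composition check is left out.

```python
import csv
import io

# Sample rows copied from this thread; a real script would open the files on disk.
CLASSIFICATION_CSV = '''"glytoucan_ac","glycan_type","glycan_type_source","glycan_type_source_id","glycan_subtype","glycan_subtype_source","glycan_subtype_source_id"
"G76100HQ","O-linked","Subsumption","G07078LK","Core 1","Subsumption","G07078LK"
"G05834TJ","N-linked","GlycoMotif","GGM.001024","Core-fucosylated","GlycoMotif","GGM.001024"
'''

ADD_CLASS_CSV = '''"glytoucan_ac","glycan_type"
"G71142DF","O-linked"
'''


def load_types(handle):
    """Map each glytoucan_ac to the set of glycan_type values listed for it."""
    types = {}
    for row in csv.DictReader(handle):
        if row["glycan_type"]:
            types.setdefault(row["glytoucan_ac"], set()).add(row["glycan_type"])
    return types


def find_unclassified(accessions, classified, exceptions):
    """Return accessions that neither file classifies (hypothetical QC step)."""
    return sorted(
        ac for ac in accessions if ac not in classified and ac not in exceptions
    )


classified = load_types(io.StringIO(CLASSIFICATION_CSV))
exceptions = load_types(io.StringIO(ADD_CLASS_CSV))

# The only GlyTouCan ID in human_proteoform_glycosylation_sites_o_gluc.csv.
dataset_accessions = ["G71142DF"]

# Without the exceptions file the accession is flagged; with it, nothing is.
flagged_without = find_unclassified(dataset_accessions, classified, {})
flagged_with = find_unclassified(dataset_accessions, classified, exceptions)
print(flagged_without)
print(flagged_with)
```

Keeping the exceptions in a separate lookup, rather than appending rows to glycan_classification.csv, means the generated classification file never has to carry placeholder source or subtype values.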
Checked the dataset and it looks good. Thanks again
Following ticket #1807 and our discussion on Wednesday, we determined that the dataset failed QC because there is only one GlyTouCan ID in the
human_proteoform_glycosylation_sites_o_gluc.csv
dataset and it is neither in glycan_classification.csv nor a base composition, so it gets flagged as "glycan_without_glytype". As you instructed, I've created a CSV file with the GlyTouCan ID producing the error and the correct classification here:
/data/projects/glygen/generated/misc/glycan_add_class.csv
We will use this file to provide you with GlyTouCan IDs and their (additional) classifications that are not in glycan_classification.csv. Please use the file in your QC script so that the
human_proteoform_glycosylation_sites_o_gluc.csv
dataset doesn't fail the global QC. The file contains the data below; please let me know if you need me to add more data or add/change the headers.
FYI @jeet-vora @ubhuiyan