glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Evaluation of GlycoEnzOnto #74

Open ReneRanzinger opened 1 year ago

ReneRanzinger commented 1 year ago

From Sriram Neelamegham:

I'd suggested this previously to Raja and Mike, but if you want to integrate the GlycoEnzOnto we developed into Glygen we'd be happy to help.
Sriram

There is a GitHub repository for it.

I am not sure if @rajamazumder and @mtiemeyer0919 already made a decision.

@jeet-vora can you please have a look so we can discuss this in one of the Wednesday meetings?

rajamazumder commented 1 year ago

I think we were expecting a table from Sriram's PhD student

jeet-vora commented 8 months ago

@ubhuiyan Download https://github.com/neel-lab/GlycoEnzOnto/blob/main/finishedGlycogenes.xlsx from the GitHub repository

Copy the UniProt ACs from the Uniprot field and paste them into the list search in UniProt. Customize the table with the columns shown below.

image

Filter out the glycosyltransferases (GT, EC 2.4.1.X) and compare them with https://data.glygen.org/ln2data/releases/data/v-2.3.1/reviewed/human_protein_glycosyltransferase.csv

Provide a list of UniProt ACs that are new and missing from the Glycogenes list.
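A minimal sketch of the filter-and-compare step, in case it helps. The column names (`Entry`, `EC number`, `uniprotkb_canonical_ac`) and the isoform-suffix handling are assumptions about the two files, not confirmed in this thread, and the accessions below are made-up placeholders.

```python
import csv
import io


def normalize_ac(ac):
    """Strip whitespace and any isoform suffix (e.g. 'X-1' becomes 'X')."""
    return ac.strip().split("-")[0]


def gt_accessions(uniprot_rows, ec_column="EC number", ac_column="Entry"):
    """Return accessions whose EC field lists at least one 2.4.1.x number."""
    hits = set()
    for row in uniprot_rows:
        # Assumption: multiple EC numbers are separated by ';'.
        ec_numbers = [ec.strip() for ec in row.get(ec_column, "").split(";")]
        if any(ec.startswith("2.4.1.") for ec in ec_numbers):
            hits.add(normalize_ac(row[ac_column]))
    return hits


def new_accessions(candidates, glygen_rows, ac_column="uniprotkb_canonical_ac"):
    """Return candidate accessions that are absent from the GlyGen list."""
    known = {normalize_ac(row[ac_column]) for row in glygen_rows}
    return sorted(candidates - known)


# Placeholder data standing in for the UniProt export and the GlyGen CSV.
uniprot_csv = """Entry,EC number
A00001,2.4.1.41
A00002,3.2.1.18
A00003,2.4.1.-; 2.4.1.38
"""
glygen_csv = """uniprotkb_canonical_ac
A00001-1
"""

uniprot_rows = list(csv.DictReader(io.StringIO(uniprot_csv)))
glygen_rows = list(csv.DictReader(io.StringIO(glygen_csv)))
missing = new_accessions(gt_accessions(uniprot_rows), glygen_rows)
print(missing)
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles, and the column names adjusted to whatever the exports actually contain.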

ubhuiyan commented 8 months ago

Additional note: add an xref to GlycoEnzOnto.

ubhuiyan commented 8 months ago

Curation Steps:

1. Copy all UniProt accessions within the finishedGlycogenes dataset and paste to list search in UniProtKB.
2. Select "Swiss-Prot" for accessions that have been reviewed.
3. Select "Customize Columns" and click the following:
   - From
   - Entry
   - Organism
   - Gene Name
   - Protein Name
   - EC Number
   - BRENDA cross reference
   - CAZy cross reference
4. Download the dataset as a CSV.
5. Send to Jeet for checkpoint/QC.
6. Compare UniProt accessions between the downloaded dataset and human_protein_glycosyltransferase to identify unique accessions within downloaded dataset.
7. Create a column to indicate the unique UniProt accessions in the downloaded dataset.
8. Send to Jeet for review.
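Steps 6 and 7 could be scripted roughly as follows. This is only a sketch: the `Entry` column name, the `new_in_download` flag column, and the placeholder accessions are all assumptions for illustration, not the actual file contents.

```python
import csv
import io


def flag_unique(downloaded_rows, glygen_acs, ac_column="Entry",
                flag_column="new_in_download"):
    """Copy each row and add a yes/no column marking accessions not in GlyGen."""
    flagged = []
    for row in downloaded_rows:
        out = dict(row)
        is_new = row[ac_column].strip() not in glygen_acs
        out[flag_column] = "yes" if is_new else "no"
        flagged.append(out)
    return flagged


def to_csv(rows):
    """Serialize the flagged rows back to CSV text, keeping column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data standing in for the downloaded UniProt CSV.
downloaded_csv = """Entry,Gene Name
A00001,GENE1
A00002,GENE2
"""
# Placeholder set standing in for the accessions already in
# human_protein_glycosyltransferase.
glygen_acs = {"A00001"}

downloaded_rows = list(csv.DictReader(io.StringIO(downloaded_csv)))
flagged_rows = flag_unique(downloaded_rows, glygen_acs)
print(to_csv(flagged_rows))
```

Keeping every row and adding a flag column, rather than dropping the known accessions, leaves the full download available for the review in step 8.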

ubhuiyan commented 1 month ago

This task has been completed. I emailed Jeet this comparison soon after joining the team.

ReneRanzinger commented 1 month ago

@jeet-vora is there something we need to talk about? Anything we can do or is it a dead end in terms of data integration or linking?

jeet-vora commented 1 month ago

@ubhuiyan

Can you include this item in our morning meeting for discussion? I went on vacation after you might have sent it, so I need to review it again.

jeet-vora commented 1 month ago

@katewarner

The dataset https://data.glygen.org/GLY_000922 has not been processed correctly. There are repeated accessions with different gene names. The source file has 403 rows, but the processed one has more than 403 rows. Also, some of the headers from the source file are missing. Can you investigate and create a ticket for Robel?

Source dataset: https://github.com/neel-lab/GlycoEnzOnto/blob/main/finishedGlycogenes.xlsx from the GitHub repository
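For the investigation, the three symptoms (row count mismatch, accessions with conflicting gene names, missing headers) could be checked with something like the sketch below. It assumes the source spreadsheet has been exported to CSV and that headers can be matched case-insensitively; the column names and rows are hypothetical placeholders, not the real contents of either file.

```python
import csv
import io
from collections import defaultdict


def conflicting_gene_names(rows, ac_column, gene_column):
    """Map each accession to its gene names when more than one name appears."""
    genes_by_ac = defaultdict(set)
    for row in rows:
        genes_by_ac[row[ac_column].strip()].add(row[gene_column].strip())
    return {ac: sorted(genes) for ac, genes in genes_by_ac.items()
            if len(genes) > 1}


def missing_headers(source_headers, processed_headers):
    """List source headers with no case-insensitive match in the processed file."""
    processed = {h.strip().lower() for h in processed_headers}
    return [h for h in source_headers if h.strip().lower() not in processed]


# Placeholder data standing in for the source export and the processed dataset.
source_csv = """Uniprot,Gene,Pathway
A00001,GENE1,pathway_a
A00002,GENE2,pathway_b
"""
processed_csv = """uniprot,gene
A00001,GENE1
A00001,GENE9
A00002,GENE2
"""

source = csv.DictReader(io.StringIO(source_csv))
processed = csv.DictReader(io.StringIO(processed_csv))
source_rows = list(source)
processed_rows = list(processed)

print(len(source_rows), len(processed_rows))
print(conflicting_gene_names(processed_rows, "uniprot", "gene"))
print(missing_headers(source.fieldnames, processed.fieldnames))
```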

Check the attached dataset for new GTs. Once verified, add them to our human_protein_glycosyltransferase dataset. The new GTs should show evidence, and each entry needs to be reviewed before it is added to our list. I can review the terms to be added once you have evaluated them.

Attachment: Final Glycosyltransferase.csv

katewarner commented 1 month ago

1759