glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Duplicated rows in the Human GlycoEnzOnto dataset #1759

Open katewarner opened 1 month ago

katewarner commented 1 month ago

Please check your script for processing the human_protein_glycogenes_glycoenzonto.csv dataset.

Jeet noticed that there are duplicated rows in the human_protein_glycogenes_glycoenzonto.csv dataset (https://data.glygen.org/GLY_000922), e.g., Q96EU7-1, Q2PZI1-1, U3KPV4-1, etc.


I checked the downloaded file you use to create the dataset (/data/projects/glygen/downloads/glycogenes/current/human_glycogenes_glycoenzonto.csv) and couldn't find any duplicated rows, so I think the issue may be in the processing script for this dataset.
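For anyone repeating this check, here is a minimal sketch using only the Python standard library. It assumes the first line of each file is a header and treats two rows as duplicates only when every field matches after CSV parsing; it does not know anything about the dataset's column names. The function name `find_duplicate_rows` is hypothetical and is not part of any GlyGen tooling.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def find_duplicate_rows(lines: Iterable[str]) -> List[Tuple[Tuple[str, ...], int]]:
    """Return (row, count) pairs for every data row that occurs more than once.

    The first line is assumed to be a header and is excluded from the count.
    Rows are compared field by field after CSV parsing, so differences in
    quoting in the raw text do not hide a duplicate.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if present
    counts = Counter(tuple(row) for row in reader)
    # Counter keeps first-seen order, so results follow the file order.
    return [(row, n) for row, n in counts.items() if n > 1]
```

Running it on both the downloaded source file and the published dataset file (e.g. `with open(path, newline="") as fh: print(find_duplicate_rows(fh))`) would show whether the duplicates first appear during processing: an empty list for the source and a non-empty list for the dataset points at the processing step or a downstream store rather than the download.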

rykahsay commented 3 hours ago

This is because the MongoDB database containing the records/rows from human_glycogenes_glycoenzonto.csv is old; we forgot to update it after we fixed the duplication issue. The duplicates will go away when we push 2.7.1 to prd. Please keep this ticket assigned to you and check it again when we update data.tst.glygen.org.