glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Update human_protein_biomarkers_cancer.csv dataset (new name human_protein_biomarkers.csv) #12

Closed kmartinez834 closed 1 year ago

kmartinez834 commented 1 year ago

The Biomarker DB (aka OncoMX) source file now includes both protein and glycan entries. Follow the steps below to update the existing human_protein_biomarkers_cancer.csv dataset and rename it to human_protein_biomarkers.csv.

Input file: /data/projects/glygen/downloads/biomarkerdb/current/allbiomarkers-all.csv

Output file: reviewed/human_protein_biomarkers.csv

1. Input file column names have changed

2. Add new column "assessed_entity_type" to output file from source "Assessed entity type"

3. Extract only entries where "Main x-ref" starts with UPKB:

4. Extract PMID from "Literature evidence" field

Input: The blood count results showed anaemia in 21 (75%) patients, leucopaenia in 9 (32.1%) patients, and lymphopaenia in 23 (82.1%) patients. Patients developed severe clinical events; 6 (21.4%) patients were admitted to ICU, 10 (35.7%) patients had life-threatening complications, and 8 (28.6%) of the patients died. [PMID:32224151] Post-COVID-19 infection, lower hemoglobin levels, higher total white blood cell (WBC) counts, and higher absolute neutrophil counts were associated with increased mortality (Table 3). Analysis of other serologic biomarkers demonstrated that elevated D-dimer, lactate, and lactate dehydrogenase (LDH) in patients were significantly correlated with dying (Table 3). [PMID:32357994]

Output: 32224151, 32357994

Note: There may be more than one PMID for each entry

5. All other processing steps are the same as in the last update

6. Create citations file: citations_human_protein_biomarkers.csv
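
Steps 2 to 4 could be sketched roughly as below. This is only an illustration, not the actual GlyGen processing script: the source column names ("Main x-ref", "Assessed entity type", "Literature evidence") are taken from this ticket, while the function names, the output column names `main_xref` and `pmid`, and the sample accession are assumptions made up for the example.

```python
import re

# Matches tags such as "[PMID:32224151]" and captures the numeric ID.
PMID_PATTERN = re.compile(r"\[PMID:\s*(\d+)\]")


def extract_pmids(literature_evidence):
    """Return the PMIDs found in the text as a comma-separated string.

    Duplicates are dropped and the order of first appearance is kept,
    since one entry may cite more than one PMID.
    """
    seen = []
    for pmid in PMID_PATTERN.findall(literature_evidence):
        if pmid not in seen:
            seen.append(pmid)
    return ", ".join(seen)


def transform_rows(rows):
    """Keep only UPKB entries and derive the new output fields.

    `rows` is any iterable of dicts keyed by the source column names,
    for example the rows produced by csv.DictReader.
    """
    for row in rows:
        # Step 3: keep only entries whose main cross-reference is UniProtKB.
        if not row["Main x-ref"].startswith("UPKB:"):
            continue
        yield {
            # Hypothetical output column names, not confirmed by the ticket.
            "main_xref": row["Main x-ref"],
            # Step 2: carry over the assessed entity type.
            "assessed_entity_type": row["Assessed entity type"],
            # Step 4: pull the PMIDs out of the free-text evidence field.
            "pmid": extract_pmids(row["Literature evidence"]),
        }


sample = "Some finding. [PMID:32224151] Another finding. [PMID:32357994]"
print(extract_pmids(sample))  # 32224151, 32357994
```

In practice the rows would come from reading allbiomarkers-all.csv with `csv.DictReader`, and the remaining processing steps from the previous update would still apply on top of this.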

kmartinez834 commented 1 year ago

@jeet-vora

rykahsay commented 1 year ago

Why are the headers changing? Will they change again?

kmartinez834 commented 1 year ago

Headers will not change again. We are now taking the final allbiomarkers-all.csv file from data.oncomx.org rather than a file that was manually prepared/edited.

rykahsay commented 1 year ago

Done --> please check unreviewed/human_protein_biomarkers.csv

kmartinez834 commented 1 year ago

@jeet-vora see #13 for comment about sample mapping

kmartinez834 commented 1 year ago

👍 Dataset created; remaining issues moved to https://github.com/glygener/glygen-issues/issues/135 for the next data release