EBISPOT / goci

GWAS Catalog Ontology and Curation Infrastructure
Apache License 2.0

Do a complete sanity check of duplicates in DISEASE_TRAIT #220

Closed: tudorgroza closed this issue 4 years ago

tudorgroza commented 4 years ago

Background: We have now encountered at least 3 examples of duplicate entries in the DISEASE_TRAIT table that have caused the import process to fail.

Goal: Do a full sanity check of the DISEASE_TRAIT table to find out how many other duplicates exist.
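A minimal sketch of such a check in Python, assuming the rows of DISEASE_TRAIT have already been fetched as (id, trait) string pairs. The fetching step is not shown, and the function name and sample rows are hypothetical. It groups IDs by trait name after stripping surrounding whitespace, so names that differ only by a trailing space are caught along with exact repeats:

```python
from collections import defaultdict


def find_duplicate_traits(rows):
    """Group (id, trait) rows by whitespace-stripped trait name.

    Returns a dict mapping each trait name that occurs more than once
    to the list of IDs carrying it, in input order.
    """
    groups = defaultdict(list)
    for trait_id, trait in rows:
        # Strip leading/trailing whitespace so "CD40 levels " and
        # "CD40 levels" land in the same group.
        groups[trait.strip()].append(trait_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical sample rows, modelled on entries reported in this issue.
    sample = [
        ("4325", "Asthma (sex interaction)"),
        ("21596987", "Asthma (sex interaction)"),
        ("62065885", "CD40 levels"),
        ("62066139", "CD40 levels "),
        ("1243", "Immunoglobulin A "),
    ]
    for name, ids in find_duplicate_traits(sample).items():
        print(f"{name} :: {ids}")
```

Run as a script, the sample reports the Asthma and CD40 pairs; the single "Immunoglobulin A " row is not reported because nothing else shares its stripped name.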

tudorgroza commented 4 years ago

@ljwh2 @sprintell @jdhayhurst

Here are the results of the analysis:

Aggressive periodontitis (sex interaction) :: ['10071459', '21596974']
Carotid artery intima media thickness (sex interaction) :: ['12541973', '21596980']
Asthma (sex interaction) :: ['4325', '21596987']
Body mass index (sex interaction) :: ['4176', '24161567']
Waist circumference (sex interaction) :: ['4178', '24161568']
Bone mineral density (sex interaction) :: ['21411562', '21596967']
CD40 levels :: ['62065885', '62066139'] => due to a trailing space
CXCL6 levels :: ['62066092', '62066149'] => due to a trailing space
CD45 on lymphocyte :: ['62066788', '62066915']
CCR2 on monocyte :: ['62066882', '62066891']
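The annotated results separate pairs that only collide once a trailing space is removed (CD40 levels, CXCL6 levels) from the remaining pairs, which carry no such note. A sketch of that classification, under the same (id, trait) row assumption and with a hypothetical function name and sample rows:

```python
from collections import defaultdict


def classify_duplicates(rows):
    """Label each duplicated (stripped) trait name by why it is duplicated.

    "exact": at least two rows carry the identical raw string.
    "whitespace": the raw strings are all distinct and differ only in
    leading/trailing whitespace.
    """
    raw_by_name = defaultdict(list)
    for _, trait in rows:
        raw_by_name[trait.strip()].append(trait)
    labels = {}
    for name, raws in raw_by_name.items():
        if len(raws) < 2:
            continue  # not a duplicate at all
        # If every raw spelling is distinct, the collision only appears
        # after stripping whitespace.
        labels[name] = "whitespace" if len(set(raws)) == len(raws) else "exact"
    return labels


if __name__ == "__main__":
    # Hypothetical sample rows; 62066149 is given the trailing space.
    sample = [
        ("4325", "Asthma (sex interaction)"),
        ("21596987", "Asthma (sex interaction)"),
        ("62066092", "CXCL6 levels"),
        ("62066149", "CXCL6 levels "),
    ]
    print(classify_duplicates(sample))
```

Splitting the two cases matters because whitespace-only collisions can be fixed by trimming, while exact repeats need one of the rows removed or merged.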

tudorgroza commented 4 years ago

1243::Immunoglobulin A '
1605::Asthma (aspirin-intolerant) '
4074::Joint damage progression in ACPA-positive rheumatoid arthritis '
4066::IgE levels in asthmatics (D.p. specific) '
4067::IgE levels in asthmatics (D.f. specific) '
4305::Response to rate control therapy in atrial fibrillation '
3668::Antineutrophil cytoplasmic antibody-associated vasculitis '
3669::Resistin levels '
3704::Airflow obstruction '
2881::Renal sinus fat '
3024::Testicular dysgenesis syndrome '
3301::Duodenal ulcer '
1461::Diabetic retinopathy '
1484::Tardive dyskinesia '
3741::Lentiform nucleus volume '
3766::Breast cancer (male) '
3778::Myasthenia gravis '
3774::Drug-induced liver injury '
2583::Epirubicin-induced leukopenia '
840::Basal cell carcinoma '
2123::Vaccine-related adverse events '
2482::Postoperative nausea and vomiting '
2561::Interstitial lung disease '
2562::Meningioma '
2823::Dengue shock syndrome '
2821::Postoperative ventricular dysfunction '
3061::Ankle-brachial index '
3063::Response to gemcitabine in pancreatic cancer '
3201::Treatment response for severe sepsis '
3303::Naphthyl-keratin adduct levels '
3304::Arsenic metabolism '
3366::IgE levels '
3441::Cystic fibrosis (meconium ileus) '
3503::Antihypertensive response '
3507::Non-albumin protein levels '
3522::HIV-associated dementia '
3543::Immune response to anthrax vaccine '
3839::Paraoxonase activity '
3857::Circulating vasoactive peptide levels '
3881::Eating disorders (purging via substances) '
3891::Pit-and-Fissure caries '
3892::Smooth-surface caries '
3918::Ovarian cancer in BRCA1 mutation carriers '
3936::Circulating myeloperoxidase levels (serum) '
10066317::Congenital left-sided heart lesions '
15017527::Joint damage progression in ACPA-negative rheumatoid arthritis '
17925127::Exhaled nitric oxide output '
20508869::HIV-associated neurocognitive disorder (mild neurocognitive disorder or asymptomatic neurocognitive impairment) '
21481322::Plasma androstenedione levels in resected early stage-receptor positive breast cancer '
21481354::Estrone/androstenedione ratio in resected early stage-receptor positive breast cancer '
21610683::Fear of severe pain '
21610684::Fear of minor pain '
14940648::Plasma clusterin levels '
19733280::Nevirapine-induced hypersensitivity in HIV (hypersensitivity syndrome) '
21481043::Plasma estrone conjugates levels in resected early stage estrogen-receptor positive breast cancer '
14924603::Cough in response to angiotensin-converting enzyme inhibitor drugs '
15811530::Response to Pazopanib in cancer (hepatotoxicity) '
24084891::Narcolepsy with cataplexy (HLA-DQ0602 positive) or hypocretin-1 deficiency '
27241924::Bone mineral density change response to combined chemotherapy in acute lymphoblastic leukemia '
33473910::Thiopurine-induced severe leukopenia in inflammatory bowel disease '
47679642::Lung function in heavy smokers (low FEV1 vs high FEV1) '
47679643::Lung function in never smokers (low FEV1 vs high FEV1) '
47679645::Lung function in heavy smokers (high FEV1 vs average FEV1) '
47679646::Lung function in heavy smokers (low FEV1 vs average FEV1) '
55291701::NASH resolution in nonalcoholic steatohepatitis '
62066127::4E-BP1 levels '
62066128::Adenosine Deaminase levels '
62066129::Axin-1 levels '
62066130::Caspase 8 levels '
62066132::CCL20 levels '
62066133::CCL23 levels '
62066134::CCL25 levels '
62066135::CCL28 levels '
62066136::CCL3 levels '
62066137::CCL4 levels '
62066138::CD244 levels '
62066139::CD40 levels '
62066140::CD5 levels '
62066141::CD6 levels '
62066142::CDCP1 levels '
62066143::Colony stimulating factor levels '
62066144::CST5 levels '
62066145::CX3CL1 levels '
62066146::CXCL1 levels '
62066147::CXCL10 levels '
62066148::CXCL11 levels '
62066149::CXCL6 levels '
62066150::CXCL9 levels '
62066151::DNER levels '
62066152::EN-RAGE levels '
62066153::Fibroblast growth factor 21 levels '
62066154::Fibroblast growth factor 23 levels '
62066155::Fibroblast growth factor 5 levels '
62066156::Fibroblast growth factor 19 levels '
62066157::Flt3L levels '
62066161::Transforming growth factor-beta levels '
62066162::Leukemia inhibitory factor receptor levels '
62066166::Neurotrophin-3 levels '
62066167::Oncostatin-M levels '
62066168::PD-L1 levels '
62066169::Sirtuin-2 levels '
62066170::SLAMF1 levels '
62066171::Sulfotrasferase 1A1 levels '
62066172::STAM binding protein levels '
62066173::Transforming growth factor-alpha levels '
62066174::Tumor necrosis factor ligand superfamily member 14 levels '
62066175::Tumor necrosis factor ligand superfamily member 11 levels '
62066176::Tumor necrosis factor ligand superfamily member 12 levels '
62066177::Tumor necrosis factor receptor superfamily member 9 levels '

/cc @ljwh2 @sprintell @jdhayhurst
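The list above appears to show trait names that end in a trailing space (each entry closes with a space before the quote mark), and it includes 62066139 and 62066149, the CD40 and CXCL6 rows flagged earlier. Trimming all of these in place would turn such rows into new exact duplicates. A hedged sketch of a pre-check, assuming (id, trait) rows with unique IDs and using a hypothetical function name, separates rows that can be trimmed safely from rows whose trimmed name is already taken:

```python
from collections import defaultdict


def plan_trim(rows):
    """Split rows with leading/trailing whitespace into safe and conflicting.

    Returns (safe, conflicts):
      safe: list of (id, trimmed_name) that can be trimmed in place.
      conflicts: list of (id, trimmed_name, other_ids) where other rows
        would share the trimmed name. Assumes IDs are unique.
    """
    by_trimmed = defaultdict(list)
    for trait_id, trait in rows:
        by_trimmed[trait.strip()].append(trait_id)

    safe, conflicts = [], []
    for trait_id, trait in rows:
        trimmed = trait.strip()
        if trimmed == trait:
            continue  # name is already clean; nothing to trim
        others = [i for i in by_trimmed[trimmed] if i != trait_id]
        if others:
            # Trimming would make this row collide with another one,
            # so it needs a merge decision rather than a plain update.
            conflicts.append((trait_id, trimmed, others))
        else:
            safe.append((trait_id, trimmed))
    return safe, conflicts
```

For example, with hypothetical rows ("62065885", "CD40 levels"), ("62066139", "CD40 levels ") and ("1243", "Immunoglobulin A "), the Immunoglobulin A row is safe to trim while 62066139 is reported as conflicting with 62065885. How conflicting rows should be merged depends on the schema and is not sketched here.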