CBIIT / R-cometsAnalytics

R package development for COMETS Analytics

COMETS 1.4. Correlations between metabolites with long names not currently working #58

Closed steven-moore closed 6 years ago

steven-moore commented 6 years ago

After some revisions to the processing of metabolite arrays, analyses that examine correlations of metabolites with one another fail when the names are long, contain non-standard characters, etc.

unexpected input
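
For reference, here is a minimal sketch of one way names like these can break things in R, assuming the names end up pasted into a formula string somewhere (the thread does not confirm where the failure actually occurs). `quote_name` is a hypothetical helper, not a COMETS function; the metabolite names are taken from the examples later in this thread.

```r
# Hypothetical helper (not part of COMETS): wrap a name in backticks so that
# R treats it as a single symbol inside a formula.
quote_name <- function(x) paste0("`", x, "`")

raw_name <- "hydroxyasparagine**"

# Pasting the raw name straight into a formula string fails to parse.
raw_ok <- tryCatch({
  as.formula(paste("y ~", raw_name))
  TRUE
}, error = function(e) FALSE)

# Backtick-quoting keeps the whole name intact as one variable.
f <- as.formula(paste("y ~", quote_name(raw_name)))
vars <- all.vars(f)

# Other names parse without an error but are split into the wrong variables,
# because "-" and "(" are read as operators rather than as part of the name.
split_vars <- all.vars(as.formula("y ~ n-acetylaspartate (naa)"))
```

The second case is the more dangerous one, since nothing fails at parse time and the problem only shows up later as missing or mismatched columns.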

steven-moore commented 6 years ago

The all metabolite*all metabolite analysis is now working in Super Batch mode. However, the interactive mode is yielding some odd errors. For example, there are redundant correlations (e.g. serine.1, serine.2, serine.3, and serine.4) if I run the model below (with screenshot):

Exposures: serine, glycine Outcomes: all metabolites

image

steven-moore commented 6 years ago

In addition, the metabolites with non-standard characters seem to be introducing multiple errors in interactive mode, per the screenshot below. This may be a GUI issue related to recent changes in the coding of metabolite names.

image

kailingchen commented 6 years ago

We will look into this.

steven-moore commented 6 years ago

Scrambled.CPSII.data (1).xlsx

ellatemprosa commented 6 years ago

My testing results from the package side:

> excorrdata <- COMETS::runCorr(exmodeldata,exmetabdata,"DPP")
[1] "Removed serine because of zero-variance"
[2] "Near zero variance for these outcome(s) removed: pyridoxine (vitamin b6), hydroxycotinine, cotinine n-oxide, naproxen, desmethylnaproxen, desmethylnaproxen sulfate, ibuprofen, carboxyibuprofen glucuronide, ibuprofen acyl glucuronide, celecoxib, diclofenac, morphine, morphine-6-glucuronide, morphine-3-glucuronide, cephalexin, azithromycin, clindamycin, n-acetyl sulfapyridine, doxycycline, metronidazole, fluconazole, quinine, prednisolone, metoprolol, alpha-hydroxymetoprolol, atenolol, nifedipine, diltiazem, verapamil, ticlopidine, warfarin, triamterene, furosemide, candesartan, valsartan, enalapril, sildenafil, ranitidine, cimetidine, omeprazole, dexlansoprazole, pioglitazone, metformin, ketopioglitazone, hydroxypioglitazone (m-iv), rosiglitazone, atorvastatin (lipitor), atorvastatin lactone, gemfibrozil, allopurinol, allopurinol riboside, oxypurinol, probenecid, phenobarbital, carbamazepine, carbamazepine 10,11-epoxide, carbamazepine glucuronide, gabapentin, meprobamate, hydrox...
[1] "Removed serine because of zero-variance"
[2] "Near zero variance for these outcome(s) removed: pyridoxine (vitamin b6), hydroxycotinine, cotinine n-oxide, naproxen, desmethylnaproxen, desmethylnaproxen sulfate, ibuprofen, carboxyibuprofen glucuronide, ibuprofen acyl glucuronide, celecoxib, diclofenac, morphine, morphine-6-glucuronide, morphine-3-glucuronide, cephalexin, azithromycin, clindamycin, n-acetyl sulfapyridine, doxycycline, metronidazole, fluconazole, quinine, prednisolone, metoprolol, alpha-hydroxymetoprolol, atenolol, nifedipine, diltiazem, verapamil, ticlopidine, warfarin, triamterene, furosemide, candesartan, valsartan, enalapril, sildenafil, ranitidine, cimetidine, omeprazole, dexlansoprazole, pioglitazone, metformin, ketopioglitazone, hydroxypioglitazone (m-iv), rosiglitazone, atorvastatin (lipitor), atorvastatin lactone, gemfibrozil, allopurinol, allopurinol riboside, oxypurinol, probenecid, phenobarbital, carbamazepine, carbamazepine 10,11-epoxide, carbamazepine glucuronide, gabapentin, meprobamate, hydrox...
[1] "running unadjusted"
Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE) : row names supplied are of the wrong length

steven-moore commented 6 years ago

On further examination of the correlation matrix from Super Batch mode, I am getting redundant correlations here as well:

60142 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglutamate.1 -0.02583 556 0.541807
62524 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglutamate.2 -0.02583 556 0.541807
231646 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglutamate.3 -0.02583 556 0.541807
69670 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglutamine.1 -0.03231 556 0.445366
74434 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglutamine.2 -0.03231 556 0.445366
232837 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglutamine.3 -0.03231 556 0.445366
12502 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglycine.1 -0.10127 556 0.016515
235219 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylacetylglycine.2 -0.10127 556 0.016515
35131 AIRWAVE Batch Age.3.0 All pairwise metabolites _5alpha-androstan-3beta,17beta-diol disulfate phenylalanine.1 -0.10704 556 0.011255

steven-moore commented 6 years ago

At 10:13 AM on 5/24, I ran the core sample file, and received output for all metabolites*all metabolites that was perfectly aligned with expectations. In the exposurespec, there are no metabolites with a ".2" suffix.

cohort spec model outcomespec exposurespec corr n pvalue adjspec adjvars outcome_uid outcome exposure_uid exposure
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_2_3_benzenetriol_sulfate_2.1 1 1000 0 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_2_3_benzenetriol_sulfate_2.1 _1_2_3_benzenetriol_sulfate_2.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_2_dipalmitoylglycerol.1 0.069164 1000 0.028421 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_2_dipalmitoylglycerol.1 _1_2_dipalmitoylglycerol.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_2_propanediol.1 0.070134 1000 0.026269 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_2_propanediol.1 _1_2_propanediol.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_3_7_trimethylurate.1 0.409211 1000 8.43E-42 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_3_7_trimethylurate.1 _1_3_7_trimethylurate.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_3_dimethylurate.1 0.229116 1000 2.01E-13 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_3_dimethylurate.1 _1_3_dimethylurate.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_3_dipalmitoylglycerol.1 0.107146 1000 0.000672 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_3_dipalmitoylglycerol.1 _1_3_dipalmitoylglycerol.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_5_anhydroglucitol__1_5ag.1 -0.02368 1000 0.453494 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_5_anhydroglucitol__1_5ag.1 _1_5_anhydroglucitol__1_5ag.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_6_anhydroglucose.1 0.142148 1000 6.14E-06 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_6_anhydroglucose.1 _1_6_anhydroglucose.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_7_dimethylurate.1 0.329426 1000 7.79E-27 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_7_dimethylurate.1 _1_7_dimethylurate.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_arachidonoylglycerophoscho.1 0.036403 1000 0.249149 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_arachidonoylglycerophoscho.1 _1_arachidonoylglycerophoscho.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_arachidonoylglycerophoseth.1 0.076321 1000 0.015571 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_arachidonoylglycerophoseth.1 _1_arachidonoylglycerophoseth.1
AIRWAVE Batch Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 _1_arachidonoylglycerophosino.1 -0.04659 1000 0.140126 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) _1_arachidonoylglycerophosino.1 _1_arachidonoylglycerophosino.1

steven-moore commented 6 years ago

At 11:05 AM on 5/29, the redundancies are now present (see the ".2" suffixes in "exposurespec"). So, code changes between 5/24 and 5/29 introduced this issue.

Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 n6_methyladenosine.1 0.081044 1000 0.0102 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) n6_methyladenosine.1 n6_methyladenosine.1
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 n6_succinyladenosine.1 0.059475 1000 0.059587 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) n6_succinyladenosine.1 n6_succinyladenosine.1
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 n1_methyladenosine.2 -0.02974 1000 0.346463 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) n1_methyladenosine.2 n1_methyladenosine.2
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 n6_carbamoylthreonyladenosine.2 0.023488 1000 0.457232 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) n6_carbamoylthreonyladenosine.2 n6_carbamoylthreonyladenosine.2
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 n6_methyladenosine.2 0.081044 1000 0.0102 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) n6_methyladenosine.2 n6_methyladenosine.2
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 n6_succinyladenosine.2 0.059475 1000 0.059587 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) n6_succinyladenosine.2 n6_succinyladenosine.2
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 camp.1 0.124343 1000 7.81E-05 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) camp.1 camp.1
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 campesterol.1 0.124013 1000 8.16E-05 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) campesterol.1 campesterol.1
Age.3.0 All pairwise metabolites _1_2_3_benzenetriol_sulfate_2 campesterol.2 0.124013 1000 8.16E-05 None None CHEM100006374 1,2,3-benzenetriol sulfate (2) campesterol.2 campesterol.2

steven-moore commented 6 years ago

Confirmed that this issue still persists as of 10:30 AM 6/1
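
A quick way to flag these redundancies in an output table, as a rough sketch only: strip the trailing numeric suffix from `exposurespec` and look for rows that then repeat with identical statistics. The rows below are a small subset copied from the 5/29 output above; the `exposure_base` and `redundant` columns are names made up for this example, and the approach assumes no real metabolite name ends in ".<digits>".

```r
# Toy subset of the 5/29 output (values copied from the rows above).
res <- data.frame(
  outcomespec  = "_1_2_3_benzenetriol_sulfate_2",
  exposurespec = c("n6_methyladenosine.1", "n6_methyladenosine.2",
                   "campesterol.1", "campesterol.2", "camp.1"),
  corr         = c(0.081044, 0.081044, 0.124013, 0.124013, 0.124343),
  n            = 1000,
  pvalue       = c(0.0102, 0.0102, 8.16e-05, 8.16e-05, 7.81e-05),
  stringsAsFactors = FALSE
)

# Recover the base name by dropping a trailing ".<digits>" suffix.
res$exposure_base <- sub("\\.[0-9]+$", "", res$exposurespec)

# A row is flagged when an earlier row has the same outcome, the same base
# exposure name, and identical statistics.
key <- res[, c("outcomespec", "exposure_base", "corr", "n", "pvalue")]
res$redundant <- duplicated(key)

res[res$redundant, c("exposurespec", "corr", "pvalue")]
```

Here the ".2" rows for n6_methyladenosine and campesterol are flagged, while camp.1 is left alone because its statistics differ from campesterol's.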

ellatemprosa commented 6 years ago

Per the package run, we have the model formula specified as:

image

steven-moore commented 6 years ago

Does it work correctly if non-metabolites are specified? e.g. Age and BMI both as exposures?

park-brian commented 6 years ago

Hi Steve,

It looks like if I specify both age and bmi as exposures for the input file, I get the following output:

exmetabdata <- COMETS::readCOMETSinput('Scrambled.CPSII.data (1).xlsx')
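exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('age', 'bmi'), rowvars=c('hydroxyasparagine**'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')  # assumed: call not shown in the original post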
NULL
[1] "running unadjusted"
Error in `[.data.frame`(ttval, , i) : undefined columns selected

However, it works if I only specify one variable as an exposure:

exmetabdata <- COMETS::readCOMETSinput('Scrambled.CPSII.data (1).xlsx')
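exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('age'), rowvars=c('hydroxyasparagine**'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')  # assumed: call not shown in the original post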
COMETS::showCorr(excorrdata, 1)
   cohort        spec model         outcomespec exposurespec        corr   n    pvalue adjspec adjvars         outcome_uid             outcome exposure_uid     exposure
1 AIRWAVE Interactive       hydroxyasparagine**          age -0.05766825 556 0.1729606    None    None hydroxyasparagine** hydroxyasparagine**          age Age at Entry

To reproduce the second screenshot, I ran the following code (output has been converted to xlsx from csv): correlation_outputairwave2018-06-01.xlsx

exmetabdata <- COMETS::readCOMETSinput("Scrambled.CPSII.data (1).xlsx")
exmodeldata <- COMETS::getModelData(exmetabdata,modelspec="Interactive",colvars=c("serine","glycine"))
excorrdata  <- COMETS::runCorr(exmodeldata,exmetabdata,"AIRWAVE")
COMETS::OutputCSVResults(filename="correlation_output", dataf=excorrdata, cohort="AIRWAVE")

The third screenshot uses the following code:

exmetabdata <- COMETS::readCOMETSinput("Scrambled.CPSII.data (1).xlsx")
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('hydroxyasparagine**'), rowvars=c('n-acetylaspartate (naa)'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')

and produces the output:

NULL
[1] "running unadjusted"
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
  length of 'dimnames' [2] not equal to array extent

ellatemprosa commented 6 years ago

OK, I fixed the code to check whether any factors are in the model; previously, the check was only done for all metabolite vs. all metabolite. In your example, you had glycine and serine as exposures, so it proceeded to create the dummy variables. It now produces results that seem correct, but somewhere along the way the .1 suffix is appended. I will track this down next. I also discovered that we should really run each exposure as an individual model.

image
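
As a possible lead on where a ".1" suffix can come from (an assumption on my part, not something confirmed in the package code): base R appends numeric suffixes whenever it has to de-duplicate names, for example in `make.unique()` or when `data.frame()` is called with its default `check.names = TRUE`. A small sketch with made-up values:

```r
# Duplicated names, e.g. if the same metabolite is added to a model more than once.
nm <- c("serine", "glycine", "serine", "serine")

# make.unique() appends .1, .2, ... to the repeats.
unique_nm <- make.unique(nm)

# data.frame() does the same to duplicated column names by default...
df_checked <- data.frame(serine = 1:2, serine = 3:4)
checked_names <- names(df_checked)

# ...unless check.names = FALSE, in which case the duplicates are kept as-is.
df_raw <- data.frame(serine = 1:2, serine = 3:4, check.names = FALSE)
raw_names <- names(df_raw)
```

If this is the mechanism, the suffixed columns would be exact copies of the original, which would match the identical corr and pvalue values reported for the ".1"/".2" rows.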

steven-moore commented 6 years ago

Ewy proposes that we handle this by tracking metabolites based on array indices rather than names and merges. Ewy will discuss with Ella to make sure that we have consensus on this approach. She also noted that this will break things initially, and will take some time to implement.
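
To make the index-based idea concrete, here is a toy sketch under my own assumptions (simulated data, made-up object names, and not the actual COMETS implementation): the data matrix carries no column names at all, every selection is done by integer position, and the original names are only looked up at the very end for display.

```r
# Original metabolite names, including ones that are awkward as R names.
metab_names <- c("serine", "glycine", "hydroxyasparagine**",
                 "n-acetylaspartate (naa)")

# Simulated abundance matrix: 10 samples x 4 metabolites, no column names.
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)

# Lookup table tying each column position to its original name.
lookup <- data.frame(idx = seq_along(metab_names), name = metab_names,
                     stringsAsFactors = FALSE)

# Exposures and outcomes are resolved to positions once, up front.
exposure_idx <- match(c("serine", "glycine"), lookup$name)
outcome_idx  <- lookup$idx

# Correlations are computed purely by position (outcomes in rows,
# exposures in columns).
corr <- cor(x[, outcome_idx, drop = FALSE], x[, exposure_idx, drop = FALSE])

# Long-format result keyed by index; expand.grid() varies outcome_idx
# fastest, matching the column-major order of as.vector(corr).
res <- expand.grid(outcome_idx = outcome_idx, exposure_idx = exposure_idx)
res$corr <- as.vector(corr)

# Names are attached last, by index, so they are never parsed or merged on.
res$outcome  <- lookup$name[res$outcome_idx]
res$exposure <- lookup$name[res$exposure_idx]
```

Because nothing is merged or matched on a name after the initial lookup, there is no step where a name can be mangled or de-duplicated, and each outcome/exposure pair appears exactly once.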

steven-moore commented 6 years ago

Bam. Solved. Issue closed. Thanks all for the hard work on this issue.