steven-moore closed this issue 6 years ago.
The all metabolites * all metabolites analysis is now working in Super Batch mode. However, Interactive mode is yielding some odd errors. For example, running the model below (see screenshot) produces redundant correlations (e.g., serine.1, serine.2, serine.3, and serine.4):
Exposures: serine, glycine
Outcomes: all metabolites
In addition, metabolites with non-standard characters in their names seem to introduce multiple errors in Interactive mode (see screenshot below). This may be a GUI issue related to recent changes in how metabolite names are coded.
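One package-side mechanism worth ruling out before blaming the GUI: when base R builds a data frame with the default `check.names = TRUE`, `make.names()` rewrites non-syntactic names, so a later lookup by the original metabolite name fails. The sketch below uses only base R and two metabolite names from this thread; whether COMETS actually goes through this code path is an assumption, not something confirmed here.

```r
# Sketch (assumption): demonstrates base R name rewriting, not COMETS internals.
raw_names <- c("hydroxyasparagine**", "n-acetylaspartate (naa)")
vals <- list(c(0.1, 0.5, 0.9), c(1.2, 0.8, 1.1))
names(vals) <- raw_names

# Default check.names = TRUE passes names through make.names(unique = TRUE)
checked <- data.frame(vals)
# check.names = FALSE keeps the original metabolite names untouched
kept <- data.frame(vals, check.names = FALSE)

print(names(checked))  # "hydroxyasparagine.." and "n.acetylaspartate..naa."
print(names(kept))     # "hydroxyasparagine**" and "n-acetylaspartate (naa)"

# Looking a column up by its original name fails on the rewritten frame
lookup_error <- tryCatch(checked[, "hydroxyasparagine**"],
                         error = conditionMessage)
print(lookup_error)    # "undefined columns selected"

# ...but succeeds when the names were preserved
print(kept[, "hydroxyasparagine**"])
```

The "undefined columns selected" message is the same one reported later in this thread, which makes name rewriting a plausible suspect, though not a confirmed cause.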
We will look into this.
My test results from the package side:
```
> excorrdata <- COMETS::runCorr(exmodeldata,exmetabdata,"DPP")
[1] "Removed serine because of zero-variance"
[2] "Near zero variance for these outcome(s) removed: pyridoxine (vitamin b6), hydroxycotinine, cotinine n-oxide, naproxen, desmethylnaproxen, desmethylnaproxen sulfate, ibuprofen, carboxyibuprofen glucuronide, ibuprofen acyl glucuronide, celecoxib, diclofenac, morphine, morphine-6-glucuronide, morphine-3-glucuronide, cephalexin, azithromycin, clindamycin, n-acetyl sulfapyridine, doxycycline, metronidazole, fluconazole, quinine, prednisolone, metoprolol, alpha-hydroxymetoprolol, atenolol, nifedipine, diltiazem, verapamil, ticlopidine, warfarin, triamterene, furosemide, candesartan, valsartan, enalapril, sildenafil, ranitidine, cimetidine, omeprazole, dexlansoprazole, pioglitazone, metformin, ketopioglitazone, hydroxypioglitazone (m-iv), rosiglitazone, atorvastatin (lipitor), atorvastatin lactone, gemfibrozil, allopurinol, allopurinol riboside, oxypurinol, probenecid, phenobarbital, carbamazepine, carbamazepine 10,11-epoxide, carbamazepine glucuronide, gabapentin, meprobamate, hydrox...
Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE) : row names supplied are of the wrong length
```
On further examination of the correlation matrix from Super Batch mode, I see redundant correlations here as well:
row | cohort | spec | model | outcomespec | exposurespec | corr | n | pvalue |
---|---|---|---|---|---|---|---|---|
60142 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglutamate.1 | -0.02583 | 556 | 0.541807 |
62524 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglutamate.2 | -0.02583 | 556 | 0.541807 |
231646 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglutamate.3 | -0.02583 | 556 | 0.541807 |
69670 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglutamine.1 | -0.03231 | 556 | 0.445366 |
74434 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglutamine.2 | -0.03231 | 556 | 0.445366 |
232837 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglutamine.3 | -0.03231 | 556 | 0.445366 |
12502 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglycine.1 | -0.10127 | 556 | 0.016515 |
235219 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylacetylglycine.2 | -0.10127 | 556 | 0.016515 |
35131 | AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _5alpha-androstan-3beta,17beta-diol disulfate | phenylalanine.1 | -0.10704 | 556 | 0.011255 |
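A quick way to quantify redundancy in output like the table above is to strip the trailing ".<digits>" suffix and flag rows whose (outcome, exposure) pair has already been seen. The helpers `strip_suffix()` and `flag_redundant()` are hypothetical names, not part of COMETS, and the regex would also strip a legitimate metabolite name that happens to end in ".<digits>", so treat this as a diagnostic sketch only.

```r
# Hypothetical diagnostic helpers (not part of the COMETS package).

# Remove one trailing ".<digits>" suffix, e.g. "serine.2" -> "serine"
strip_suffix <- function(x) sub("\\.[0-9]+$", "", x)

# TRUE for every row whose outcome/exposure pair, after stripping
# suffixes, already appeared in an earlier row
flag_redundant <- function(res, outcome_col = "outcomespec",
                           exposure_col = "exposurespec") {
  key <- data.frame(outcome  = strip_suffix(res[[outcome_col]]),
                    exposure = strip_suffix(res[[exposure_col]]),
                    stringsAsFactors = FALSE)
  duplicated(key)
}

# A few rows copied from the Super Batch output in this thread
res <- data.frame(
  outcomespec  = rep("_5alpha-androstan-3beta,17beta-diol disulfate", 5),
  exposurespec = c("phenylacetylglutamate.1", "phenylacetylglutamate.2",
                   "phenylacetylglutamate.3", "phenylacetylglycine.1",
                   "phenylacetylglycine.2"),
  corr = c(-0.02583, -0.02583, -0.02583, -0.10127, -0.10127),
  stringsAsFactors = FALSE
)

res$redundant <- flag_redundant(res)
print(res[, c("exposurespec", "corr", "redundant")])
print(sum(res$redundant))  # 3 redundant rows
```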
At 10:13 AM on 5/24, I ran the core sample file and received all metabolites * all metabolites output that matched expectations exactly. In the exposurespec column, no metabolite has a ".2" suffix.
cohort | spec | model | outcomespec | exposurespec | corr | n | pvalue | adjspec | adjvars | outcome_uid | outcome | exposure_uid | exposure |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_2_3_benzenetriol_sulfate_2.1 | 1 | 1000 | 0 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_2_3_benzenetriol_sulfate_2.1 | _1_2_3_benzenetriol_sulfate_2.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_2_dipalmitoylglycerol.1 | 0.069164 | 1000 | 0.028421 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_2_dipalmitoylglycerol.1 | _1_2_dipalmitoylglycerol.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_2_propanediol.1 | 0.070134 | 1000 | 0.026269 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_2_propanediol.1 | _1_2_propanediol.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_3_7_trimethylurate.1 | 0.409211 | 1000 | 8.43E-42 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_3_7_trimethylurate.1 | _1_3_7_trimethylurate.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_3_dimethylurate.1 | 0.229116 | 1000 | 2.01E-13 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_3_dimethylurate.1 | _1_3_dimethylurate.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_3_dipalmitoylglycerol.1 | 0.107146 | 1000 | 0.000672 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_3_dipalmitoylglycerol.1 | _1_3_dipalmitoylglycerol.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_5_anhydroglucitol__1_5ag.1 | -0.02368 | 1000 | 0.453494 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_5_anhydroglucitol__1_5ag.1 | _1_5_anhydroglucitol__1_5ag.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_6_anhydroglucose.1 | 0.142148 | 1000 | 6.14E-06 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_6_anhydroglucose.1 | _1_6_anhydroglucose.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_7_dimethylurate.1 | 0.329426 | 1000 | 7.79E-27 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_7_dimethylurate.1 | _1_7_dimethylurate.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_arachidonoylglycerophoscho.1 | 0.036403 | 1000 | 0.249149 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_arachidonoylglycerophoscho.1 | _1_arachidonoylglycerophoscho.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_arachidonoylglycerophoseth.1 | 0.076321 | 1000 | 0.015571 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_arachidonoylglycerophoseth.1 | _1_arachidonoylglycerophoseth.1 |
AIRWAVE | Batch | Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | _1_arachidonoylglycerophosino.1 | -0.04659 | 1000 | 0.140126 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | _1_arachidonoylglycerophosino.1 | _1_arachidonoylglycerophosino.1 |
By 11:05 AM on 5/29, the redundancies had appeared (see the ".2" suffixes in "exposurespec"), so code changes made between 5/24 and 5/29 introduced this issue.
model | outcomespec | exposurespec | corr | n | pvalue | adjspec | adjvars | outcome_uid | outcome | exposure_uid | exposure |
---|---|---|---|---|---|---|---|---|---|---|---|
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | n6_methyladenosine.1 | 0.081044 | 1000 | 0.0102 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | n6_methyladenosine.1 | n6_methyladenosine.1 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | n6_succinyladenosine.1 | 0.059475 | 1000 | 0.059587 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | n6_succinyladenosine.1 | n6_succinyladenosine.1 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | n1_methyladenosine.2 | -0.02974 | 1000 | 0.346463 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | n1_methyladenosine.2 | n1_methyladenosine.2 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | n6_carbamoylthreonyladenosine.2 | 0.023488 | 1000 | 0.457232 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | n6_carbamoylthreonyladenosine.2 | n6_carbamoylthreonyladenosine.2 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | n6_methyladenosine.2 | 0.081044 | 1000 | 0.0102 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | n6_methyladenosine.2 | n6_methyladenosine.2 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | n6_succinyladenosine.2 | 0.059475 | 1000 | 0.059587 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | n6_succinyladenosine.2 | n6_succinyladenosine.2 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | camp.1 | 0.124343 | 1000 | 7.81E-05 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | camp.1 | camp.1 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | campesterol.1 | 0.124013 | 1000 | 8.16E-05 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | campesterol.1 | campesterol.1 |
Age.3.0 All pairwise metabolites | _1_2_3_benzenetriol_sulfate_2 | campesterol.2 | 0.124013 | 1000 | 8.16E-05 | None | None | CHEM100006374 | 1,2,3-benzenetriol sulfate (2) | campesterol.2 | campesterol.2 |
Confirmed that this issue still persists as of 10:30 AM on 6/1.
Per the package run, the model formula is specified as:
Does it work correctly if non-metabolites are specified, e.g., age and BMI both as exposures?
Hi Steve,
It looks like if I specify both age and bmi as exposures for the input file, I get the following output:
```
exmetabdata <- COMETS::readCOMETSinput('Scrambled.CPSII.data (1).xlsx')
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('age', 'bmi'), rowvars=c('hydroxyasparagine**'))
NULL
[1] "running unadjusted"
Error in `[.data.frame`(ttval, , i) : undefined columns selected
```
However, it works if I only specify one variable as an exposure:
```
exmetabdata <- COMETS::readCOMETSinput('Scrambled.CPSII.data (1).xlsx')
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('age'), rowvars=c('hydroxyasparagine**'))
COMETS::showCorr(excorrdata, 1)
cohort spec model outcomespec exposurespec corr n pvalue adjspec adjvars outcome_uid outcome exposure_uid exposure
1 AIRWAVE Interactive hydroxyasparagine** age -0.05766825 556 0.1729606 None None hydroxyasparagine** hydroxyasparagine** age Age at Entry
```
To reproduce the second screenshot, I ran the following code (the output was converted from CSV to XLSX): correlation_outputairwave2018-06-01.xlsx
```r
exmetabdata <- COMETS::readCOMETSinput("Scrambled.CPSII.data (1).xlsx")
exmodeldata <- COMETS::getModelData(exmetabdata,modelspec="Interactive",colvars=c("serine","glycine"))
excorrdata <- COMETS::runCorr(exmodeldata,exmetabdata,"AIRWAVE")
COMETS::OutputCSVResults(filename="correlation_output", dataf=excorrdata, cohort="AIRWAVE")
```
The third screenshot uses the following code:
```r
exmetabdata <- COMETS::readCOMETSinput("Scrambled.CPSII.data (1).xlsx")
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('hydroxyasparagine**'), rowvars=c('n-acetylaspartate (naa)'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')
```
and produces the output:
```
NULL
[1] "running unadjusted"
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
  length of 'dimnames' [2] not equal to array extent
```
OK, I fixed the code to check whether any factors are in the model; previously, that check was done only for all-metabolites vs. all-metabolites models. In your example, glycine and serine were the exposures, so the code proceeded to create dummy variables. It now produces results that seem correct, but somewhere along the way the ".1" suffix is appended; I will track this down next. I also found that we should really run each exposure as an individual model.
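When tracking down where the ".1" suffix comes from, it may help to recall that base R de-duplicates repeated column names by appending ".1", ".2", and so on, for example when the same metabolite columns are bound into one data frame more than once. The sketch below only demonstrates that base R behavior; the idea that COMETS binds exposure columns next to the full outcome set is an assumption, although it would be consistent with even single exposures showing up with ".1" in the output.

```r
# Sketch (assumption): shows base R renaming, not the COMETS code path.
metabs    <- data.frame(serine = c(1.0, 2.0, 3.5), glycine = c(2.1, 4.0, 7.3))
exposures <- metabs[, c("serine", "glycine")]

# Binding the exposure columns next to the full outcome set duplicates
# names, so data.frame() renames the second copy with ".1"
once <- data.frame(metabs, exposures)
print(names(once))    # serine, glycine, serine.1, glycine.1

# A further copy (e.g. one per exposure model) produces ".2"
twice <- data.frame(metabs, exposures, exposures)
print(names(twice))   # ..., serine.2, glycine.2

# make.unique() applies the same rule to a plain character vector
print(make.unique(c("serine", "serine", "serine")))  # serine, serine.1, serine.2

# With check.names = FALSE the duplicates are kept as-is, which makes the
# duplication visible instead of silently renaming it
raw <- data.frame(metabs, exposures, check.names = FALSE)
print(anyDuplicated(names(raw)) > 0)  # TRUE
```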
Ewy proposes that we handle this by tracking metabolites by array index rather than by name and merges. Ewy will discuss this with Ella to make sure we have consensus on the approach. She also noted that this will break things initially and will take some time to implement.
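A minimal sketch of what index-based tracking could look like, assuming the metabolite data is a numeric matrix and the original names are kept in a separate character vector. `corr_by_index()` is a hypothetical helper, not Ewy's implementation or part of the COMETS API: columns are addressed only by integer position, and names are attached once at the end, so they never pass through `make.names()` or a merge key.

```r
# Hypothetical sketch: corr_by_index() is an illustrative name, not COMETS API.
corr_by_index <- function(mat, original_names, exposure_idx, outcome_idx) {
  stopifnot(is.matrix(mat), ncol(mat) == length(original_names))
  # Exposure-by-outcome correlation matrix (rows = exposures)
  r <- cor(mat[, exposure_idx, drop = FALSE],
           mat[, outcome_idx, drop = FALSE],
           use = "pairwise.complete.obs")
  # expand.grid varies `e` fastest, matching the column-major order of r
  grid <- expand.grid(e = seq_along(exposure_idx), o = seq_along(outcome_idx))
  # Names are looked up by position only here, at the very end
  data.frame(exposure = original_names[exposure_idx[grid$e]],
             outcome  = original_names[outcome_idx[grid$o]],
             corr     = as.vector(r),
             stringsAsFactors = FALSE)
}

set.seed(42)
m  <- matrix(rnorm(40), nrow = 10)
nm <- c("hydroxyasparagine**", "n-acetylaspartate (naa)", "serine", "glycine")

# Exposures: serine, glycine (columns 3 and 4); outcomes: all four metabolites
res <- corr_by_index(m, nm, exposure_idx = c(3, 4), outcome_idx = 1:4)
print(res)
```

Each exposure/outcome pair appears exactly once, and names such as `hydroxyasparagine**` come back unchanged because they are never used as column keys during the computation.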
Bam. Solved. Issue closed. Thanks all for the hard work on this issue.
After some revisions to the processing of metabolite arrays, analyses that examine correlations of metabolites with one another fail when the names are long, contain non-standard characters, etc.