COMETS 1.4. Correlations between metabolites with long names not currently working

steven-moore commented 6 years ago

After some revisions to the processing of metabolite arrays, analyses that examine correlations of metabolites with one another fail when the names are long, contain non-standard characters, etc.

unexpected input

steven-moore commented 6 years ago

The all metabolites*all metabolites analysis is now working in Super Batch mode. However, interactive mode is yielding some odd errors. For example, running the model below produces redundant correlations (e.g., serine.1, serine.2, serine.3, and serine.4), as shown in the screenshot:

Exposures: serine, glycine
Outcomes: all metabolites

steven-moore commented 6 years ago

In addition, metabolites with non-standard characters seem to be causing multiple errors in interactive mode, per the screenshot below. This may be a GUI issue related to recent changes in how metabolite names are coded.

kailingchen commented 6 years ago

We will look into this.

steven-moore commented 6 years ago

Scrambled.CPSII.data (1).xlsx

ellatemprosa commented 6 years ago

My testing results from the package side:

`> excorrdata <- COMETS::runCorr(exmodeldata,exmetabdata,"DPP")
[1] "Removed serine because of zero-variance"
[2] "Near zero variance for these outcome(s) removed: pyridoxine (vitamin b6), hydroxycotinine, cotinine n-oxide, naproxen, desmethylnaproxen, desmethylnaproxen sulfate, ibuprofen, carboxyibuprofen glucuronide, ibuprofen acyl glucuronide, celecoxib, diclofenac, morphine, morphine-6-glucuronide, morphine-3-glucuronide, cephalexin, azithromycin, clindamycin, n-acetyl sulfapyridine, doxycycline, metronidazole, fluconazole, quinine, prednisolone, metoprolol, alpha-hydroxymetoprolol, atenolol, nifedipine, diltiazem, verapamil, ticlopidine, warfarin, triamterene, furosemide, candesartan, valsartan, enalapril, sildenafil, ranitidine, cimetidine, omeprazole, dexlansoprazole, pioglitazone, metformin, ketopioglitazone, hydroxypioglitazone (m-iv), rosiglitazone, atorvastatin (lipitor), atorvastatin lactone, gemfibrozil, allopurinol, allopurinol riboside, oxypurinol, probenecid, phenobarbital, carbamazepine, carbamazepine 10,11-epoxide, carbamazepine glucuronide, gabapentin, meprobamate, hydrox...
[1] "Removed serine because of zero-variance"
[2] "Near zero variance for these outcome(s) removed: pyridoxine (vitamin b6), hydroxycotinine, cotinine n-oxide, naproxen, desmethylnaproxen, desmethylnaproxen sulfate, ibuprofen, carboxyibuprofen glucuronide, ibuprofen acyl glucuronide, celecoxib, diclofenac, morphine, morphine-6-glucuronide, morphine-3-glucuronide, cephalexin, azithromycin, clindamycin, n-acetyl sulfapyridine, doxycycline, metronidazole, fluconazole, quinine, prednisolone, metoprolol, alpha-hydroxymetoprolol, atenolol, nifedipine, diltiazem, verapamil, ticlopidine, warfarin, triamterene, furosemide, candesartan, valsartan, enalapril, sildenafil, ranitidine, cimetidine, omeprazole, dexlansoprazole, pioglitazone, metformin, ketopioglitazone, hydroxypioglitazone (m-iv), rosiglitazone, atorvastatin (lipitor), atorvastatin lactone, gemfibrozil, allopurinol, allopurinol riboside, oxypurinol, probenecid, phenobarbital, carbamazepine, carbamazepine 10,11-epoxide, carbamazepine glucuronide, gabapentin, meprobamate, hydrox...
[1] "running unadjusted"

Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE) : row names supplied are of the wrong length`
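
The traceback stops inside `data.frame(..., row.names = rn)`. In base R, one generic way to get "row names supplied are of the wrong length" is for a data matrix and a separately stored name vector to fall out of step. For example, a zero-variance outcome such as serine might be dropped from one but not the other. The sketch below reproduces that generic failure with made-up values. It is not taken from the COMETS source. Whether this is the actual cause here is an assumption, and `corr_vals` and `outcome_names` are hypothetical names.

```r
# Illustrative values only; `corr_vals` and `outcome_names` are hypothetical
# stand-ins, not objects from the COMETS package.
corr_vals <- matrix(c(0.10, 0.25, -0.05, 0.30, 0.12, 0.08), nrow = 3,
                    dimnames = list(NULL, c("age", "bmi")))
outcome_names <- c("serine", "glycine", "alanine")

# Drop the first outcome (as if removed for zero variance) from the data only;
# the separately stored name vector still has three entries.
reduced <- corr_vals[-1, , drop = FALSE]

res <- tryCatch(data.frame(reduced, row.names = outcome_names),
                error = function(e) e)
print(conditionMessage(res))

# Applying the same index to the data and to the names keeps them aligned.
keep <- -1
fixed <- data.frame(corr_vals[keep, , drop = FALSE],
                    row.names = outcome_names[keep])
print(fixed)
```

Subsetting both objects with one shared index is the simplest guard against this kind of length mismatch.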

steven-moore commented 6 years ago

On further examination of the correlation matrix from Super Batch mode, I am getting redundant correlations there as well:

60142	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglutamate.1	-0.02583	556	0.541807
62524	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglutamate.2	-0.02583	556	0.541807
231646	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglutamate.3	-0.02583	556	0.541807
69670	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglutamine.1	-0.03231	556	0.445366
74434	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglutamine.2	-0.03231	556	0.445366
232837	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglutamine.3	-0.03231	556	0.445366
12502	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglycine.1	-0.10127	556	0.016515
235219	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylacetylglycine.2	-0.10127	556	0.016515
35131	AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_5alpha-androstan-3beta,17beta-diol disulfate	phenylalanine.1	-0.10704	556	0.011255

steven-moore commented 6 years ago

At 10:13 AM on 5/24, I ran the core sample file and received all metabolites*all metabolites output that matched expectations exactly. In the exposurespec column, no metabolite has a ".2" suffix.

cohort	spec	model	outcomespec	exposurespec	corr	n	pvalue	adjspec	adjvars	outcome_uid	outcome	exposure_uid	exposure
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_2_3_benzenetriol_sulfate_2.1	1	1000	0	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_2_3_benzenetriol_sulfate_2.1	_1_2_3_benzenetriol_sulfate_2.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_2_dipalmitoylglycerol.1	0.069164	1000	0.028421	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_2_dipalmitoylglycerol.1	_1_2_dipalmitoylglycerol.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_2_propanediol.1	0.070134	1000	0.026269	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_2_propanediol.1	_1_2_propanediol.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_3_7_trimethylurate.1	0.409211	1000	8.43E-42	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_3_7_trimethylurate.1	_1_3_7_trimethylurate.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_3_dimethylurate.1	0.229116	1000	2.01E-13	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_3_dimethylurate.1	_1_3_dimethylurate.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_3_dipalmitoylglycerol.1	0.107146	1000	0.000672	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_3_dipalmitoylglycerol.1	_1_3_dipalmitoylglycerol.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_5_anhydroglucitol__1_5ag.1	-0.02368	1000	0.453494	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_5_anhydroglucitol__1_5ag.1	_1_5_anhydroglucitol__1_5ag.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_6_anhydroglucose.1	0.142148	1000	6.14E-06	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_6_anhydroglucose.1	_1_6_anhydroglucose.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_7_dimethylurate.1	0.329426	1000	7.79E-27	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_7_dimethylurate.1	_1_7_dimethylurate.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_arachidonoylglycerophoscho.1	0.036403	1000	0.249149	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_arachidonoylglycerophoscho.1	_1_arachidonoylglycerophoscho.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_arachidonoylglycerophoseth.1	0.076321	1000	0.015571	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_arachidonoylglycerophoseth.1	_1_arachidonoylglycerophoseth.1
AIRWAVE	Batch	Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	_1_arachidonoylglycerophosino.1	-0.04659	1000	0.140126	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	_1_arachidonoylglycerophosino.1	_1_arachidonoylglycerophosino.1

steven-moore commented 6 years ago

As of 11:05 AM on 5/29, the redundancies are present (see the ".2" suffixes in the exposurespec column), so a code change made between 5/24 and 5/29 introduced this issue.

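model	outcomespec	exposurespec	corr	n	pvalue	adjspec	adjvars	outcome_uid	outcome	exposure_uid	exposure
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	n6_methyladenosine.1	0.081044	1000	0.0102	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	n6_methyladenosine.1	n6_methyladenosine.1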
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	n6_succinyladenosine.1	0.059475	1000	0.059587	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	n6_succinyladenosine.1	n6_succinyladenosine.1
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	n1_methyladenosine.2	-0.02974	1000	0.346463	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	n1_methyladenosine.2	n1_methyladenosine.2
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	n6_carbamoylthreonyladenosine.2	0.023488	1000	0.457232	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	n6_carbamoylthreonyladenosine.2	n6_carbamoylthreonyladenosine.2
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	n6_methyladenosine.2	0.081044	1000	0.0102	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	n6_methyladenosine.2	n6_methyladenosine.2
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	n6_succinyladenosine.2	0.059475	1000	0.059587	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	n6_succinyladenosine.2	n6_succinyladenosine.2
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	camp.1	0.124343	1000	7.81E-05	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	camp.1	camp.1
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	campesterol.1	0.124013	1000	8.16E-05	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	campesterol.1	campesterol.1
Age.3.0 All pairwise metabolites	_1_2_3_benzenetriol_sulfate_2	campesterol.2	0.124013	1000	8.16E-05	None	None	CHEM100006374	1,2,3-benzenetriol sulfate (2)	campesterol.2	campesterol.2
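
The ".1"/".2" pattern matches base R's name de-duplication. The minimal sketch below rests on one assumption, not confirmed in this thread: that exposure columns are bound next to outcome columns with the same names. Under that assumption, every exposure would pick up ".1", as in the 5/24 output. One more copy of the same columns would then add ".2", as in the 5/29 output. The tables `outcomes` and `exposures` are hypothetical.

```r
# Made-up data; `outcomes` and `exposures` are hypothetical tables that share
# a column name, as exposure metabolites also appear among the outcomes.
outcomes  <- data.frame(serine = c(1.2, 0.8, 1.5), glycine = c(2.1, 1.9, 2.4))
exposures <- data.frame(serine = c(1.2, 0.8, 1.5))

# data.frame() defaults to check.names = TRUE, which calls
# make.names(unique = TRUE); the second "serine" is renamed "serine.1".
combined <- data.frame(outcomes, exposures)
print(names(combined))    # "serine" "glycine" "serine.1"

# Binding the same exposure columns in once more yields "serine.2",
# because make.unique() skips suffixes that are already taken.
combined2 <- data.frame(combined, exposures)
print(names(combined2))   # "serine" "glycine" "serine.1" "serine.2"

# make.unique() shows the same numbering on a plain character vector.
print(make.unique(c("serine", "serine", "serine")))  # "serine" "serine.1" "serine.2"
```

Passing `check.names = FALSE` keeps the duplicate names instead of suffixing them. That hides the symptom, but two columns would then share one name, which is why identity by name stays fragile.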

steven-moore commented 6 years ago

Confirmed that this issue still persists as of 10:30 AM on 6/1.

ellatemprosa commented 6 years ago

Per the package run, we have the model formula specified as:

steven-moore commented 6 years ago

Does it work correctly if non-metabolites are specified? e.g. Age and BMI both as exposures?

park-brian commented 6 years ago

Hi Steve,

If I specify both age and bmi as exposures for the input file, I get the following output:

exmetabdata <- COMETS::readCOMETSinput('Scrambled.CPSII.data (1).xlsx')
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('age', 'bmi'), rowvars=c('hydroxyasparagine**'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')

NULL
[1] "running unadjusted"
Error in `[.data.frame`(ttval, , i) : undefined columns selected

However, it works if I only specify one variable as an exposure:

exmetabdata <- COMETS::readCOMETSinput('Scrambled.CPSII.data (1).xlsx')
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('age'), rowvars=c('hydroxyasparagine**'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')
COMETS::showCorr(excorrdata, 1)

   cohort        spec model         outcomespec exposurespec        corr   n    pvalue adjspec adjvars         outcome_uid             outcome exposure_uid     exposure
1 AIRWAVE Interactive       hydroxyasparagine**          age -0.05766825 556 0.1729606    None    None hydroxyasparagine** hydroxyasparagine**          age Age at Entry

To reproduce the second screenshot, I ran the following code (the output was converted from CSV to XLSX): correlation_outputairwave2018-06-01.xlsx

exmetabdata <- COMETS::readCOMETSinput("Scrambled.CPSII.data (1).xlsx")
exmodeldata <- COMETS::getModelData(exmetabdata,modelspec="Interactive",colvars=c("serine","glycine"))
excorrdata  <- COMETS::runCorr(exmodeldata,exmetabdata,"AIRWAVE")
COMETS::OutputCSVResults(filename="correlation_output", dataf=excorrdata, cohort="AIRWAVE")

The third screenshot uses the following code:

exmetabdata <- COMETS::readCOMETSinput("Scrambled.CPSII.data (1).xlsx")
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec = 'Interactive', colvars=c('hydroxyasparagine**'), rowvars=c('n-acetylaspartate (naa)'))
excorrdata <- COMETS::runCorr(exmodeldata, exmetabdata, 'AIRWAVE')

and produces the output:

NULL
[1] "running unadjusted"
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
  length of 'dimnames' [2] not equal to array extent

ellatemprosa commented 6 years ago

OK, I fixed the code to check whether any factors are in the model; previously, the check was done only for the all metabolites vs. all metabolites case. In your example, you had glycine and serine as exposures, so it proceeded to create dummy variables. It now produces results that seem correct, but somewhere along the way the .1 suffix is appended. I will track this down next. I also realized that we should really run each exposure as an individual model.

steven-moore commented 6 years ago

Ewy proposes that we handle this by tracking metabolites by their array indices rather than by names and merges. Ewy will discuss this with Ella to make sure that we have consensus on the approach. She also noted that this will break things initially and will take some time to implement.
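
The sketch below shows the index-based idea in base R under one assumption: the metabolite values sit in a numeric matrix whose column j is described by element j of a separate name vector. The names `metab_ids`, `metab_mat`, `exposure_idx`, and `outcome_idx` are hypothetical. This illustrates the proposal and is not the approach COMETS adopted. The metabolite labels are copied from this thread, and the numeric values are random.

```r
# Hypothetical sketch of index-based tracking; `metab_ids`, `metab_mat`,
# `exposure_idx`, and `outcome_idx` are illustrative, not COMETS objects.
set.seed(1)
metab_ids <- c("1,2,3-benzenetriol sulfate (2)", "hydroxyasparagine**",
               "n-acetylaspartate (naa)", "pyridoxine (vitamin b6)")
metab_mat <- matrix(rnorm(40), nrow = 10)  # 10 samples x 4 metabolites;
                                           # column j is metab_ids[j]

exposure_idx <- c(1, 3)                    # selected by position
outcome_idx  <- seq_len(ncol(metab_mat))   # all metabolites

# Correlate by column position: exposures x outcomes (a 2 x 4 matrix).
corr <- cor(metab_mat[, exposure_idx, drop = FALSE],
            metab_mat[, outcome_idx, drop = FALSE])

# Expand to long format using indices only, attaching the original names
# at the very end, so no name is mangled, de-duplicated, or merged on.
grid <- expand.grid(e = seq_along(exposure_idx), o = seq_along(outcome_idx))
result <- data.frame(
  exposure = metab_ids[exposure_idx[grid$e]],
  outcome  = metab_ids[outcome_idx[grid$o]],
  corr     = corr[cbind(grid$e, grid$o)],
  stringsAsFactors = FALSE
)
print(result)
```

The names are only looked up at the end, so labels such as "hydroxyasparagine**" pass through unchanged and no ".1" suffix can be introduced. The cost is that every step that currently merges by name would have to carry indices instead. That fits the expectation that the change will break things initially.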

steven-moore commented 6 years ago

Bam. Solved. Issue closed. Thanks all for the hard work on this issue.

CBIIT / R-cometsAnalytics, issue #58