CBIIT / R-cometsAnalytics

R package development for COMETS Analytics

COMETS 1.3. Problems with some adjustment-strata combos #48

Closed steven-moore closed 6 years ago

steven-moore commented 6 years ago

Certain combinations of adjustments and stratification can still cause problems with models. One of the simplest scenarios uses the data below, with the model as follows:

Exposure: age Outcome: glycine (can also use All Metabolites) Adjusted: bmi_grp, alc_grp Strata: smk_grp

Initially, I thought this could be due to a code reversion, but that was a false lead.

I then thought it could reflect metabolites with high numbers of values below the limit of detection (i.e. little meaningful variance), but I tested against glycine (for which this issue does not apply) and still had the same problem.

I am thus forced to conclude that the issue reflects something about the joint distribution of the adjusted and strata variables that we are not quite fully handling.

Ella and Ewy, the data are attached. Let me know if you have any insights. I hope to test again toward the end of today.

Scrambled CPSII data.xlsx

image

steven-moore commented 6 years ago

Possibly relevant: frequency table of the three variables that, together, result in problems:

alc_grp | bmi_grp | smk_grp | Frequency | Percent
0 | 1 | 0 | 35 | 6.29
0 | 1 | 1 | 40 | 7.19
0 | 1 | 2 | 2 | 0.36
0 | 1 | 3 | 3 | 0.54
0 | 2 | 0 | 38 | 6.83
0 | 2 | 1 | 34 | 6.12
0 | 2 | 2 | 2 | 0.36
0 | 2 | 3 | 5 | 0.9
0 | 3 | 0 | 13 | 2.34
0 | 3 | 1 | 20 | 3.6
0 | 3 | 3 | 2 | 0.36
1 | 0 | 1 | 2 | 0.36
1 | 1 | 0 | 24 | 4.32
1 | 1 | 1 | 37 | 6.65
1 | 1 | 2 | 3 | 0.54
1 | 1 | 3 | 8 | 1.44
1 | 2 | 0 | 31 | 5.58
1 | 2 | 1 | 72 | 12.95
1 | 2 | 2 | 1 | 0.18
1 | 2 | 3 | 9 | 1.62
1 | 3 | 0 | 11 | 1.98
1 | 3 | 1 | 16 | 2.88
1 | 3 | 2 | 1 | 0.18
1 | 3 | 3 | 2 | 0.36
1 | 4 | 0 | 2 | 0.36
2 | 1 | 0 | 8 | 1.44
2 | 1 | 1 | 13 | 2.34
2 | 1 | 3 | 1 | 0.18
2 | 2 | 0 | 14 | 2.52
2 | 2 | 1 | 19 | 3.42
2 | 2 | 3 | 1 | 0.18
2 | 3 | 1 | 4 | 0.72
2 | 3 | 2 | 1 | 0.18
3 | 1 | 0 | 4 | 0.72
3 | 1 | 1 | 20 | 3.6
3 | 1 | 3 | 3 | 0.54
3 | 2 | 0 | 8 | 1.44
3 | 2 | 1 | 23 | 4.14
3 | 2 | 2 | 2 | 0.36
3 | 2 | 3 | 4 | 0.72
3 | 3 | 0 | 1 | 0.18
3 | 3 | 1 | 9 | 1.62
3 | 3 | 2 | 1 | 0.18
3 | 3 | 3 | 1 | 0.18
4 | 0 | 3 | 3 | 0.54
4 | 2 | 1 | 1 | 0.18
4 | 3 | 1 | 2 | 0.36

freqs_temp.xlsx

steven-moore commented 6 years ago

My SAS log shows where the singularity occurs within each smoking group, and which variables to drop within each stratum.

proc corr;
var age multivitamin;
partial alc1-alc4 bmi1-bmi4;
by smk_grp;
run;

WARNING: The variable alc4 is singular in calculating the partial PEARSON correlations.
WARNING: The variable bmi4 is singular in calculating the partial PEARSON correlations.
NOTE: The above message was for the following BY group: smk_grp=0
WARNING: The variable bmi4 is singular in calculating the partial PEARSON correlations.
NOTE: The above message was for the following BY group: smk_grp=1
WARNING: The variable bmi3 is singular in calculating the partial PEARSON correlations.
WARNING: The variable bmi4 is singular in calculating the partial PEARSON correlations.
NOTE: The above message was for the following BY group: smk_grp=3
NOTE: PROCEDURE CORR used (Total process time): real time 0.04 seconds cpu time 0.04 seconds

My hunch is that the algorithm is failing in smk_grp=3, specifically with respect to the bmi3 dummy variable. There are five participants with this BMI level, but the covariate is nonetheless singular in combination with the other covariates.
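A covariate can have several non-zero observations and still be singular if, within the stratum, it is an exact linear combination of the other covariates. The base-R sketch below illustrates this on a made-up stratum; the data and the helper name `is_rank_deficient` are hypothetical and are not taken from the COMETS code or the CPS II file.

```r
# Hypothetical toy stratum with dummy covariates. Every subject with
# bmi3 == 1 also has alc1 == 1 (and vice versa), so the two columns are
# identical and the covariate matrix is rank deficient.
stratum <- data.frame(
  alc1 = c(1, 1, 0, 0, 0, 0, 0, 0),
  alc2 = c(0, 0, 1, 1, 1, 0, 0, 0),
  bmi2 = c(0, 0, 1, 0, 1, 1, 0, 1),
  bmi3 = c(1, 1, 0, 0, 0, 0, 0, 0)
)

# Compare the numerical rank of the (intercept + covariates) matrix with
# its number of columns; a shortfall means at least one exact dependency.
is_rank_deficient <- function(covars) {
  x <- cbind(intercept = 1, as.matrix(covars))
  qr(x)$rank < ncol(x)
}

is_rank_deficient(stratum)                               # all four dummies
is_rank_deficient(stratum[, c("alc1", "alc2", "bmi2")])  # bmi3 left out
```

The check says nothing about which column is "at fault"; it only reports that the set as a whole carries redundant information.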

ellatemprosa commented 6 years ago

here are some findings cpstesting.docx

ellatemprosa commented 6 years ago

@steven-moore can you rerun the sas code with correlation for spearman and post the results? also, can you make it:

proc corr spearman;
var age;
with glycine;
partial alc1-alc4 bmi1-bmi4;
by smk_grp;
run;

ellatemprosa commented 6 years ago

more testing

Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-1]],method="spearman")
    estimate   p.value  statistic  n gp   Method
1 -0.1486994 0.3797507 -0.8896075 42  5 spearman
Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-2]],method="spearman")
    estimate   p.value  statistic  n gp   Method
1 -0.1486994 0.3797507 -0.8896075 42  5 spearman
Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-3]],method="spearman")
    estimate   p.value  statistic  n gp   Method
1 -0.1486994 0.3797507 -0.8896075 42  5 spearman
Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-4]],method="spearman")
Error in solve.default(cvx) : system is computationally singular: reciprocal condition number = 3.13008e-17
Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-5]],method="spearman")
Error in solve.default(cvx) : system is computationally singular: reciprocal condition number = 3.12725e-17
Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-6]],method="spearman")
    estimate   p.value  statistic  n gp   Method
1 -0.1486994 0.3797507 -0.8896075 42  5 spearman
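The `solve.default(cvx)` error arises because a partial correlation can be computed from the inverse of the covariance matrix of the exposure, outcome and covariates, and that inverse does not exist when the covariates are linearly dependent. The following is a simplified base-R sketch of that idea, not ppcor's actual code; the function name `partial_spearman` and the toy data are assumptions made for illustration.

```r
# Partial Spearman correlation of x and y given covariates z, computed
# from the inverse of the rank-based covariance matrix.
partial_spearman <- function(x, y, z) {
  cvx <- cov(cbind(x, y, z), method = "spearman")
  icvx <- solve(cvx)  # fails when cvx is (numerically) singular
  -icvx[1, 2] / sqrt(icvx[1, 1] * icvx[2, 2])
}

# Hypothetical data: 20 subjects, two distinct dummy covariates.
set.seed(1)
x  <- rnorm(20)
y  <- rnorm(20)
d1 <- rep(c(1, 0), 10)
d2 <- rep(c(1, 1, 0, 0), 5)

# Works: d1 and d2 carry different information.
r_ok <- partial_spearman(x, y, cbind(d1, d2))

# Fails: the third covariate duplicates d1, so cvx cannot be inverted.
r_bad <- tryCatch(
  partial_spearman(x, y, cbind(d1, d2, d3 = d1)),
  error = function(e) e
)
```

Dropping either member of a dependent pair restores invertibility, which is consistent with several of the `z[-k]` calls above returning the same estimate.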

steven-moore commented 6 years ago

proc corr spearman;
var age;
with glycine;
partial alc1-alc4 bmi1-bmi4;
by smk_grp;
run;

smk_grp=0:

Spearman Partial Correlation Coefficients, N = 189 Prob > |r| under H0: Partial Rho=0

GLYCINE 0.06510 0.3812

smk_grp=1:

Spearman Partial Correlation Coefficients, N = 312 Prob > |r| under H0: Partial Rho=0

GLYCINE 0.04836 0.4000

smk_grp=3:

Spearman Partial Correlation Coefficients, N = 42 Prob > |r| under H0: Partial Rho=0

GLYCINE -0.12846 0.4553

ellatemprosa commented 6 years ago

hmmm. i am not getting the same values when dropping bmi3 and bmi4

Browse[2]> ppcor::pcor.test(ck[x],ck[y],ck[z[-3]],method="spearman")
    estimate   p.value  statistic  n gp   Method
1 -0.1486994 0.3797507 -0.8896075 42  5 spearman

steven-moore commented 6 years ago

Let's start with the unadjusted comparison where smk_grp=3:

Spearman Correlation Coefficients, N = 42 Prob > |r| under H0: Rho=0

GLYCINE -0.16729 0.2896

ellatemprosa commented 6 years ago

i see, order matters, the checkdesign already kicked out alcgrp2 and we specified bmi first rather than alcgroup, it is the same if done this way:

proc corr data=cps spearman;
var age;
with glycine;
partial alc_grp_1 alc_grp_3 alc_grp_4 bmi_grp_1-bmi_grp_3;
run;

WARNING: The variable bmi_grp_3 is singular in calculating the partial SPEARMAN correlations.
NOTE: PROCEDURE CORR used (Total process time): real time 0.07 seconds cpu time 0.06 seconds

proc corr data=cps spearman;
var age;
with glycine;
partial bmi_grp_1-bmi_grp_3 alc_grp_1 alc_grp_3 alc_grp_4;
run;

WARNING: The variable alc_grp_4 is singular in calculating the partial SPEARMAN correlations.
NOTE: PROCEDURE CORR used (Total process time): real time 0.02 seconds cpu time 0.03 seconds
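The two SAS runs flag different variables because a sequential check blames whichever member of a dependent set comes last. A minimal base-R sketch of that behavior follows; `drop_dependent` and the toy data are hypothetical and are not the package's checkdesign routine.

```r
# Hypothetical toy data in which bmi_3 and alc_4 are identical columns.
d <- data.frame(
  alc_1 = c(1, 1, 0, 0, 0, 0, 0, 0),
  alc_4 = c(0, 0, 1, 1, 0, 0, 0, 0),
  bmi_1 = c(1, 0, 1, 0, 1, 0, 1, 0),
  bmi_3 = c(0, 0, 1, 1, 0, 0, 0, 0)
)

# Walk through the columns in the order given and keep a column only if
# it raises the rank of (intercept + columns kept so far).
drop_dependent <- function(covars) {
  keep <- character(0)
  for (nm in colnames(covars)) {
    candidate <- cbind(1, as.matrix(covars[, c(keep, nm), drop = FALSE]))
    if (qr(candidate)$rank == ncol(candidate)) keep <- c(keep, nm)
  }
  keep
}

alc_first <- drop_dependent(d[, c("alc_1", "alc_4", "bmi_1", "bmi_3")])
bmi_first <- drop_dependent(d[, c("bmi_1", "bmi_3", "alc_1", "alc_4")])
```

With the alcohol dummies first, `bmi_3` is dropped; with the BMI dummies first, `alc_4` is dropped. Either reduced set spans the same space, so the partial correlation is the same, in line with the matching estimate confirmed later in the thread.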

image

steven-moore commented 6 years ago

Confirmed:

-0.14870 0.3798

ellatemprosa commented 6 years ago

my proposal is to check the eigenvalues of the correlation matrix and remove the variables responsible for eigenvalues of 0, in this case image
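One way to read the eigenvalue proposal is sketched below in base R. This is an assumption about how such a check could work, not the code that was committed: the function name `drop_by_eigen`, the tolerance, and the rule for choosing which variable to remove (largest loading on the eigenvector of the smallest eigenvalue) are all illustrative. It also assumes constant columns have already been removed, since they make the correlation matrix undefined.

```r
# Hypothetical toy data in which bmi_3 and alc_4 are identical columns.
d <- data.frame(
  alc_1 = c(1, 1, 0, 0, 0, 0, 0, 0),
  alc_4 = c(0, 0, 1, 1, 0, 0, 0, 0),
  bmi_1 = c(1, 0, 1, 0, 1, 0, 1, 0),
  bmi_3 = c(0, 0, 1, 1, 0, 0, 0, 0)
)

# Repeatedly remove one variable until the Spearman correlation matrix
# of the remaining covariates has no (near-)zero eigenvalue.
drop_by_eigen <- function(covars, tol = 1e-8) {
  vars <- colnames(covars)
  repeat {
    if (length(vars) < 2) break
    r <- cor(covars[, vars, drop = FALSE], method = "spearman")
    e <- eigen(r, symmetric = TRUE)
    k <- which.min(e$values)
    if (e$values[k] > tol) break
    # The eigenvector of a zero eigenvalue describes the dependency;
    # drop the variable that contributes most to it.
    worst <- which.max(abs(e$vectors[, k]))
    vars <- vars[-worst]
  }
  vars
}

kept <- drop_by_eigen(d)
```

In this toy case the dependency involves `alc_4` and `bmi_3` equally, so which of the two is removed depends on floating-point rounding; exactly one of them survives either way.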

steven-moore commented 6 years ago

I will add another example to test

steven-moore commented 6 years ago

Two additional checks:

- Linear combinations in the correlation matrix (Eigenvalues)
- Within some strata, some metabolites have zero variance and need to be filtered out
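The second check, filtering zero-variance metabolites per stratum, could look like the base-R sketch below. The data, the helper name `usable_by_stratum`, and the tolerance are hypothetical; the thread does not show how COMETS implements this.

```r
# Hypothetical data: naproxen is constant within smk_grp 1 only.
metab <- data.frame(
  smk_grp  = c(0, 0, 0, 1, 1, 1),
  glycine  = c(1.25, 0.75, 1.5, 1.0, 0.5, 1.75),
  naproxen = c(0.5, 2.0, 0.5, 0.5, 0.5, 0.5)
)

# For each stratum, return the outcomes whose variance exceeds tol.
usable_by_stratum <- function(data, strata_var, outcome_vars, tol = 1e-12) {
  lapply(split(data, data[[strata_var]]), function(s) {
    v <- vapply(s[outcome_vars], var, numeric(1), na.rm = TRUE)
    outcome_vars[!is.na(v) & v > tol]
  })
}

usable <- usable_by_stratum(metab, "smk_grp", c("glycine", "naproxen"))
```

The result is a list keyed by stratum value, so a metabolite can be analyzed in one stratum and skipped in another.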

ellatemprosa commented 6 years ago

in looking deeper into this problem with the age 2.5 model, i isolated the problem to the stratum where prev_heart_dx=2. among these 18 subjects, there is one that sticks out, with bmi=0, which is not possible. using trim.matrix from the subselect package on the spearman correlation of the variables picked bmi_grp.1, and the correlation will then run, but the result did not seem right because of the data. here's the relevant data for this model: image

for this example the correction of the bmi_grp to 4 allowed the correlations to run without singularity

steven-moore commented 6 years ago

I agree that BMI=0 is not possible. But bmi_grp.0 is a valid possibility. As you note, we need to get the data fixed. I assume, though, that your fix will handle running the correlation where there is a bmi_grp.0, correct?

ellatemprosa commented 6 years ago

yes, it should, in this data there are others that need to be fixed, only 1 of them should be 0. i will test it now


steven-moore commented 6 years ago

OK, great. Just saw the file that you sent and it looks good. Obviously, we'll need to have them run the non-scrambled version.

steven-moore commented 6 years ago

Were your fixes part of Kailing’s deploy (10 minutes ago)?

S

From: ellatemprosa [mailto:notifications@github.com] Sent: Friday, May 18, 2018 3:55 PM To: CBIIT/R-cometsAnalytics R-cometsAnalytics@noreply.github.com Cc: Moore, Steve (NIH/NCI) [E] steve.moore@nih.gov; Mention mention@noreply.github.com Subject: Re: [CBIIT/R-cometsAnalytics] COMETS 1.3. Problems with some adjustment-strata combos (#48)

Closed #48 (https://github.com/CBIIT/R-cometsAnalytics/issues/48) via 8a0ea25 (https://github.com/CBIIT/R-cometsAnalytics/commit/8a0ea2543806d4f9e396f08d7fcbf8a185eab76d).


ellatemprosa commented 6 years ago

i just asked for a deploy to fix the bmi 4.0 and 4.1 problem; we are not going to run analyses for <25.


steven-moore commented 6 years ago

Model Age.2.5 for CPS II women is returning errors

Scrambled.women.CPSII.data_small_models.xlsx

image

steven-moore commented 6 years ago

The comment above actually seems to be more pervasive--the whole system is currently shut down. Once it is back up and running, I will test this model again.

steven-moore commented 6 years ago

Models stratified by heart disease for the CPS II women still pose a problem. Screenshot is below and file is attached. This dataset has valid values for BMI that align with the instructions in Varmap, so invalid values are not the issue.

image

Scrambled.women.CPSII.data_small_models.xlsx

ellatemprosa commented 6 years ago

That's strange I tested all the models you specified.

On Mon, May 21, 2018, 10:15 AM Steven Moore notifications@github.com wrote:

Reopened #48 https://github.com/CBIIT/R-cometsAnalytics/issues/48.


ellatemprosa commented 6 years ago

when i tested this i used a min sample size of 35, but it has since been lowered to 25. in the stratum with the problem, the offending metabolite is "x - 18307". i think that we can just run all metabolites; there are just some metabolites that will cause singularity, and we can take an as-we-can approach, given that this comes from smaller strata that will provide unreliable estimates

steven-moore commented 6 years ago

We were testing the men's dataset before but this is the women's. I am fine with several possible solutions, but right now, it crashes the "Super-batch", resulting in no output at all.

ellatemprosa commented 6 years ago

in this case, instead of checking whether the variance in outcomes is exactly 0, we should use a near-zero variance check as we do for covariates. i will implement this
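A near-zero variance check flags variables that are almost, but not exactly, constant. The base-R sketch below loosely follows the frequency-ratio and percent-unique heuristics used by `caret::nearZeroVar`; the thread does not say which rule COMETS applies, so the function name `is_near_zero_var` and both thresholds are assumptions.

```r
# Flag a variable when its most common value dominates the second most
# common one AND it has few distinct values relative to the sample size.
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  x <- x[!is.na(x)]
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= 1) return(TRUE)  # constant or empty vector
  freq_ratio <- counts[[1]] / counts[[2]]
  pct_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && pct_unique < unique_cut
}

# Hypothetical outcomes: a drug metabolite that is mostly at a floor
# value, a continuous metabolite, and a constant one.
drug       <- c(rep(0, 57), 1.3, 2.1, 0.7)
continuous <- seq_len(60) / 7
constant   <- rep(1, 29)

flags <- c(
  drug       = is_near_zero_var(drug),
  continuous = is_near_zero_var(continuous),
  constant   = is_near_zero_var(constant)
)
```

A strict `var(x) == 0` test would let the `drug` vector through even though only 3 of its 60 values differ from the floor.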

steven-moore commented 6 years ago

Awesome--thanks!

ellatemprosa commented 6 years ago

i committed the change, @steven-moore can you ask for redeployment from kailing?

ewymathe commented 6 years ago

I was able to reproduce Steve's error. I found another bug in the process that I am fixing. The variance issue at the model level wasn't transferring down to the multiple level model. Did you fix that part? I had a meeting and had to pause... Ewy


ewymathe commented 6 years ago

I fixed a bug related to this, where the model would error nonsensically if all outcome variables were dropped during model check. This is now fixed.

ellatemprosa commented 6 years ago

running package in rstudio yields results but the one in the website shows NA as results

ellatemprosa commented 6 years ago

looks like it runs ok but we are returning missing correlations for example image

ellatemprosa commented 6 years ago

@ewymathe i think we want model will not be run? this is per strata or model so it's for a single model. see below image

ewymathe commented 6 years ago

I fixed the error message. As for prev_heart_dx=1, we get NaN as well in the package...I'll try to investigate further now...

runCorr(modeldata,exmetabdata,"")
[1] "Running analysis on subjects stratified by prev_heart_dx 0"
[1] 1458 2
NULL
NULL
NULL
[1] "running unadjusted"
[1] "Running analysis on subjects stratified by prev_heart_dx 2"
[1] 60 2
NULL
NULL
NULL
[1] "running unadjusted"
[1] "Running analysis on subjects stratified by prev_heart_dx 1"
[1] 29 2
NULL
NULL
NULL
[1] "running unadjusted"
  cohort spec model outcomespec exposurespec corr n pvalue
1 Interactive naproxen age -0.03340101 1458 0.2018188
2 Interactive naproxen age -0.11425903 60 0.3688905
3 Interactive naproxen age NaN 29 NaN
  adjspec adjvars outcome_uid outcome exposure_uid exposure stratavar
1 None None naproxen Naproxen age Age at Entry prev_heart_dx
2 None None naproxen Naproxen age Age at Entry prev_heart_dx
3 None None naproxen Naproxen age Age at Entry prev_heart_dx
  strata
1 0
2 2
3 1

Ewy


ewymathe commented 6 years ago

I'm surprised this didn't throw an error. The outcome here is constant for prev_heart_dx. But all the checks we are doing are on the dummy variables, ..., fixing this now.

ellatemprosa commented 6 years ago

The checks on the outcomes are at the end. The main validation is on the design matrix (i.e., the right-hand side of the model), followed by a final check on variability in all outcome measures.

ewymathe commented 6 years ago

naproxen fails the variance check for all strata...

steven-moore commented 6 years ago

That may be fine. But if glycine or histidine or other non-drug metabolites fail, then we have a bigger problem

From: Ewy Mathe [mailto:notifications@github.com] Sent: Wednesday, May 23, 2018 12:01 PM To: CBIIT/R-cometsAnalytics R-cometsAnalytics@noreply.github.com Cc: Moore, Steve (NIH/NCI) [E] steve.moore@nih.gov; Mention mention@noreply.github.com Subject: Re: [CBIIT/R-cometsAnalytics] COMETS 1.3. Problems with some adjustment-strata combos (#48)

naproxen fails the variance check for all strata...


ellatemprosa commented 6 years ago

That's why I'm a little confused that there should not be results for the other 2 strata in the example. I have to track it down. Most of the near zero var metabolites are drug metabolites

ellatemprosa commented 6 years ago

got the mystery solved: the list of outcome vars was not updated after the design matrix check. now fixed so that the dropped outcomes do not have entries in the correlation data
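The fix described here amounts to filtering the outcome list before looping over it, and returning an empty result rather than failing when nothing is left. A base-R sketch under those assumptions follows; `correlate_kept`, its arguments, and the toy data are hypothetical and do not reflect the actual runCorr internals.

```r
# Correlate one exposure with each outcome that survived earlier checks.
correlate_kept <- function(data, exposure, outcomes, dropped) {
  kept <- setdiff(outcomes, dropped)  # update the outcome list first
  if (length(kept) == 0) {
    # Nothing left to model: return an empty table instead of erroring.
    return(data.frame(outcome = character(0), corr = numeric(0)))
  }
  data.frame(
    outcome = kept,
    corr = vapply(kept, function(o) {
      cor(data[[exposure]], data[[o]], method = "spearman")
    }, numeric(1)),
    row.names = NULL
  )
}

# Hypothetical stratum in which naproxen is constant and was dropped.
toy <- data.frame(
  age      = c(50, 55, 60, 65, 70, 75),
  glycine  = c(1.25, 0.75, 1.5, 1.0, 0.5, 1.75),
  naproxen = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
)

some_left <- correlate_kept(toy, "age", c("glycine", "naproxen"), "naproxen")
none_left <- correlate_kept(toy, "age", "naproxen", "naproxen")
```

Because dropped outcomes never reach `cor()`, the output contains no NaN rows for them.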

steven-moore commented 6 years ago

Great. Should Kailing redeploy?

S


steven-moore commented 6 years ago

The proposed Eigenvalue-based solution is performing superbly. I have been unable to crack it despite brute force testing of tens of thousands, perhaps hundreds of thousands of models, in at least three different datasets.

I will ask one or two others to test their datafiles, in case there is something we didn't think of. But as of right now, this is closed. Thanks all for hanging in there on this issue (and our related prior issue). Solving this was a major technical accomplishment and will be important to the conduct of well-coordinated consortium-based analyses.