dyisaev / ENIGMA

http://enigma.ini.usc.edu/
MIT License

Filtering by age #15

Closed kellys37 closed 7 years ago

kellys37 commented 7 years ago

When I filter to include adults only in this sample, the results and the Ns are incorrect. After filtering to include adults only, the results and Ns should be the same as those for the overall sample (this sample contains only adults, so results for the whole sample should equal results using the adult-only filter). I assume a similar pattern would occur when filtering for adolescents in a sample of adolescents only.

dyisaev commented 7 years ago

Sinead, I assume that you are talking about Model #69 (Line 76 in Google Docs file) - "69. Early-onset (<22) MDD vs later-onset MDD (>=22)_in adults only". There are 2 mistakes in setting the model:

  1. Error 1: New regressors. You refer to the non-existing variable __AO25_groups__ in your new regressors string (see the attached screenshot).

So what happens:

  1. (This is not an error.) Filters. You filter by age: (Age>=22). It gives you n.overall = 81, which is the total number of lines in your sample after filtering.

  2. Error 2: No values in the ContValue and PatValue columns. That means that you compare __AO21_groups__==0 with __AO21_groups__==1 (see above how these regressors are assigned). From that you get "22" as n.controls and "23" as n.patients.

I believe fixing Errors 1 & 2 will give you correct results. I think as well that there's no need to change anything in the script - situations like this are a good indicator that something's not going right.

Thanks. Let me know if it helps - after you confirm that the problem is fixed, I'll close the issue.

dyisaev commented 7 years ago

Also, when you submit an issue next time - please refer to the google-docs file and the particular line for the model. Otherwise it's hard to figure out what we are talking about :)

kellys37 commented 7 years ago

Hi Dmitry,

Yes, sorry, I meant to highlight the model numbers. For example, if you compare results between model 14, "Early-onset (<22) MDD vs later-onset MDD (>=22)", and model 69, "Early-onset (<22) MDD vs later-onset MDD (>=22)_in adults only",

the results are different (including Ns and p-values). However, they should be the same in this case as the sample is comprised of adults only. Therefore, the results of the models that are taking into account the whole sample (14 above) should be the same, in this case, as the model taking into account adults only (69 above).

Let me know if I have misunderstood anything. I also fixed the error in the new regressors column that you highlighted.

Thanks for your help!

dyisaev commented 7 years ago

Let's compare these models as they look now.

Model #14: First, let's put ';' between statements. It may not be a problem, but let's keep the same notation throughout. New regressors: you set AO25_groups=0 if AO<25; AO25_groups=1 if AO>=25; and AO25_groups=2 if Dx=0. Filters: you filter AO25_groups!=2 (thus leaving only AO<25 vs AO>=25). In this case I assume that n.patients=n(AO<25); n.controls=n(AO>=25); and n.overall=n.patients + n.controls.

Model #69: New regressors: AO21_groups[AO21_groups < 22] <- 1; AO21_groups[AO21_groups >= 22] <- 2; AO21_groups[Dx == 0] <- 0; More or less the same as Model #14 (except that the age cutoff is a little bit different). Filters: (AO21_groups!=0) OR (Age>=22). So here you get all subjects with __AO21_groups__=={1,2} (which is the same as all the subjects with Dx!=0).

BUT: you also get all the subjects with Age>=22 (whether they have Dx=0 or Dx=1, it doesn't matter). <-- This is the difference that gives you a different n.overall. Moreover: _you still did not set ContValue=1 and PatValue=2. So you are comparing AO21_groups==0 vs AO21_groups==1._ <-- This is why your n.controls + n.patients!=n.overall for Model #69.
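To make the OR-versus-AND difference concrete, here is a minimal sketch in plain R. The toy data frame `d` and the way the counts are computed are assumptions for illustration only; they are not the ENIGMA script's actual code, and the variables are written without the double-underscore notation used in the Google Docs file.

```r
# Hypothetical toy sample (not ENIGMA data): Dx 0 = control, 1 = patient.
d <- data.frame(
  Dx  = c(0, 0, 1, 1, 1),
  Age = c(30, 45, 25, 35, 50),
  AO  = c(NA, NA, 19, 21, 30)   # age of onset is undefined for controls
)

# Group coding as described for Model #69, always testing the original AO.
d$AO21_groups <- NA_real_
d$AO21_groups[which(d$AO < 22)]  <- 1
d$AO21_groups[which(d$AO >= 22)] <- 2
d$AO21_groups[d$Dx == 0]         <- 0

# OR keeps every adult, including controls; AND keeps adult patients only.
or_rows  <- d[(d$AO21_groups != 0) | (d$Age >= 22), ]
and_rows <- d[(d$AO21_groups != 0) & (d$Age >= 22), ]
n_or  <- nrow(or_rows)
n_and <- nrow(and_rows)

# Assumed default comparison (0 vs 1) on the OR-filtered rows:
# the two groups do not add up to the filtered total.
n_default <- sum(or_rows$AO21_groups == 0) + sum(or_rows$AO21_groups == 1)

# Comparison with ContValue = 1 and PatValue = 2 on the AND-filtered rows:
# the two groups add up to the filtered total.
n_fixed <- sum(and_rows$AO21_groups == 1) + sum(and_rows$AO21_groups == 2)

print(c(n_or = n_or, n_default = n_default, n_and = n_and, n_fixed = n_fixed))
```

In this toy sample the OR filter keeps all five subjects while the 0-vs-1 comparison only covers four of them, whereas the AND filter with groups 1 and 2 keeps three subjects and accounts for all three.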

dyisaev commented 7 years ago

One other important point (for future models). Here's one possible source of mistakes: you should use the original AO variable when assigning values, like this:

__AO25_groups__<-__AO__
__AO25_groups__[__AO__ < 25] <- 0
__AO25_groups__[__AO__ >= 25] <- 1

If you use

__AO25_groups__<-__AO__
__AO25_groups__[__AO25_groups__ < 25] <- 0
__AO25_groups__[__AO25_groups__ >= 25] <- 1

then your variable changes values on the second line, and on the third line you refer to the already re-assigned variable for the second re-assignment. This may cause problems, so please never do that.
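A small plain-R sketch of how this can go wrong. The values in `AO` are made up, and the assignments are deliberately written in the reverse order (>= 25 first) because that is a case where the self-referencing version visibly breaks; this is an illustration, not output from the actual script.

```r
# Hypothetical age-of-onset values; not from the ENIGMA data.
AO <- c(18, 21, 25, 40)

# Safe: every condition tests the untouched AO vector,
# so the order of the assignments does not matter.
safe <- AO
safe[AO >= 25] <- 1
safe[AO < 25]  <- 0

# Risky: every condition tests the vector that is being overwritten.
risky <- AO
risky[risky >= 25] <- 1   # 25 and 40 become 1
risky[risky < 25]  <- 0   # the new 1s are also < 25, so they become 0 too

print(safe)   # 0 0 1 1
print(risky)  # 0 0 0 0
```

Whether the self-referencing form breaks depends on the order of the statements and on the codes chosen, which is exactly why testing the original variable is the safer habit.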

kellys37 commented 7 years ago

Thanks Dmitry. I changed the filter to (__AO21_groups__!=0) & (__Age__>=22) and set the ContValue and PatValue to 1 and 2. Now the Ns and p-values are correct.