ABCD-STUDY / analysis-nda

Collection of scripts to analyze ABCD release data

renaming variable names from NDA to ABCD #8

Closed (ghost closed this issue 5 years ago)

ghost commented 5 years ago

There's a step in your "merge" code where you pull in a csv file to switch the variable labels from NDA to ABCD format. That works fine, but when you get to the recoding demographics (i.e., switching education stuff), the variable labels used are NDA. Is that intentional? I'd like to follow the code all the way through without editing it for the sake of replicability, but I'm not sure why these two things are inconsistent. I discovered it when looking at household income, which is listed as demo_fam_comb_income_v2b in our cleaned dataset, but demo_fam_comb_income_v2 in the demographic code.

HaukeBartsch commented 5 years ago

Maybe this is related to the order in which the three processing scripts are run? I think I changed that order from 1.0 to 1.1 and I might not have updated the documentation. The three scripts are merge_data.md, core_demographic.md and categorical_extension.md. I think the merge_data script directly points to the categorical_extension script instead of going first to core_demographics. Please let me know if that is related or if I should investigate in more detail.

ghost commented 5 years ago

Hi Hauke, I think it is a problem regardless of the ordering, because the switching from NDA to ABCD happens in the first merge script here:

alia = read.csv('NDA_DEAP_names_1.1.csv')
tables = list()
for (p in 1:length(input_list)) {
    print(p)
    input = input_list[p]
    print(paste("import: ", input, " [", p, "/", length(input_list), "]", sep=""))

    # read data from the tab-separated files as characters
    dt = read.table(file = input, sep = '\t', header = TRUE)

    # replace variable names from nda with their alias names to make them more like ABCD
    instrument = sub('\\.txt$', '', basename(input_list[p]))
    ali = alia[which(alia$instrument == instrument),]
    nn = names(dt)
    for (q in 1:length(nn)) {
        if (nn[q] %in% ali$nda) {
            colnames(dt)[q] <- as.character(ali$abcd[ali$nda == nn[q]])
        }
    }

    tables[[p]] = dt
}

So assuming you run the whole merge script through, when you move on to core demographics you're now working with ABCD variables while the core demo script refers to NDA variables. Maybe I'm incorrect though?
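
Whether the rename overwrites or duplicates columns can be checked on a toy example. The sketch below runs the same inner-loop logic as the merge script on a made-up data frame and a made-up alias table. The instrument name `toy_instrument`, the `src_subject_id` column, and the values are illustrative assumptions, not taken from the real `NDA_DEAP_names_1.1.csv`; only the income variable pair comes from this thread.

```r
# Toy alias table in the shape the merge script expects (columns: instrument, nda, abcd).
# The instrument name is made up for illustration.
alia <- data.frame(
  instrument = "toy_instrument",
  nda  = "demo_fam_comb_income_v2",
  abcd = "demo_fam_comb_income_v2b",
  stringsAsFactors = FALSE
)

# Toy data table: one column without an alias, one column with an alias.
dt <- data.frame(
  src_subject_id = c("S1", "S2", "S3"),
  demo_fam_comb_income_v2 = c(1, 7, 10)
)

# Same renaming logic as the inner loop of the merge script.
ali <- alia[which(alia$instrument == "toy_instrument"), ]
nn <- names(dt)
for (q in seq_along(nn)) {
  if (nn[q] %in% ali$nda) {
    colnames(dt)[q] <- as.character(ali$abcd[ali$nda == nn[q]])
  }
}

# Inspect the result: the column count is unchanged and the old name is gone.
print(names(dt))
print(ncol(dt))
```

In this toy run, `colnames(dt)[q] <- ...` replaces the name in place, so a script that later looks up the NDA name by exact match would no longer find it.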

HaukeBartsch commented 5 years ago

No, no, makes sense... I see the problem. I don't get this error on my end, maybe because I am running the R code of both scripts in the same R process. I don't read the Rds in a second time for the core demographics. It seems as if the old variable names are still working; they are not removed from R's data frame even after the renaming using colnames. Maybe that is the reason that I can still run core_demographics using the old names? What do you think?

ghost commented 5 years ago

Maybe? When I look at the column names of the dataset created after the merge script is done, I don't have the original column names anymore, so I thought it was just overwriting. Frankly, the code is a bit above my head, so you would know better whether the code you wrote overwrites the names or creates new columns. I have been saving and reloading each of the data files sequentially as I run through the code, so I don't have to redo earlier steps if something errors out. I could try to collapse all of the scripts together, run them in the same process, and see.

Rebecca Umbach

HaukeBartsch commented 5 years ago

Hi Rebecca,

Yep, using colnames you will get the new names - but if you just use nda17$ with the old name it will still work :-). I need to update the script to use the new names... But I am really in the middle of the release 2.0 hubbub right now, so I'm not sure I can do it today.

Thanks a lot for reaching out!
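
One possible explanation for `nda17$` still working with the old names, which the thread does not confirm, is R's partial matching: `$` on a data frame falls back to a unique prefix match when no column has the exact name, and `demo_fam_comb_income_v2` is a prefix of `demo_fam_comb_income_v2b`. A minimal sketch on a toy data frame (the values are made up):

```r
# Toy stand-in for the merged data frame; only the renamed (ABCD) column exists.
nda17 <- data.frame(demo_fam_comb_income_v2b = c(1, 7, 10))

# `$` falls back to partial matching, so the old (NDA) name still returns the column.
via_dollar <- nda17$demo_fam_comb_income_v2

# `[[` matches exactly by default, so the old name is not found (NULL).
via_brackets <- nda17[["demo_fam_comb_income_v2"]]

# The old name is not among the column names either.
old_name_present <- "demo_fam_comb_income_v2" %in% names(nda17)

print(via_dollar)
print(is.null(via_brackets))
print(old_name_present)

# Optional: ask R to warn whenever `$` resolves a name by partial matching.
options(warnPartialMatchDollar = TRUE)
```

Partial matching only succeeds when the prefix is unambiguous, and an assignment such as `nda17$demo_fam_comb_income_v2 <- ...` creates a new column rather than updating the renamed one, so a script that mixes old and new names can silently diverge from the merged data.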

ghost commented 5 years ago

Hmm, OK, I'll try to troubleshoot it myself. I did want to check the order you recommend: merge --> demo --> categorical. Is that correct? When I run demo immediately after merge (after changing the variable names to match the new ABCD ones), it errors out because income, for example, is still in categories such as "$5000 and below". Did you ever run into this problem? And no problem, I know you're busy. Just trying to figure out how to replicate Thompson's results!

Rebecca Umbach, PhD


Developmental Affective Neuroscience Lab, Columbia University

Behavioral Sciences Training Program in Drug Abuse Research

New York University
