Open AndriesDeKoker opened 10 months ago
Extra comment: when multiple TARGET projects are loaded together (e.g. with OS), it is the NBL one that also causes adding the clinical data of OS to fail. When TARGET-OS is loaded separately, there is no issue.
Not sure, but I was looking at the prepare.R file; could it relate to the lines in this part?
```r
colDataPrepare <- function(barcode){
    # We should search what TARGET data means
    message("Starting to add information to samples")
    ret <- NULL
    if(all(grepl("TARGET",barcode))) ret <- colDataPrepareTARGET(barcode)
    if(all(grepl("TCGA",barcode))) ret <- colDataPrepareTCGA(barcode)
    if(all(grepl("MMRF",barcode))) ret <- colDataPrepareMMRF(barcode)
    # How to deal with mixed samples "C3N-02003-01;C3N-02003-021" ?
    # Check if this breaks the package
    if(any(grepl("C3N-|C3L-",barcode))) {
        ret <- data.frame(
            sample = map(barcode,.f = function(x) stringr::str_split(x,";") %>% unlist) %>% unlist()
        )
    }
    if(is.null(ret)) {
        ret <- data.frame(
            sample = barcode %>% unique,
            stringsAsFactors = FALSE
        )
    }
    message(" => Add clinical information to samples")
    # There is a limitation on the size of the string, so this step will be splited in cases of 100
    patient.info <- NULL
    patient.info <- splitAPICall(
        FUN = getBarcodeInfo,
        step = 10,
        items = ret$sample
    )
    if(!is.null(patient.info)) {
        ret$sample_submitter_id <- ret$sample %>% as.character()
        ret <- left_join(ret %>% as.data.frame, patient.info %>% unique, by = "sample_submitter_id")
    }
    ret$bcr_patient_barcode <- ret$sample %>% as.character()
    ret$sample_submitter_id <- ret$sample %>% as.character()
    if(!"project_id" %in% colnames(ret)) {
        if("disease_type" %in% colnames(ret)){
            aux <- getGDCprojects()[,c(5,7)]
            aux <- aux[aux$disease_type == unique(ret$disease_type),2]
            ret$project_id <- as.character(aux)
        }
    }
    # There is no subtype info for target, return as it is
    if(any(grepl("TCGA",barcode))) {
        ret <- addSubtypeInfo(ret)
    }
    # na.omit should not be here, exceptional case
    if(is.null(ret)) {
        return(
            data.frame(
                row.names = barcode,
                barcode,
                stringsAsFactors = FALSE
            )
        )
    }
    # Add purity information from http://www.nature.com/articles/ncomms9971
    # purity <- getPurityinfo()
    # ret <- merge(ret, purity, by = "sample", all.x = TRUE, sort = FALSE)
    # Put data in the right order
    ret <- ret[!duplicated(ret$bcr_patient_barcode),]
    # This part might not work with multiple projects
    idx <- sapply(
        X = substr(barcode,1,min(stringr::str_length(ret$bcr_patient_barcode))),
        FUN = function(x) {
            grep(x,ret$bcr_patient_barcode)
        }
    )
    # the code above does not work, since the projects have different string lengths
    if(all(na.omit(ret$project_id) %in% c("TARGET-ALL-P3","TARGET-AML"))) {
        idx <- sapply(gsub("-[[:alnum:]]{3}$","",barcode), function(x) {
            grep(x,ret$bcr_patient_barcode)
        })
    }
    if(any(ret$project_id == "CPTAC-3",na.rm = T)) {
        # only merge mixed samples
        mixed_samples <- grep(";",barcode,value = T)
        if(length(mixed_samples) > 0){
            mixed_samples <- mixed_samples %>% str_split(";") %>% unlist %>% unique
            ret_mixed_samples <- ret %>% dplyr::filter(sample_submitter_id %in% mixed_samples) %>%
                dplyr::group_by(submitter_id) %>%
                dplyr::summarise_all(~trimws(paste(unique(.), collapse = ';'))) %>%
                as.data.frame()
            ret <- rbind(ret_mixed_samples,ret)
        }
        idx <- match(barcode,ret$bcr_patient_barcode)
        #idx <- sapply(gsub("-[[:alnum:]]{3}$","",barcode), function(x) {
        # if(grepl(";",x = x)) x <- stringr::str_split(x[1],";")[[1]][1] # mixed samples
        # grep(x,ret$bcr_patient_barcode)
        #})
    }
    if(any(ret$project_id %in% c("CMI-MBC","TARGET-NBL"),na.rm = T)) {
        idx <- match(barcode,ret$bcr_patient_barcode)
    }
    if(is.list(idx)){
        stop(
            "Prepare will not be possible.
\nIf you are trying to prepare more than
one different project at a time, please do it separately"
        )
    }
    ret <- ret[idx,]
    if("barcode" %in% colnames(ret)) ret$barcode <- barcode
    rownames(ret) <- barcode
    return(ret)
}
```
Running `trace('colDataPrepare', edit = T)` and removing `'TARGET-NBL'` from the line `if (any(ret$project_id %in% c("CMI-MBC", "TARGET-NBL"), na.rm = T)) {` does the trick.
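A possible reading of why that branch matters, as a minimal base-R sketch rather than a confirmed diagnosis: `match()` only finds exact string matches, so if the barcodes passed in are longer than the sample IDs stored in `ret$bcr_patient_barcode`, every index comes back `NA` and `ret[idx,]` yields rows of `NA`. The barcodes, sample IDs and the two-row `ret` below are invented for illustration, and `nchar()` and `fixed = TRUE` are stand-ins for the `stringr::str_length()` and regex `grep()` calls in the function.

```r
# Hypothetical barcodes and sample IDs, made up for illustration only.
barcode <- c("TARGET-30-AAAAAA-01A-01D", "TARGET-40-BBBBBB-01A-01D")
ret <- data.frame(
  bcr_patient_barcode = c("TARGET-30-AAAAAA-01A", "TARGET-40-BBBBBB-01A"),
  project_id = c("TARGET-NBL", "TARGET-OS"),
  stringsAsFactors = FALSE
)

# Exact matching, as in the CMI-MBC / TARGET-NBL branch: no element of
# `barcode` is identical to a sample ID, so every index is NA.
idx_exact <- match(barcode, ret$bcr_patient_barcode)

# Prefix matching, mirroring the earlier sapply/grep step: truncate each
# barcode to the length of the shortest sample ID, then search for it.
prefix <- substr(barcode, 1, min(nchar(ret$bcr_patient_barcode)))
idx_prefix <- sapply(
  prefix,
  function(x) grep(x, ret$bcr_patient_barcode, fixed = TRUE),
  USE.NAMES = FALSE
)

ret[idx_exact, ]   # every row is NA, so the clinical columns are lost
ret[idx_prefix, ]  # rows line up with the input barcodes
```

If the real TARGET-NBL barcodes do match the sample IDs exactly in some data types, the `match()` branch would work for those, which might be why it was added in the first place.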
Sorry, I don't have a lot of time to give support anymore. Yes, indeed that is where the issue happens. I just added a small fix.
I still need to test all TARGET-NBL data before the final solution. That if statement was added for a reason, but I am not sure yet which case required it.
On Wed, Jan 24, 2024 at 10:22 AM AndriesDeKoker wrote:
> `trace('colDataPrepare', edit = T)` and removing 'TARGET-NBL' from line `if (any(ret$project_id %in% c("CMI-MBC", "TARGET-NBL"), na.rm = T)) {`
> does the trick
Performing this:

```r
library(TCGAbiolinks)

query_methyl <- GDCquery(
    project = "TARGET-NBL",
    data.category = 'DNA Methylation',
    platform = 'Illumina Human Methylation 450',
    access = 'open',
    data.type = 'Methylation Beta Value'
)
GDCdownload(query_methyl)
dna.meth <- GDCprepare(query_methyl, summarizedExperiment = TRUE)
```

All clinical data is gone: every value is NA. Is this a bug? Any workaround suggestions?
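To pin down which clinical columns are affected, a small check along these lines may help. It is only a sketch: `col_data` is a hand-made stand-in with invented values, and in a real session it would presumably come from something like `as.data.frame(SummarizedExperiment::colData(dna.meth))`.

```r
# Stand-in for the sample annotation of the prepared object;
# the values are invented for illustration.
col_data <- data.frame(
  barcode = c("TARGET-30-AAAAAA-01A-01D", "TARGET-30-CCCCCC-01A-01D"),
  project_id = c(NA_character_, NA_character_),
  gender = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Flag every column in which all values are NA.
all_na <- vapply(col_data, function(x) all(is.na(x)), logical(1))
names(all_na)[all_na]
```

Comparing this list between a TARGET-NBL run and a TARGET-OS run would show whether only the joined clinical columns are empty or the whole annotation is.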