Stuck in the "getclinical" loop

opened 6 years ago

commented 6 years ago


I am using the code below to retrieve clinical information with TCGAbiolinks. However, the process goes into an endless loop on project TCGA-LAML, where no information is found.


library(TCGAbiolinks)
library(data.table) # rbindlist(), setDF()
library(magrittr)   # %>%

getclinical <- function(proj){
        # keep retrying until the project is retrieved without an error
        while(1){
                result = tryCatch({
                        query <- GDCquery(project = proj, data.category = "Clinical")
                        GDCdownload(query)
                        clinical <- GDCprepare_clinic(query, clinical.info = "patient")
                        for(i in c("admin","radiation","follow_up","drug","new_tumor_event")){
                                aux <- GDCprepare_clinic(query, clinical.info = i)
                                if(is.null(aux)) next
                                # add suffix manually if it already exists
                                replicated <- which(grep("bcr_patient_barcode",colnames(aux), value = TRUE,invert = TRUE) %in% colnames(clinical))
                                colnames(aux)[replicated] <- paste0(colnames(aux)[replicated],".",i)
                                if(!is.null(aux)) clinical <- merge(clinical,aux,by = "bcr_patient_barcode", all = TRUE)
                        }
                        readr::write_csv(clinical, paste0(proj,"_clinical_from_XML.csv")) # Save the clinical data into a csv file
                        return(clinical)
                }, error = function(e) {
                        message(paste0("Error clinical: ", proj))
                })
        }
}

clinical <- TCGAbiolinks:::getGDCprojects()$project_id %>% regexPipes::grep("TCGA",value=TRUE) %>% sort %>%
        plyr::alply(1,getclinical, .progress = "text") %>% rbindlist(fill = TRUE) %>% setDF %>% subset(!duplicated(clinical))

Output message:

o GDCquery: Searching in GDC database
Genome of reference: hg38
oo Accessing GDC. This might take a while...
ooo Project: TCGA-LAML
oo Filtering results
oo Checking data
ooo Check if there are duplicated cases
ooo Check if there results for the query
o Preparing output
Downloading data for project TCGA-LAML
Of the 200 files for download 200 already exist.
All samples have been already downloaded
To get the following information please change the argument
=> new_tumor_events: new_tumor_event
=> drugs: drug
=> follow_ups: follow_up
=> radiations: radiation
No information found
Error clinical: TCGA-LAML

commented 6 years ago

Hi @ycl6, nice coding, and thank you for using our tool. It seems that you were asking for clinical data, including radiation information, for LAML (Acute Myeloid Leukemia). To my knowledge, there is no radiation therapy for this liquid tumor, so no radiation records come back for it, unlike what you found for the other 32 solid tumors.

@tiagochst, when you have time, could you look into handling this exception? Thanks.

commented 6 years ago


The code that I used was taken from the vignettes :)

I investigated a little. For most projects the auxiliary information returns nothing, i.e. is.null(aux) == TRUE, but for TCGA-LAML it actually returns something: aux is an empty data.frame, so the is.null(aux) check does not skip it.

So I changed

if(is.null(aux)) next

to

if(is.null(aux) || nrow(aux) == 0) next

This solved the loop problem.
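
For anyone who wants to see the difference without downloading anything, here is a minimal base-R sketch of the check. The helper name is_empty_aux and the stand-in values (including the barcode) are made up for illustration; they are not part of TCGAbiolinks, and the empty data.frame is only assumed to resemble what came back for TCGA-LAML.

```r
# Hypothetical helper (not part of TCGAbiolinks): TRUE when there is nothing to merge.
is_empty_aux <- function(aux) {
  # || short-circuits, so nrow() is never evaluated when aux is NULL
  is.null(aux) || nrow(aux) == 0
}

# Made-up stand-ins for the three kinds of value the loop may have to handle.
candidates <- list(
  null_result  = NULL,
  empty_result = data.frame(),
  real_result  = data.frame(bcr_patient_barcode = "TCGA-XX-0001", dose = 1)
)

# An empty data.frame is not NULL, so the original check lets it through.
print(is.null(candidates$empty_result))

# The combined check flags both the NULL and the empty data.frame for skipping.
skipped <- vapply(candidates, is_empty_aux, logical(1))
print(skipped)
```

Using || rather than | keeps the condition a single TRUE/FALSE even when aux is NULL, which is what if() expects.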