Open fabianjkrueger opened 9 months ago
I encountered a similar bug while preparing a query for TCGA-UCEC. It seems to be related to the TCGAbiolinks:::readSimpleNucleotideVariationMaf call, where an empty table leads to an incompatible column type. My workaround uses data.table::fread instead of readr:
library(TCGAbiolinks)
library(data.table)

query <- GDCquery(
  project = "TCGA-UCEC",
  data.category = "Simple Nucleotide Variation",
  data.type = "Masked Somatic Mutation",
  data.format = "MAF"
)
GDCdownload(query)
# query_results <- GDCprepare(query) # this errors out

# Rebuild the paths of the downloaded MAF files from the query results
files <- file.path(
  "GDCdata",
  query$results[[1]]$project,
  gsub(" ", "_", query$results[[1]]$data_category),
  gsub(" ", "_", query$results[[1]]$data_type),
  gsub(" ", "_", query$results[[1]]$file_id),
  gsub(" ", "_", query$results[[1]]$file_name)
)
# Read every file with fread and stack the tables row-wise
maf_data <- do.call(rbind, lapply(files, fread, header = TRUE, skip = "#", sep = "\t"))
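If the root cause really is a table without data rows getting a different inferred column type, one way to sidestep type inference altogether is to read every column as character and combine the tables with data.table::rbindlist. The following is only a minimal sketch under that assumption: read_maf_as_character is a hypothetical helper (not part of TCGAbiolinks), the two temporary files are tiny stand-ins for real MAF files, and it assumes the header line contains "Hugo_Symbol".

```r
library(data.table)

# Hypothetical helper (not part of TCGAbiolinks): read one MAF file with every
# column as character, so a file without data rows cannot introduce a
# conflicting column type. Assumes the header line contains "Hugo_Symbol".
read_maf_as_character <- function(path) {
  fread(path, sep = "\t", header = TRUE, skip = "Hugo_Symbol",
        colClasses = "character")
}

# Two small stand-in files: one with a data row, one with only the header.
header <- "Hugo_Symbol\tStart_Position"
with_rows <- tempfile(fileext = ".maf")
header_only <- tempfile(fileext = ".maf")
writeLines(c("#version gdc-1.0.0", header, "TP53\t7675088"), with_rows)
writeLines(c("#version gdc-1.0.0", header), header_only)

# Read both files and stack them, matching columns by name.
tables <- lapply(c(with_rows, header_only), read_maf_as_character)
maf_data <- rbindlist(tables, use.names = TRUE, fill = TRUE)
```

With real data, the same helper could be applied to the files vector from the workaround above. Numeric columns would then need to be converted back explicitly after combining.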
TCGAbiolinks v2.32.0, readr v2.1.5, R version 4.4.1 (2024-06-14)
Hello!
There seems to be an issue with preparing certain data sets for analysis. It's odd, since it works for some projects but not for others. One of the projects causing issues is breast cancer ("BRCA"). I queried and downloaded the data for the different projects with a script like the one shown below.
All paths are stored in variables, so this is not the issue. This code works for almost all the other cancer types, for example colon adenocarcinoma (project "COAD").
This is the error message I get:
To me, it looks like a problem with column data types, but I don't know how to fix it.
Is there anything else I might be missing? Are there temporary files that depend on loading a specific library for reading them? If not, there might be a bug.