BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Error in strsplit(c(colnames(data)), "-") : non-character argument #459

Open kalyanidhusia opened 3 years ago

kalyanidhusia commented 3 years ago

Hi, I am working with the new CMI-MBC data and using TCGAbiolinks to identify DEGs. At this step:

dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,samplesNT],
                            mat2 = dataFilt[,samplesTP],
                            Cond1type = "Primary Tumor",
                            Cond2type = "Metastatic",
                            pipeline = "edgeR",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")

I get the error stating Error in strsplit(c(colnames(data)), "-") : non-character argument.
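For context, here is a minimal base R sketch of one way this message can arise. It assumes, without having confirmed it for this dataset, that the matrix reaching `strsplit()` has no column names at that point. The toy matrix and its names are hypothetical.

```r
# Toy count matrix with no column names (hypothetical data, not CMI-MBC).
counts <- matrix(1:6, nrow = 2)

# colnames() returns NULL here, and strsplit() rejects non-character input.
msg <- tryCatch(
  strsplit(c(colnames(counts)), "-"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Selecting a single column drops the matrix to a plain vector, which has
# no column names either; drop = FALSE keeps the matrix and its names.
colnames(counts) <- c("s1", "s2", "s3")
one_dropped <- counts[, 1]
one_kept <- counts[, 1, drop = FALSE]
print(is.null(colnames(one_dropped)))
print(colnames(one_kept))
```

If this is the cause, checking `is.character(colnames(dataFilt))` and `length(samplesNT)` before the call would show it.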

I guessed it's because the sample names look like MBCProject_0065_T1_RNA_1 rather than MBCProject-0065-T1-RNA-1.

Following that logic, I tried correcting the sample names by replacing _ with -, but that didn't help.
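The renaming described above can be sketched in base R as follows. The sample name is the one from this post; the TCGA-style barcode is only an illustrative example for comparison and is not taken from this dataset.

```r
# Sample name copied from the post, plus a TCGA-style barcode for comparison
# (the barcode is an illustrative example, not part of the CMI-MBC data).
cmi_name <- "MBCProject_0065_T1_RNA_1"
tcga_name <- "TCGA-A8-A08B-01A-11R-A034-07"

# fixed = TRUE treats "_" as a literal character, not a regular expression.
renamed <- gsub("_", "-", cmi_name, fixed = TRUE)
print(renamed)

# Splitting on "-" yields 7 fields for the TCGA barcode but only 5 for the
# renamed CMI-MBC name, so position-based parsing would still not line up.
tcga_fields <- strsplit(tcga_name, "-")[[1]]
cmi_fields <- strsplit(renamed, "-")[[1]]
print(length(tcga_fields))
print(length(cmi_fields))
```

Both names are character strings, so `strsplit()` accepts them either way. That suggests the "non-character argument" message is not caused by the separator itself, although the renamed fields would still not match the positions of a TCGA barcode.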

Any idea what I should do next?

tiagochst commented 2 years ago

The function was mainly created for TCGA data. There are certain functions that will only work with TCGA data. You could still run the function as below:

library(TCGAbiolinks)
project <- c('TCGA-PAAD', 'HCMI-CMDC')
clin <- GDCquery_clinic(project, "clinical", save.csv = TRUE)

clin <- GDCquery_clinic(project[1], "clinical", save.csv = TRUE)

proj <- "CMI-MBC"
query <- GDCquery(
    project = proj,
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification", 
    workflow.type = "HTSeq - Counts"
)
GDCdownload(query)
data <- GDCprepare(query)

dataPrep <- TCGAanalyze_Preprocessing(
    object = data, 
    cor.cut = 0.6,
    datatype = "HTSeq - Counts"
)                      

dataNorm <- TCGAanalyze_Normalization(
    tabDF = dataPrep,  # use the preprocessed matrix rather than the raw GDCprepare() object
    geneInfo = geneInfoHT,
    method = "gcContent"
)
dataFilt <- TCGAanalyze_Filtering(
    tabDF = dataNorm,
    method = "quantile", 
    qnt.cut =  0.25
)   

dataDEGs <- TCGAanalyze_DEA(
    mat1 = dataFilt[,which(data$sample_type == "Primary Tumor")],
    mat2 = dataFilt[,which(data$sample_type == "Metastatic")],
    Cond1type = "Primary Tumor",
    Cond2type = "Metastatic",
    fdr.cut = 0.01 ,
    logFC.cut = 1,
    method = "glmLRT",
    metadata = FALSE
)
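The positional indices from `which(data$sample_type == ...)` assume that the columns of `dataFilt` match those of `data` in number and order. As a hedged alternative, the sketch below selects columns by name, so the selection would stay correct if an earlier step removed or reordered samples. It uses toy objects only: `sample_info`, `filt`, and `pick()` are hypothetical stand-ins and are not part of TCGAbiolinks.

```r
# Hypothetical sample annotation standing in for the metadata of `data`.
sample_info <- data.frame(
  barcode = c("MBC_1", "MBC_2", "MBC_3", "MBC_4"),
  sample_type = c("Primary Tumor", "Metastatic", "Primary Tumor", "Metastatic"),
  stringsAsFactors = FALSE
)

# Toy filtered matrix in which one sample (MBC_3) was removed upstream.
filt <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("MBC_1", "MBC_2", "MBC_4"))
)

# Select columns by name, restricted to samples that are still present.
# drop = FALSE keeps a matrix (with column names) even for a single sample.
pick <- function(type) {
  ids <- sample_info$barcode[sample_info$sample_type == type]
  filt[, intersect(colnames(filt), ids), drop = FALSE]
}

primary <- pick("Primary Tumor")
metastatic <- pick("Metastatic")
print(colnames(primary))
print(colnames(metastatic))
```

With real data, the barcodes would come from the column names of the prepared object, and the two resulting matrices would be passed as `mat1` and `mat2`.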