BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

[HELP] Problems exporting clinical data to txt #408

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Hi everybody! I need some help figuring out why I have problems exporting a data frame of clinical data to a txt file. Basically, I download RNA-seq data, then I create an expression matrix and a data frame with the sample information returned by colData. When I export the data frame to txt, the file contains values that are not displayed in RStudio: for example, some rows contain values like c(NA, NA) for certain conditions, and I don't know why. Below is my code, followed by a screenshot of what I get when I import the file into Excel (as an example).

# Use TCGAbiolinks to download data and create a co-expression network with WGCNA
library(TCGAbiolinks)
library(SummarizedExperiment)
library(DT)
library(dplyr)

# Query primary tumor (TP) PRAD expression data:
queryTP <- GDCquery(project = "TCGA-PRAD",
                    legacy = TRUE,
                    data.category = "Gene expression",
                    data.type = "Gene expression quantification",
                    platform = "Illumina HiSeq",
                    file.type = "normalized_results",
                    experimental.strategy = "RNA-Seq",
                    sample.type = "Primary Tumor")
# Download:
GDCdownload(queryTP)
# Prepare data:
PRADexp <- GDCprepare(queryTP, 
                      save = TRUE, 
                      summarizedExperiment = TRUE, 
                      save.filename = "PRADexp.rda")

# Get expression matrix
TPexp <- assay(PRADexp)

# Get sample/clinical/molecular information
TPinfo <- as.data.frame(colData(PRADexp)) 
TPinfo <- TPinfo[,-1] # remove first column

# Write TPinfo in a txt file:
write.table(TPinfo, 
            file = "TPinfo.txt", 
            sep=" ", 
            row.names = TRUE, 
            col.names = TRUE,
            na = "NA",
            quote = FALSE)

Here is an example of what I see when I import the file into Excel (sorry for the Italian interface; the delimiter checkbox is set to "space"). As you can see, some columns now contain extra elements that do not belong there.

[Screenshot 2020-05-22 at 19.41.20]
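One factor that likely contributes to the shifted columns, independent of where the odd values come from, is the combination of `sep = " "` and `quote = FALSE`: any value that itself contains a space (a string such as `c(NA, NA)` does) is split into extra fields when the file is read back with space as the delimiter. The sketch below uses a hypothetical two-row data frame (the column names and values are made up, not taken from the real colData) and counts the fields on each line with `count.fields()`, comparing a space separator with a tab separator.

```r
# Hypothetical toy data: one value contains spaces, like "c(NA, NA)" does
df <- data.frame(sample = c("s1", "s2"),
                 treatment = c("none", "c(NA, NA)"),
                 stringsAsFactors = FALSE)

# Same settings as in the question: space separator, no quoting
out_space <- tempfile(fileext = ".txt")
write.table(df, out_space, sep = " ", quote = FALSE, row.names = FALSE)

# Tab separator instead
out_tab <- tempfile(fileext = ".txt")
write.table(df, out_tab, sep = "\t", quote = FALSE, row.names = FALSE)

# Number of fields on each line (header line included)
n_space <- count.fields(out_space, sep = " ", quote = "")
n_tab   <- count.fields(out_tab, sep = "\t", quote = "")
n_space  # 2 2 3: the last row has spilled into an extra column
n_tab    # 2 2 2
```

With the space separator the last row gains a field, which is the same kind of misalignment Excel shows; a tab separator (or `quote = TRUE`) keeps each value in its own column.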

Can anyone see why I get this problem? I suspect that some elements could be lists, and that the as.data.frame command does not coerce them to single values, but I don't know how to solve this. Thank you!!!
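The list hypothesis can be checked directly: `vapply(x, is.list, logical(1))` reports which columns of a data frame are lists, and `lengths()` shows how many values each cell holds. A minimal sketch on a toy data frame; the column names `sample`, `age` and `treatments` are hypothetical stand-ins, not the actual colData columns.

```r
# Hypothetical stand-in for as.data.frame(colData(PRADexp))
TPinfo <- data.frame(sample = c("s1", "s2"), age = c(61, 58),
                     stringsAsFactors = FALSE)
# A list column: each cell holds a whole vector rather than a single value
TPinfo$treatments <- list(c(NA, NA), c("radiation", NA))

# TRUE for every column that is stored as a list
is_list_col <- vapply(TPinfo, is.list, logical(1))
names(TPinfo)[is_list_col]  # "treatments"

# Number of values in each cell of the list column; anything other than 1
# cannot be written out as a single field
lengths(TPinfo$treatments)  # 2 2
```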

sannegeraets commented 4 years ago

Hi @mikyzo88, I just came across your issue and I am having the same problem! I see you closed it; did you find a solution? Thanks in advance!

ghost commented 4 years ago

Hi @sannegeraets! Yes, it is basically what I hypothesized in my previous post: some elements of the data frame are in fact lists of vectors or other objects. When I saved it as txt, some columns contained values like c(element1, element2, ..., elementN), as you can see in the Excel preview, and these break the layout of the txt file. My solution was to find these columns in the data frame and remove them; after that, the txt looks fine! Hope this helps!
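The fix described above can be sketched in a few lines of base R, again on a hypothetical toy data frame rather than the real colData. Option 1 drops the list columns, as in the reply. Option 2 is an alternative not proposed in this thread: it collapses each list cell into a single string so the information is kept; the `;` delimiter is an arbitrary choice.

```r
# Hypothetical toy data with one list column
TPinfo <- data.frame(sample = c("s1", "s2"), age = c(61, 58),
                     stringsAsFactors = FALSE)
TPinfo$treatments <- list(c(NA, NA), c("radiation", NA))

is_list_col <- vapply(TPinfo, is.list, logical(1))

# Option 1 (the approach described above): drop the list columns
TPinfo_flat <- TPinfo[, !is_list_col, drop = FALSE]

# Option 2 (alternative): collapse every list cell into one string
TPinfo_collapsed <- TPinfo
TPinfo_collapsed[is_list_col] <- lapply(TPinfo[is_list_col], function(col)
  vapply(col, function(cell) paste(cell, collapse = ";"), character(1)))

# Write with a tab separator so values containing spaces stay in one field
out <- tempfile(fileext = ".txt")
write.table(TPinfo_flat, file = out, sep = "\t", quote = FALSE,
            row.names = FALSE)
readLines(out)  # "sample\tage" "s1\t61" "s2\t58"
```

Dropping the columns is the simplest route when they are not needed downstream; collapsing keeps them at the cost of having to split the strings again if they are re-imported.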