Bioconductor / copy-number-analysis

Explore, compare, and evaluate Bioconductor packages related to genomic copy number analysis

'Error in as.vector(x, "character")' - RCNV_seq.R and RCNV_seq-helper.R #9

Closed amora197 closed 2 years ago

amora197 commented 3 years ago

Hello,

I'm currently trying to replicate the example code for getting a CNV plot, using both the RCNV_seq.R and RCNV_seq-helper.R scripts with the tumor and normal BAM files of chromosome 4 that were provided for testing. My goal is to use this tool to visualize genomic data of crops and detect CNVs.

While using RCNV_seq.R, I would run cnv <- cnv.cal(as.countsfile(chr4), log2=log2, annotate=annotate) and get the following error:

[Image: asCountsFileError]

I noticed that the as.countsfile( ) function is defined in the RCNV_seq-helper.R file. After some troubleshooting, I understood that the chr4 object in RCNV_seq.R is an S4 object that as.countsfile( ) tries to convert into a tab-delimited file. This file is in turn needed as an input to the cnv.cal( ) function from the cnv.R file.

In order to get around this error and replicate the CNV plot, I had to edit the as.countsfile( ) function in RCNV_seq-helper.R as follows:

Original as.countsfile( ) code:

as.countsfile <- function(hits, file=tempfile()) {
    df <- with(rowData(hits), {
        cbind(data.frame(chromosome=as.character(seqnames),
                         start=start, end=end),
              assay(hits))
    })
    write.table(df, file, quote=FALSE, row.names=FALSE, sep="\t")
    file
}

Edited as.countsfile( ) code:

as.countsfile <- function(hits, fileNameAndLocation) {
    df <- with(rowData(hits), {
        chrome = hits@rowRanges@seqnames
        start = hits@rowRanges@ranges
        end = hits@rowRanges@ranges@start + hits@rowRanges@ranges@width
            cbind(data.frame(chromosome=chrome,
                             start=start,
                             end=end),
                             assay(hits))
    })

    df = subset(df, select = -c(end))
    names(df)[names(df) == "start.start"] <- "start"
    names(df)[names(df) == "start.end"] <- "end"
    names(df)[names(df) == "start.width"] <- "width"

    write.table(df, file = fileNameAndLocation, quote=FALSE, row.names=FALSE, sep="\t")
    return(fileNameAndLocation)
}
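
For comparison, here is a minimal sketch of the same conversion written with accessor functions instead of direct slot access (hits@rowRanges@...). It assumes hits is a RangedSummarizedExperiment, and it relies on the fact that in current versions of SummarizedExperiment, rowRanges( ) returns the genomic ranges while rowData( ) returns only a DataFrame of per-row annotations, which is why seqnames, start and end are no longer found inside with(rowData(hits), ...). The column layout (chromosome, start, end, then the count columns) follows the original function; whether cnv.cal( ) accepts this output has not been tested here.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges (GRanges, seqnames, start, end)

# Sketch only: assumes `hits` is a RangedSummarizedExperiment.
as.countsfile <- function(hits, file = tempfile()) {
    rr <- rowRanges(hits)  # genomic ranges, one per row of the count matrix
    df <- cbind(data.frame(chromosome = as.character(seqnames(rr)),
                           start = start(rr),
                           end = end(rr)),  # inclusive end: start + width - 1
                assay(hits))                # count columns, one per sample
    write.table(df, file, quote = FALSE, row.names = FALSE, sep = "\t")
    file  # path of the tab-delimited file that was written
}

# Usage with a small made-up object (hypothetical data, not the chr4 BAM files):
rr <- GRanges("chr4", IRanges(start = c(1, 101), width = 100))
counts <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("tumor", "normal")))
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = rr)
countsfile <- as.countsfile(se)
```

Using start( ) and end( ) keeps the function independent of how the ranges are stored internally, and end( ) returns the inclusive end coordinate directly, so no columns need to be renamed or dropped afterwards.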

With the custom edits, I was able to replicate the CNV plot, but I would prefer not to use a custom-edited R script of an established tool when it comes to publishing. I am fairly new to R and do not know how I may be affecting the functionality or robustness of the tool. This leads to my two requests:

  1. Can the as.countsfile( ) function in the RCNV_seq-helper.R file be fixed to handle the extraction from S4 objects and convert the data into a tab-delimited file?
  2. What would be the proper way to cite this tool in a publication?

Thank you ahead of time for your help. Have a good one!

vjcitn commented 3 years ago

Thanks for your comment. These suggestions are well-motivated, but you may note that the workflow source code has not changed in 7 years. I think a citation to this GitHub repo is probably the best one can do for this code base. You might post the question about plotting CNV data to support.bioconductor.org, as there might be a more recently maintained package that addresses this task and would be easier to cite.

vjcitn commented 3 years ago

Any need for further work here, @amora197? I am sorry that working on this workflow is out of scope at the moment, but if you have an urgent need perhaps more could be done.