alyssafrazee / polyester

Bioconductor package "polyester", devel version. RNA-seq read simulator.
http://biorxiv.org/content/early/2014/12/12/006015
89 stars 51 forks source link

simulate_experiment failing with 2 column fold_changes matrix... #48

Closed nickschurch closed 6 years ago

nickschurch commented 6 years ago

I want to simulate some sequence data. I've defined a simple two-column fold_changes matrix, as per the vignette, that looks like this:

      [,1] [,2]
 [1,] 4.00    1
...
[23,] 4.00    1
[24,] 1.00    1
...
[46,] 1.00    1
[47,] 0.67    1
...
[69,] 0.67    1
[70,] 0.50    1
...
[92,] 0.50    1

When I then run simulate_experiment I get:

> simulate_experiment('my.fa', reads_per_transcript=18716.48, num_reps=c(6,6), fold_changes=fold_changes) 
Error in basemeans[, 2] = fold_changes * basemeans[, 1] : 
  number of items to replace is not a multiple of replacement length

When I subset the fold_changes matrix to just have the first column fold_changes=fold_changes[,1] things work fine, but now I'm less sure exactly what is being simulated.

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.42.1   XVector_0.14.1      IRanges_2.8.2       S4Vectors_0.12.2    BiocGenerics_0.20.0
[6] polyester_1.10.1     

loaded via a namespace (and not attached):
[1] zlibbioc_1.20.0 limma_3.30.13   tools_3.3.1     logspline_2.1.9 knitr_1.15.1   
JMF47 commented 6 years ago

I suspect it is because your reads_per_transcript is of length 1. Im assuming that you want the number of reads per transcript for all those 92 transcripts to be the same 18716.48? If so try reads_per_transcript=matrix(rep(18716.48, 92), ncol=1).

nickschurch commented 6 years ago

You are correct, that's the problem right there. Might be worth clarifying the vignette for this point; it was really unclear to me what reads_per_transcript was expected here.

JMF47 commented 6 years ago

Great, glad to hear your issue was resolved. Thanks for the heads up, will look into this :)