alyssafrazee / polyester

Bioconductor package "polyester", devel version. RNA-seq read simulator.
http://biorxiv.org/content/early/2014/12/12/006015
89 stars 51 forks source link

Distribution of reads #27

Closed shrukane closed 8 years ago

shrukane commented 8 years ago

Is there any way we can plot the distribution of reads in the simulation ?

JMF47 commented 8 years ago

Hey Shruti, there is currently no method for that. But I will write one in and incorporate it into the code. This is a pretty useful feature. Would you just like to have the number of reads that cover each base of the reference sequences that you submitted for simulation from?

shrukane commented 8 years ago

That would be great ! Thank you.

shrukane commented 8 years ago

Hi, any updates ?

JMF47 commented 8 years ago

Hey! Sorry I was on vacation, but I have written the code this morning to the devel version to polyester. Please try it out and see if it does what you want it to do. To start polyester in devel please do:

library(devtools)
dev_mode()
install_github("alyssafrazee/polyester")
library(polyester)

Then you can run the simulation as you normally do, except modify the simulate_experiment command slightly by inserting the argument simulate_experiment(..., reportCoverage=T)

This will create a sample_coverages.rda file in the same directory where you have generated the reads. In this file, is stored a list of matrices. The names of each entry in the list corresponds to the transcript from which reads are generated. Each entry in the list is a matrix for that transcript. The i-th column of the matrix corresponds to i-th dataset generated from polyester, while the j-th row corresponds to the number of reads that will map to that position along that transcript (theoretical prefect coverage before errors are introduced during simulation).

Let me know if you need any more help!

This is sample code:

library(devtools)
dev_mode(on=T)
install_github("alyssafrazee/polyester")
library(polyester)
library(Biostrings)
# FASTA annotation
fasta_file = system.file('extdata', 'chr22.fa', package='polyester')
fasta = readDNAStringSet(fasta_file)
# subset the FASTA file to first 20 transcripts
small_fasta = fasta[1:20]
writeXStringSet(small_fasta, 'chr22_small.fa')
# here all transcripts will have ~equal FPKM
readspertx = round(20 * width(small_fasta) / 100)

# simulation call:
simulate_experiment('chr22_small.fa', reads_per_transcript=readspertx, num_reps=10, 
fold_changes=1, outdir='simulated_reads', reportCoverage=T) 

Look for the sample_coverages.rda file in the same directory where the sample_*.fasta files are.

shrukane commented 8 years ago

hi, sorry for the late reply. when I run my code I get the following error Error in save(coverage_matrices, file = paste0(outdir, "/sample_coverages.rda")) : object ‘coverage_matrices’ not found Any suggestions ?

shrukane commented 8 years ago

Hi, I know this is a closed call but I have one more question. Is this method incorporated in the latest version of polyester ? I've been working on the latest version and i could not find the sample_coverages.rda file. -Shruti