raise upper limit on number of nucleotides that can be in simulated sample

alyssafrazee / polyester

Bioconductor package "polyester", devel version. RNA-seq read simulator.

http://biorxiv.org/content/early/2014/12/12/006015


raise upper limit on number of nucleotides that can be in simulated sample #9

Closed alyssafrazee closed 8 years ago

alyssafrazee commented 10 years ago

In the add_error function, the call to "unlist" means that fewer than 2^31 nucleotides in total can be simulated in one experiment, which limits the number of reads you can simulate (the exact limit depends on read length). Reason: R can only store vectors with fewer than 2^31 entries.
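To make the limit concrete, here is a rough back-of-the-envelope sketch in R. It assumes, as described above, that every nucleotide of every read ends up as one element of a single ordinary (non-long) vector; `max_reads` is a hypothetical helper written for this illustration, not a polyester function.

```r
# Largest number of elements in an ordinary (non-long) R vector: 2^31 - 1.
max_elements <- .Machine$integer.max

# Hypothetical helper: upper bound on the number of reads, assuming each
# nucleotide of each read occupies one element of a single vector.
max_reads <- function(read_length) {
  floor(max_elements / read_length)
}

max_reads(100)  # about 21.5 million reads of length 100
max_reads(250)  # longer reads lower the bound proportionally
```

The bound applies to the whole experiment, so it is shared across all transcripts and replicates that go through the same call.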

I think it would be good if we could write the code differently so we don't run into the 2^31 limit quite so quickly.

alyssafrazee commented 10 years ago

Idea: automate this inside the function call, i.e., if the count matrix is too big, split the simulation into batches and run them one after another.
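One possible shape for that batching step, sketched in base R only. This is an illustration under assumptions, not code from the package: `split_into_batches` and its arguments are hypothetical names, and the nucleotide count per transcript is approximated as reads times read length.

```r
# Hypothetical helper: assign each transcript to a batch so that the total
# number of nucleotides simulated per batch stays at or below `max_nt`.
split_into_batches <- function(reads_per_transcript, read_length,
                               max_nt = .Machine$integer.max) {
  # Use doubles so the products cannot overflow R's integer type.
  nt <- as.numeric(reads_per_transcript) * read_length
  # A single transcript that exceeds the limit cannot be batched this way.
  stopifnot(all(nt <= max_nt))

  batch <- integer(length(nt))
  current <- 1L
  running <- 0
  for (i in seq_along(nt)) {
    # Start a new batch when adding this transcript would cross the limit.
    if (running + nt[i] > max_nt) {
      current <- current + 1L
      running <- 0
    }
    running <- running + nt[i]
    batch[i] <- current
  }
  batch
}

# Toy example with an artificially small limit of 70 nucleotides per batch.
split_into_batches(c(3, 4, 5, 2), read_length = 10, max_nt = 70)
```

Each batch index could then drive one separate simulation run, with the outputs combined afterwards.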

shrukane commented 9 years ago

I'm facing the same problem while working with chromosomes that have more transcripts, such as chr2. Have you found a solution to this issue?

alyssafrazee commented 9 years ago

@shrukane -- thanks for commenting. We haven't had time to implement any of our ideas for solutions yet, but we should be able to address this soon! In the meantime, a workaround is to simulate from smaller sections of the chromosome: e.g., break the fasta or gtf file for chromosome 2 into smaller sub-files, then run the simulate_experiment() function once for each sub-file.
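A minimal sketch of the fasta half of that workaround, using base R only. `split_fasta`, `per_file`, and `out_prefix` are hypothetical names chosen for this example, and the file names in the commented usage are placeholders.

```r
# Hypothetical helper: split a fasta file into sub-files that each hold at
# most `per_file` sequences. Returns the paths of the files it wrote.
split_fasta <- function(fasta, per_file, out_prefix) {
  lines <- readLines(fasta)
  lines <- lines[nzchar(lines)]               # drop empty lines
  record <- cumsum(startsWith(lines, ">"))    # record index of every line
  chunk <- (record - 1) %/% per_file + 1      # sub-file index of every line
  pieces <- split(lines, chunk)
  paths <- sprintf("%s_%03d.fa", out_prefix, seq_along(pieces))
  for (i in seq_along(pieces)) {
    writeLines(pieces[[i]], paths[i])
  }
  paths
}

# Usage sketch (not run here; keep the remaining arguments from your own
# simulate_experiment() call and give each run its own output directory):
# sub_files <- split_fasta("chr2.fa", per_file = 500, out_prefix = "chr2_part")
# for (i in seq_along(sub_files)) {
#   simulate_experiment(fasta = sub_files[i],
#                       outdir = paste0("sim_part_", i), ...)
# }
```

Writing each run to a separate output directory keeps one run from overwriting the read files of another.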

roryk commented 9 years ago

Bummer, it puts an upper limit on the number of reads you can simulate as well.

alyssafrazee commented 9 years ago

Yes, since the number of reads is directly proportional to the number of nucleotides in the simulation. As with the workaround above, if you need more reads than you can hold in memory, you can run the function multiple times.