alyssafrazee / polyester

Bioconductor package "polyester", devel version. RNA-seq read simulator.
http://biorxiv.org/content/early/2014/12/12/006015
89 stars 51 forks source link

Invalid 'times' argument error while running simulate_experiment #8

Closed ersgupta closed 10 years ago

ersgupta commented 10 years ago

I am trying to simulate rna-seq data for hg19, and simulate_experiment gives the following error. I tried with chr22 which works fine. gtf file was downloaded from ucsc. polyester version: 0.99.0(Packaged: 2014-06-30 11:06:37)

simulate_experiment(gtf=gtfFile, seqpath=fastaFolder, fold_changes = fold_changes, outdir=outdir, transcriptid=tNames, seed=12, num_reps=1) parsing gtf and sequences... done parsing Error in rep(seq_len(length(x)), ...) : invalid 'times' argument In addition: Warning message: In simulate_experiment(gtf = gtfFile, seqpath = fastaFolder, fold_changes = fold_changes, : provided fold changes mean that one group will have more overall expression than the other, so some of the DE signal might be lost due to library size adjustment.

Can you please suggest what might be going wrong here?

alyssafrazee commented 10 years ago

Could you provide a link to the exact gtf file + fasta sequences you are using for this code? Also it would be helpful to see a traceback: after you see this error, could you provide the output from running the command traceback()? Hopefully with this info I will be able to track down any bugs. Thanks!

ersgupta commented 10 years ago

I tried it with "chr22" alone and it gave the same error. chr22 fasta: https://drive.google.com/file/d/0Byxhpip1aqvVRVpfSXc2eWlFUzg/edit?usp=sharing chr22 gtf: https://drive.google.com/file/d/0Byxhpip1aqvVRUZ1ZHNpcVFkdmc/edit?usp=sharing Both files above downloaded from UCSC.

Here is the command and traceback()

simulate_experiment(gtf=gtfFilechr22, seqpath=fastaFolder, fold_changes = fold_changes_chr22, outdir=outdir, transcriptid="chr22", num_reps=1) parsing gtf and sequences... done parsing Error in rep(seq_len(length(x)), ...) : invalid 'times' argument In addition: Warning message: In simulate_experiment(gtf = gtfFilechr22, seqpath = fastaFolder, : provided fold changes mean that one group will have more overall expression than the other, so some of the DE signal might be lost due to library size adjustment. traceback() 5: x[rep(seq_len(length(x)), ...)] 4: x[rep(seq_len(length(x)), ...)] 3: rep(transcripts, times = numreadsList[[i]]) 2: rep(transcripts, times = numreadsList[[i]]) 1: simulate_experiment(gtf = gtfFilechr22, seqpath = fastaFolder, fold_changes = fold_changes_chr22, outdir = outdir, transcriptid = "chr22", num_reps = 1)

ersgupta commented 10 years ago

Found the error, length(fold_changes) was incorrect. One more query that I have is, for transcript fasta i can calculate numtx using count_transcripts but what if I have gtf and ref.fasta? How do I calculate numtx in that case?

alyssafrazee commented 10 years ago

Thanks for the feedback. I'm glad you found the problem with your specific code. To prevent this issue from causing trouble for you and others in the future, I've made two changes:

Hope this helps!