Closed ersgupta closed 10 years ago
Could you provide a link to the exact gtf file + fasta sequences you are using for this code? Also it would be helpful to see a traceback: after you see this error, could you provide the output from running the command traceback()
? Hopefully with this info I will be able to track down any bugs. Thanks!
I tried it with "chr22" alone and it gave the same error. chr22 fasta: https://drive.google.com/file/d/0Byxhpip1aqvVRVpfSXc2eWlFUzg/edit?usp=sharing chr22 gtf: https://drive.google.com/file/d/0Byxhpip1aqvVRUZ1ZHNpcVFkdmc/edit?usp=sharing Both files above downloaded from UCSC.
Here is the command and traceback()
simulate_experiment(gtf=gtfFilechr22, seqpath=fastaFolder, fold_changes = fold_changes_chr22, outdir=outdir, transcriptid="chr22", num_reps=1) parsing gtf and sequences... done parsing Error in rep(seq_len(length(x)), ...) : invalid 'times' argument In addition: Warning message: In simulate_experiment(gtf = gtfFilechr22, seqpath = fastaFolder, : provided fold changes mean that one group will have more overall expression than the other, so some of the DE signal might be lost due to library size adjustment. traceback() 5: x[rep(seq_len(length(x)), ...)] 4: x[rep(seq_len(length(x)), ...)] 3: rep(transcripts, times = numreadsList[[i]]) 2: rep(transcripts, times = numreadsList[[i]]) 1: simulate_experiment(gtf = gtfFilechr22, seqpath = fastaFolder, fold_changes = fold_changes_chr22, outdir = outdir, transcriptid = "chr22", num_reps = 1)
Found the error, length(fold_changes) was incorrect. One more query that I have is, for transcript fasta i can calculate numtx using count_transcripts but what if I have gtf and ref.fasta? How do I calculate numtx in that case?
Thanks for the feedback. I'm glad you found the problem with your specific code. To prevent this issue from causing trouble for you and others in the future, I've made two changes:
simulate_experiment
so that a more explicit error is thrown if fold_changes
has the wrong lengthcount_transcripts
function was updated so that it can also count the transcripts in a gtf file. Hope this helps!
I am trying to simulate rna-seq data for hg19, and simulate_experiment gives the following error. I tried with chr22 which works fine. gtf file was downloaded from ucsc. polyester version: 0.99.0(Packaged: 2014-06-30 11:06:37)
simulate_experiment(gtf=gtfFile, seqpath=fastaFolder, fold_changes = fold_changes, outdir=outdir, transcriptid=tNames, seed=12, num_reps=1) parsing gtf and sequences... done parsing Error in rep(seq_len(length(x)), ...) : invalid 'times' argument In addition: Warning message: In simulate_experiment(gtf = gtfFile, seqpath = fastaFolder, fold_changes = fold_changes, : provided fold changes mean that one group will have more overall expression than the other, so some of the DE signal might be lost due to library size adjustment.
Can you please suggest what might be going wrong here?