Open theheking opened 5 years ago
It’s likely that this is an input issue and would be more appropriate on the bioconductor user forum.
What does your readspertx look like?
If it is an input issue, I don't know why I would get a output fasta file for the first sample_01.fasta which is 14G.
I have already removed the zeroes from the readspertx. So I don't think it is that. `fasta_File <- readDNAStringSet(EnsemblfastaFile)
fastaFile_nozero <- replace(width(fasta_File), width(fasta_File) == 0, 1)
readspertx = round(20 * fastaFile_nozero / 100)`
What does the following show:
head(readspertx) sum(readspertx)
head(readspertx) [1] 214 22 831 598 727 96 sum(readspertx) [1] 47182274
If I’m understanding your query and the output you’ve shared with me, the output file size is within the realm of expectation. The input specified generates one file per sample, since paired=F. If I recall correctly, 47 million 100bp reads stored in fasta format should be >10Gb. If you could let me know a bit more about what your goal of the simulation is (such as simulating 20x coverage of every transcript in the transcriptome) I could try and help you tweak your commands.
Hi,
I am running polyester using the following code:
simulate_experiment(EnsemblfastaFile, numreps=c(1,1),fold_changes=fold_changes, reads_per_transcript=readspertx, paired=FALSE, outdir="/mnt/lustre/users/k1632479/polyester/simulatedread", distr="empirical", error_model="illumina5", bias="rnaf")
It runs for hours and outputs a large 8 GB fasta file. I don't understand what I'm running wrong.