Hi, I have a large human RNA-seq file (~86 GB, .fastq.gz), so I split the bed12 file by chromosome and ran `flair collapse` on each chromosome separately. But it is very slow at the `Counting supporting reads for annotated transcripts` step; the run has been stuck there for 5 hours:
I tested on chr1.bed (314 MB) with 10 CPUs, but 4 hours have passed since `chr1.annotated_transcripts.alignment.sam` was generated.
I also tested on chr2.bed, and noticed that `chr2.annotated_transcripts.alignment.sam` is the same size as `chr1.annotated_transcripts.alignment.sam`. Why?
A FastA/FastQ file of raw reads is a required parameter for `flair collapse`. But since I only split the bed file by chromosome and pass the same full raw-reads file to every `chr*.bed` run, the speed does not improve.
Below is my code; `$root/raw.reads/$sample.fastq.gz` is the same for every `chr*.bed`. I am confused: if I only cut the bed file and not the raw fastq file, will that only reduce memory usage, not speed? I want to increase the speed. What should I do?
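One way the "cut the fastq too" idea could be sketched (this is not the original script; it assumes the bed12 name column, column 4, holds the read IDs, as in flair-corrected beds) is to collect each chromosome's read IDs from its bed and then subset the fastq to just those reads before running `flair collapse`:

```python
# Extract the unique read names (BED12 column 4) from a per-chromosome bed,
# so the raw fastq can be subset to only the reads that chromosome needs.
def bed12_read_ids(bed_lines):
    """Return unique read names (column 4) from BED12 records, in order."""
    ids = []
    seen = set()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        name = fields[3]
        if name not in seen:
            seen.add(name)
            ids.append(name)
    return ids

if __name__ == "__main__":
    import sys
    # Usage: python bed_ids.py chr1.bed > chr1.read_ids.txt
    with open(sys.argv[1]) as bed:
        print("\n".join(bed12_read_ids(bed)))
```

The ID list could then feed an external subsetting tool (e.g. `seqkit grep -f chr1.read_ids.txt reads.fastq.gz -o chr1.fastq.gz`, assuming `seqkit` is installed), and each `flair collapse` run would receive only its chromosome's reads instead of the full ~86 GB file.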