Hi, I have a big RNA-seq sample (.fastq.gz, ~84G; .bam, ~61G). After running `flair correct`, I got a corrected.bed file (3.6G). Since it is >1G, I split the bed file by chromosome and ran `flair collapse` on each chromosome separately. However, the resulting gtf file looks odd. For example, I ran `flair collapse` on chr1 and got chr1.gtf, but it contains transcripts on other chromosomes besides chr1.
- Should the `raw fastq` file given to `flair collapse` be consistent with the `bed` file? That is, after splitting, should I extract only the reads that appear in each `bed` file?
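Whether `flair collapse` actually needs a matching FASTQ is the open question here. If a per-chromosome FASTQ were wanted, one way to build it could look like the sketch below. It assumes that column 4 of the bed file holds the read name exactly as it appears after `@` in the FASTQ header, and that every FASTQ record spans exactly four lines. The function name `subset_fastq_by_bed` is hypothetical, not part of FLAIR.

```shell
# Hypothetical helper: keep only the FASTQ records whose read ID
# appears in column 4 of a BED file.
# Assumes 4-line FASTQ records and BED names identical to FASTQ IDs.
subset_fastq_by_bed() {
    sfb_bed=$1        # input BED (e.g. chr1.bed)
    sfb_fastq=$2      # gzipped FASTQ with all reads
    sfb_out=$3        # gzipped FASTQ to write
    sfb_ids=$(mktemp)

    # Unique read names from BED column 4.
    cut -f4 "$sfb_bed" | sort -u > "$sfb_ids"

    gzip -dc "$sfb_fastq" |
        awk -v ids="$sfb_ids" '
            # Load the wanted read names into a lookup table.
            BEGIN { while ((getline line < ids) > 0) keep[line] = 1 }
            # Header line of each record: strip the leading "@".
            NR % 4 == 1 { id = substr($1, 2); wanted = (id in keep) }
            # Print all four lines of a wanted record.
            wanted
        ' | gzip > "$sfb_out"

    rm -f "$sfb_ids"
}

# Example call (output path is illustrative):
# subset_fastq_by_bed "$output/$sample/preprocess/chr1.bed" \
#     "$root/raw.reads/$sample.fastq.gz" chr1.fastq.gz
```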
3. split bed by chromosome
awk -F"\t" '$1!~"chr" || $1=="chrM"' $output/$sample/preprocess/${sample}_all_corrected.bed >$output/$sample/preprocess/patches.bed
for i in {1..22} X Y
do
    chr=chr$i
    awk -F"\t" '$1=="'$chr'"' $output/$sample/preprocess/${sample}_all_corrected.bed >$output/$sample/preprocess/$chr.bed
done
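The splitting commands above read the 3.6G bed file once per chromosome. As a sketch only, the same split could be done in a single pass with one `awk` call. The function name `split_bed_by_chrom` is hypothetical. The rules are meant to mirror the commands above: chr1 to chr22, chrX and chrY each get their own file, names without "chr" plus chrM go to patches.bed, and anything else is skipped.

```shell
# Hypothetical helper: split a BED file by chromosome in one pass.
split_bed_by_chrom() {
    sbc_bed=$1        # input BED (e.g. ${sample}_all_corrected.bed)
    sbc_dir=$2        # output directory
    mkdir -p "$sbc_dir"

    awk -F'\t' -v dir="$sbc_dir" '
        # chr1..chr22, chrX, chrY: one output file per chromosome.
        $1 ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y)$/ {
            print > (dir "/" $1 ".bed")
            next
        }
        # Names without "chr", plus chrM, are collected in patches.bed.
        $1 !~ /chr/ || $1 == "chrM" {
            print > (dir "/patches.bed")
        }
        # Everything else (e.g. chrUn_* contigs) is dropped.
    ' "$sbc_bed"
}

# Example call:
# split_bed_by_chrom "$output/$sample/preprocess/${sample}_all_corrected.bed" \
#     "$output/$sample/preprocess"
```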
flair collapse -t 15 \
    -q $output/$sample/preprocess/$chr.bed \
    -r $root/raw.reads/$sample.fastq.gz -f $anno -g $ref \
    --stringent --check_splice --annotation_reliant generate \
    -o $output/$sample/$chr
cut -f1 chr1.isoforms.gtf | sort | uniq -c
62102 chr1
345 chr10
690 chr11
484 chr12
158 chr13
423 chr14
343 chr15
409 chr16
586 chr17
125 chr18
618 chr19
626 chr2
326 chr20
149 chr21
225 chr22
567 chr3
306 chr4
429 chr5
459 chr6
486 chr7
336 chr8
364 chr9
37 chrM
379 chrX
11 chrY
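To compare runs across chromosomes, the tally above can be reduced to two numbers: gtf lines on the expected chromosome and gtf lines elsewhere. The sketch below is one way to do that. The function name `count_off_target` is hypothetical, and it counts all gtf lines (not only transcript records), as the `cut | sort | uniq -c` command does.

```shell
# Hypothetical helper: print "<on-target lines><TAB><off-target lines>"
# for a GTF file and the chromosome it is expected to cover.
count_off_target() {
    cot_gtf=$1        # e.g. chr1.isoforms.gtf
    cot_chrom=$2      # e.g. chr1
    awk -F'\t' -v chrom="$cot_chrom" '
        /^#/ { next }                 # skip header/comment lines
        $1 == chrom { on++; next }    # line on the expected chromosome
        { off++ }                     # line on any other chromosome
        END { printf "%d\t%d\n", on, off }
    ' "$cot_gtf"
}

# Example call:
# count_off_target chr1.isoforms.gtf chr1
```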