cole-trapnell-lab / cufflinks

0 ReadHits still live #126

Open lijing28101 opened 4 years ago

Hi, I'm running cufflinks on yeast. I used hisat2 for the alignment, combining about 40 RNA-Seq FASTQ files:

hisat2 -p 64 -x Saccharomyces_cerevisiae --dta-cufflinks -1 forward.fastq.gz -2 reverse.fastq.gz > Saccharomyces_cerevisiae_rnaseq.sam
samtools view --threads 64 -b -o Saccharomyces_cerevisiae_rnaseq.bam Saccharomyces_cerevisiae_rnaseq.sam
samtools sort -o Saccharomyces_cerevisiae_rnaseq_sorted.bam -T Saccharomyces_cerevisiae_temp --threads 64 Saccharomyces_cerevisiae_rnaseq.bam

Since the BAM file is too large, I split it by chromosome:

samtools view Saccharomyces_cerevisiae_rnaseq_sorted.bam ${chr} -b -@ 32 > chr_${chr}.bam

After that, I ran cufflinks on each chromosome:

cufflinks \
   --output-dir $out \
   --num-threads 32 \
   --verbose \
   --no-update-check \
   $bam

Twelve chromosomes have already finished, but none of them has any predicted transcripts. The output contains many "Filtering intron" and "Accepting intron" messages. The last few lines are below:

Accepting intron 311719-311805 spanned by 1 reads (0 low overhang, 0.186667 expected) left P = 0.813333, right P = 1
Filtering intron 313897-313926 spanned by 76 reads (74 low overhang, 14.1867 expected) left P = 1, right P = 0
Bad intron table has 833 introns: (12288 alloc'd, 9996 used)
Map has 18423672 hits, 12732782 are non-redundant
Processed 1 loci.
> Map Properties:
>       Normalized Map Mass: 17472709.22
>       Raw Map Mass: 17472709.22
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 235.86
>                  Estimated Std Dev: 106.07
0 ReadHits still live
Found 18 reference contigs
        Total map density: 17472709.215395
[14:39:24] Assembling transcripts and estimating abundances.
III:0-316605    Processing new bundle with 18379608 alignments
Processed 1 loci.
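
To get a sense of how many introns are accepted versus filtered over a whole run, the messages above can be tallied from a saved log. This is only a minimal sketch: it assumes the `--verbose` output was redirected to a file, and the function name `summarize_introns` and the log path are hypothetical.

```shell
# Tally intron decisions in a saved Cufflinks --verbose log.
# Assumption: the verbose output was redirected to a file, whose path is $1.
summarize_introns() {
    log="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    accepted=$(grep -c '^Accepting intron' "$log" || true)
    filtered=$(grep -c '^Filtering intron' "$log" || true)
    echo "accepted=${accepted} filtered=${filtered}"
}
```

For example, `summarize_introns chr_III.log` would print one line with both counts for that chromosome's run.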

I didn't see any errors. In fact, when I ran another RNA-Seq dataset on the same genome, cufflinks predicted only 30 transcripts.
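
The number of predicted transcripts in a run can be checked by counting the `transcript` records in its output GTF. A minimal sketch, assuming each run writes `transcripts.gtf` into its `--output-dir`; the helper name `count_transcripts` is hypothetical.

```shell
# Count transcript records (column 3 == "transcript") in a GTF file.
# Assumption: $1 is the path to a Cufflinks transcripts.gtf.
count_transcripts() {
    # n + 0 forces a numeric 0 when the file has no transcript lines.
    awk -F'\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_transcripts "$out/transcripts.gtf"` prints 0 for a run that assembled nothing.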