Xinglab / espresso

how are reads filtered? #47

YichaoOU closed this issue 4 months ago

YichaoOU commented 4 months ago

Hello,

My input SAM file has 3.7M reads, but the _isoform.tsv file and the read info in the 0 folder contain only 2.3M reads. Why are the other 1.4M reads not used? They are all mapped.

samtools view labeled.sam| cut -f 1 | sort | uniq | wc -l
3764194

wc -l _isoform.tsv
2370752 _isoform.tsv

cat 0/*txt | cut -f 1| sort | uniq | wc -l
2370752

Thanks, Yichao

EricKutschera commented 4 months ago

If you use v1.4.0, ESPRESSO will output a summary file for each step. The summary file for the S step gives counts for the different filters: https://github.com/Xinglab/espresso/blob/v1.4.0/src/ESPRESSO_S.pl#L1489
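
To see which specific reads were dropped, rather than only how many, the comparison done with the shell commands in the question can be sketched in Python. This is a minimal sketch, not part of ESPRESSO: the function names are hypothetical, and it assumes, as the `cut -f 1` commands above do, that the read ID is the first tab-separated column of both the SAM records and the text files in the 0 folder.

```python
import glob


def sam_read_ids(sam_path):
    """Return the set of unique read names (column 1) in a SAM file.

    Header lines start with '@' and are skipped, which mirrors
    `samtools view` without the -h option.
    """
    ids = set()
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue
            name = line.split("\t", 1)[0].strip()
            if name:
                ids.add(name)
    return ids


def first_column_ids(paths):
    """Return the set of unique first-column values across tab-separated files."""
    ids = set()
    for path in paths:
        with open(path) as handle:
            for line in handle:
                name = line.split("\t", 1)[0].strip()
                if name:
                    ids.add(name)
    return ids


def dropped_reads(sam_path, read_info_glob):
    """Return read names present in the SAM file but absent from the read info files.

    Assumption: the first column of the read info files holds the read ID.
    """
    kept = first_column_ids(glob.glob(read_info_glob))
    return sam_read_ids(sam_path) - kept


# Example call, using the file names from the question:
# missing = dropped_reads("labeled.sam", "0/*txt")
# print(len(missing))
```

The resulting set of read names can then be looked up in the SAM file to check what the dropped reads have in common, and compared against the filter counts in the S step summary file.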

YichaoOU commented 4 months ago

cool, thanks!