dcjones / quip

Compressing next-generation sequencing data with extreme prejudice.
http://www.cs.washington.edu/homes/dcjones/quip/
BSD 3-Clause "New" or "Revised" License

option -l #23

Open iochoa opened 10 years ago

iochoa commented 10 years ago

I want to know the compressed size of the reads, counting only the bases (i.e., without quality values, IDs, or other info).

I ran quip with the -l option on the output.qp file, and it reports the number of reads, the number of bases, the uncompressed size, and the compressed size. Does this compressed size cover only the bases? If not, is there an easy way to get that figure?

Thanks a lot!

dcjones commented 10 years ago

Yep! Add the -v option. E.g.

> quip -l h1-1_1.fastq.qp
     Reads         Bases  Uncompressed    Compressed   Ratio  Filename
  26516582    2651658200    6774592517    1080175349  0.1594  h1-1_1.fastq.qp
> quip -l -v h1-1_1.fastq.qp 
     Reads         Bases  ID Uncompressed  ID Compressed  ID Ratio  Aux Uncompressed  Aux Compressed   Aux Ratio  Seq Uncompressed  Seq Compressed  Seq Ratio  Qual Uncompressed  Qual Compressed  Qual Ratio  Filename
  26516582    2651658200       1444759535       83804715    0.0580                 0             485         inf        2651658200       417223371     0.1573         2651658200        579112964      0.2184  h1-1_1.fastq.qp

That separates the statistics into Seq, ID, Qual, and Aux (the extra fields in SAM/BAM files). The Seq Compressed column is the compressed size of the bases alone.
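
For scripting, the verbose listing can be parsed to pull out just the sequence figures. The following is a minimal Python sketch, not part of quip: the function name `parse_quip_verbose_listing` is hypothetical, and it assumes the column layout shown in the output above (header names separated by two or more spaces, values separated by whitespace, filename last).

```python
import re

# Output of `quip -l -v`, copied from the example above.
SAMPLE = """\
     Reads         Bases  ID Uncompressed  ID Compressed  ID Ratio  Aux Uncompressed  Aux Compressed   Aux Ratio  Seq Uncompressed  Seq Compressed  Seq Ratio  Qual Uncompressed  Qual Compressed  Qual Ratio  Filename
  26516582    2651658200       1444759535       83804715    0.0580                 0             485         inf        2651658200       417223371     0.1573         2651658200        579112964      0.2184  h1-1_1.fastq.qp
"""


def parse_quip_verbose_listing(text):
    """Return one dict per data row, keyed by the header's column names.

    Hypothetical helper: assumes the layout shown in the sample above.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Column names contain single spaces ("Seq Compressed"), so split the
    # header on runs of two or more spaces instead of on any whitespace.
    columns = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for ln in lines[1:]:
        # Values never contain spaces except possibly the trailing filename,
        # so limit the split and let the last field keep the remainder.
        values = ln.strip().split(None, len(columns) - 1)
        rows.append(dict(zip(columns, values)))
    return rows


rows = parse_quip_verbose_listing(SAMPLE)
seq_compressed = int(rows[0]["Seq Compressed"])
seq_uncompressed = int(rows[0]["Seq Uncompressed"])
print(rows[0]["Filename"], seq_compressed, round(seq_compressed / seq_uncompressed, 4))
```

In practice the text would come from running `quip -l -v` on the file (for example via `subprocess.run`) rather than from a hard-coded string.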

iochoa commented 10 years ago

Perfect!!! Thank you very much for your fast response, just what I needed! :)