JohnLonginotto / SeQC

SQL databases from BAM/SAM files with a focus on QC.

bam stats F1F2/F1R2/R1F2/R1R2/F2F1/F2R1/R2F1/R2R1 #11

Closed avilella closed 7 years ago

avilella commented 7 years ago

From biostars:

I hate competing with Pierre because I always lose :P but SeQC can do this so I ought to plug it.

Download all the code from https://github.com/JohnLonginotto/SeQC You want to use the TYPE module, which will give a number for all the possible orientations a single or paired end read can align, as a number from 1 to 20. To decode this number, use this chart: http://i.imgur.com/DCMrtwe.jpg

I ran the following:

~/SeQC/SeQC.js.py --analysis TYPE --input file.bam

And got a cigar.out, inline.f and myProject.SeQC binary file.

How do I transform these into a table of counts for each of the types in http://i.imgur.com/DCMrtwe.jpg?

Thx

JohnLonginotto commented 7 years ago

The binary file myProject.SeQC is an SQLite database. At the moment the GUI for SeQC is under re-re-development, but the statistics you want are very easy to get manually.

Run the following in a terminal:

sqlite3 ./myProject.SeQC
.mode columns
.headers on
.schema

Make sure to type the dots at the beginning! The final command will print something near the bottom like: CREATE TABLE "633a3b0d8ab37f7cac048fa31d8ca554_TYPE"(TYPE INT, counts INTEGER); except that the middle bit, the unique checksum of the file you ran the test on, will be different. So copy this next command, but change the 32 characters before the "_TYPE" to whatever you have:

SELECT * FROM "633a3b0d8ab37f7cac048fa31d8ca554_TYPE";

TYPE        counts    
----------  ----------
3           16739116  
4           15311736 

Thus for my sample data, I have 16739116 alignments of type 3, and 15311736 of type 4.
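If you would rather script this than type it into the sqlite3 shell, the same lookup can be sketched with Python's built-in sqlite3 module. This is only a rough sketch, not part of SeQC: the function name read_type_counts is made up for illustration, and it assumes the TYPE tables are named "<checksum>_TYPE" with the columns TYPE and counts, as in the schema output above.

```python
import sqlite3


def read_type_counts(db_path):
    """Return {table_name: {type: count}} for every table whose name ends in _TYPE.

    Hypothetical helper: assumes the "<checksum>_TYPE"(TYPE, counts) layout
    shown in the .schema output above.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table in the database, so the checksum
        # prefix does not have to be copied by hand.
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
            if name.endswith("_TYPE")
        ]
        result = {}
        for table in tables:
            # Table names cannot be bound as parameters, so quote the
            # identifier manually (doubling any embedded double quotes).
            quoted = '"' + table.replace('"', '""') + '"'
            rows = conn.execute(
                f"SELECT TYPE, counts FROM {quoted} ORDER BY TYPE"
            )
            result[table] = {type_id: count for type_id, count in rows}
        return result
    finally:
        conn.close()
```

Calling read_type_counts("./myProject.SeQC") should then give one dictionary of counts per analysed file, keyed by the table name; the type numbers still need to be decoded with the chart linked above.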

All the best, John