bcgsc / tigmint

⛓ Correct misassemblies using linked AND long reads
https://bcgsc.github.io/tigmint/
GNU General Public License v3.0
54 stars 13 forks

Filter alignments in a single step #2

Closed abmudd closed 6 years ago

abmudd commented 7 years ago

I have two small notes:

  1. This is a bit aesthetic, but in the README, you might want to list the sample draft assembly as "myassembly.fa" and the sample reads as "myreads.fq.gz" in order to better match the sample usage commands.

  2. The pipeline only uses the %.as100.bam to filter and output the %.nm$(nm).bam (lines 144-161 of tigmint-make). Therefore, couldn't you combine these two steps either by piping the alignment score gawk command into the mismatches gawk command or by combining the two gawk commands?

sjackman commented 7 years ago

I've fixed 1. I will fix 2. Thanks for both suggestions, @abmudd!

sjackman commented 7 years ago

awk is quite slow. Do you know of a faster tool to filter a SAM or BAM file based on AS and NM?

sjackman commented 7 years ago

Did you get Tigmint running? Any trouble with installation?

abmudd commented 7 years ago

I don't know if pysam would be any faster than gawk. You can run the gawk command as a single line though:

gawk -v ascore="$ASCORE" -v mismatch="$MISMATCH" -F'\t' '
    BEGIN { print "Flags\tRname\tPos\tMapq\tAS\tNM\tBX\tMI" }
    { as = bx = mi = nm = "NA" }
    match($0, "AS:.:([^\t]*)", x) { as = x[1] }
    match($0, "NM:.:([^\t]*)", x) { nm = x[1] }
    match($0, "BX:Z:([^\t]*)", x) { bx = x[1] }
    match($0, "MI:i:([^\t]*)", x) { mi = x[1] }
    { if (as >= ascore && nm <= mismatch) { print $2 "\t" $3 "\t" $4 "\t" $5 "\t" as "\t" nm "\t" bx "\t" mi } }'
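For comparison, the same single-pass idea can be sketched in plain Python over SAM text, without pysam. This is only an illustration, not Tigmint's code: the function name `filter_sam` and the default thresholds (`min_as=100`, `max_nm=5`) are assumptions, and unlike the gawk command it passes the SAM records through unchanged rather than printing a summary table. No claim is made that it is faster than gawk.

```python
import re

# Optional-field patterns for the alignment score (AS) and edit distance (NM) tags.
AS_RE = re.compile(r"\tAS:i:(-?\d+)")
NM_RE = re.compile(r"\tNM:i:(\d+)")


def filter_sam(lines, min_as=100, max_nm=5):
    """Yield SAM header lines plus alignments with AS >= min_as and NM <= max_nm.

    Both thresholds are checked in one pass, so no intermediate file is needed.
    The default values are illustrative assumptions.
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        as_match = AS_RE.search(line)
        nm_match = NM_RE.search(line)
        if as_match is None or nm_match is None:
            continue  # assumption: drop records that lack either tag
        if int(as_match.group(1)) >= min_as and int(nm_match.group(1)) <= max_nm:
            yield line
```

One possible use is `sys.stdout.writelines(filter_sam(sys.stdin))` in a small script placed between `samtools view -h` and `samtools view -b`.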

Unfortunately, I have been unable to get tigmint running due to the large number of dependencies and conflicts within my system.

sjackman commented 7 years ago

I'll merge the gawk steps, hopefully next week. I'll experiment with finding faster options, and try out pysam.

Sorry to hear of your troubles installing Tigmint. It does have a lot of dependencies. Have you tried using Linuxbrew (http://linuxbrew.sh) to install the dependencies? 🍺🐧 Tigmint has a Brewfile: https://github.com/bcgsc/tigmint/blob/master/Brewfile

I'll create a Docker image.

sjackman commented 7 years ago

I've created a Docker image, bcgsc/tigmint, with all the dependencies installed.

Docker Hub: https://hub.docker.com/r/bcgsc/tigmint/
Dockerfile: https://github.com/bcgsc/tigmint/blob/master/Dockerfile

docker run -it --name=tigmint bcgsc/tigmint
tigmint-make help

sjackman commented 6 years ago

Alignments are now filtered (in one step) by tigmint-molecule, which takes a SAM/BAM file of reads aligned to the reference (sorted by barcode with samtools sort -t BX) and outputs a BED file of molecules.
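To illustrate why barcode-sorted input helps, here is a rough sketch of grouping alignments into molecule extents in a single streaming pass. It is not tigmint-molecule's actual implementation: the `Alignment` tuple, the `molecules` function, and the `max_dist` and `min_reads` values are all hypothetical. It assumes the records arrive sorted by barcode and then by position, so each molecule can be emitted as soon as the barcode, the sequence name, or the distance changes.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Alignment(NamedTuple):
    barcode: str  # value of the BX tag
    rname: str    # name of the sequence the read aligned to
    start: int    # 0-based start, as in BED
    end: int      # exclusive end, as in BED


def molecules(
    alignments: Iterable[Alignment],
    max_dist: int = 50000,  # illustrative: largest gap allowed within a molecule
    min_reads: int = 4,     # illustrative: fewest reads needed to report a molecule
) -> Iterator[Tuple[str, int, int, str, int]]:
    """Yield (rname, start, end, barcode, n_reads) for each inferred molecule."""
    current = None  # [rname, start, end, barcode, n_reads]
    for aln in alignments:
        extends = (
            current is not None
            and aln.barcode == current[3]
            and aln.rname == current[0]
            and aln.start - current[2] <= max_dist
        )
        if extends:
            current[2] = max(current[2], aln.end)
            current[4] += 1
            continue
        if current is not None and current[4] >= min_reads:
            yield tuple(current)
        current = [aln.rname, aln.start, aln.end, aln.barcode, 1]
    if current is not None and current[4] >= min_reads:
        yield tuple(current)
```

Each yielded tuple can be written as one tab-separated BED line, for example with `"\t".join(map(str, molecule))`.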