dahak-metagenomics / dahak

benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.
https://dahak-metagenomics.github.io/dahak
BSD 3-Clause "New" or "Revised" License

code for parsing and visualizing abricate results #125

Open stephenturner opened 6 years ago

stephenturner commented 6 years ago

files:

kternus commented 6 years ago

I thought this code was helpful for visualizing the ABRicate results. In particular, @stephenturner implemented a way to get consensus coverage for genes with overlapping intervals in the ABRicate output, and he parsed the files by dataset, assembler, and trim value. This might be a future idea for visualizing results in the antibiotic resistance Jupyter notebook, or it could be a standalone script for data visualization. The dotted line shows the 90% coverage threshold, which is the default for SRST2 gene detection.

stephenturner commented 6 years ago

not sure what kind of capability Python has for doing this. I had to treat the coverage intervals as genomic ranges and use plyranges (a dplyr-like interface for manipulating ranges) to do a grouped interval reduction.
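For anyone who wants to try this in Python, a grouped interval reduction can be sketched with the standard library alone. This is not the plyranges code from the script, just a minimal illustration of the idea: collect the covered intervals for each group (here, a gene), merge the overlapping ones, and divide the merged length by the gene length. The function names, the `(key, start, end, gene_length)` input shape, and the example gene names and coordinates are all hypothetical; it assumes 1-based closed intervals, as in a coverage string of the form `start-end/total`.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def consensus_coverage(hits):
    """Return percent of each gene covered after merging its intervals.

    hits: iterable of (key, start, end, gene_length) tuples, where key
    identifies the group (hypothetical input shape for this sketch).
    """
    by_group = defaultdict(list)
    lengths = {}
    for key, start, end, gene_length in hits:
        by_group[key].append((start, end))
        lengths[key] = gene_length

    coverage = {}
    for key, intervals in by_group.items():
        covered = sum(e - s + 1 for s, e in merge_intervals(intervals))
        coverage[key] = 100.0 * covered / lengths[key]
    return coverage


# Made-up example: two overlapping hits on geneA, one full hit on geneB.
example_hits = [
    ("geneA", 1, 600, 1200),
    ("geneA", 500, 900, 1200),
    ("geneB", 1, 861, 861),
]
example_coverage = consensus_coverage(example_hits)
# Flag genes that reach the 90% coverage threshold.
passing = {gene for gene, pct in example_coverage.items() if pct >= 90.0}
```

To group by dataset, assembler, and trim value as well, the key could be a tuple such as `(dataset, assembler, trim, gene)` instead of the gene name alone.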

kternus commented 6 years ago

I don't know either, but I appreciate you sharing this! It looks much better than what I was trying to sketch out as a visualization idea.

stephenturner commented 6 years ago

Preview of the compiled HTML file: https://htmlpreview.github.io/?https://github.com/dahak-metagenomics/dahak/blob/0ebfa8b63904dddab907469774e2986bb6135180/scripts/abricate-results-parse-viz.html