BCCDC-PHL / auto-cpo

Automation of Genomic Analyses for Carbapenemase-Producing Organisms (CPOs)
GNU General Public License v3.0

Add QC filters for hybrid analysis, incorporating metrics from short & long reads #20

Open dfornika opened 1 month ago

dfornika commented 1 month ago

When we perform hybrid analysis, some samples fail during assembly with plassembler. This often happens when no contig is generated that is larger than the size provided via the --chromosome_length flag (we currently set that flag to 250000).
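To make the failure condition concrete, here is a minimal sketch of the check described above: does an assembly contain any contig longer than the configured chromosome length? This is not plassembler's internal logic. The function names `contig_lengths` and `has_chromosome_sized_contig` are hypothetical, and the only value taken from our setup is the 250000 default.

```python
def contig_lengths(fasta_text):
    """Return {contig_id: length} parsed from FASTA-formatted text."""
    lengths = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else "contig_{}".format(len(lengths) + 1)
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def has_chromosome_sized_contig(lengths, chromosome_length=250000):
    """True if at least one contig is strictly longer than chromosome_length."""
    return any(length > chromosome_length for length in lengths.values())
```

A check like this could be run on an earlier assembly of the same sample to flag cases that are likely to fail, assuming such an assembly is available at that point in the pipeline.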

We could consider bringing that value down, but I think it might be problematic to set the --chromosome_length value close to the largest "typical" size of a real plasmid that we see in our data.

As an alternative, we can look at the quality and quantity of both the short-read and long-read data to see how they correlate with generating contigs larger than the --chromosome_length value.

In order to implement this we may need to re-structure the way our QC thresholds are currently modeled in the config.json file.
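One possible shape for the restructured thresholds, sketched in Python for discussion. Everything here is an assumption: the key names (`qc_thresholds`, `analysis_types`, `min_total_bases`, `min_q30_percent`, `min_read_n50`) are hypothetical and do not reflect the current config.json layout, and the numeric values are placeholders rather than proposed cutoffs. The idea is to group thresholds by read type and have each analysis type list the read types it depends on, so that a hybrid analysis must pass both sets.

```python
import json

# Hypothetical config layout; key names and values are placeholders.
EXAMPLE_CONFIG = json.loads("""
{
  "qc_thresholds": {
    "short": {"min_total_bases": 100000000, "min_q30_percent": 80.0},
    "long": {"min_total_bases": 200000000, "min_read_n50": 5000}
  },
  "analysis_types": {
    "short": ["short"],
    "long": ["long"],
    "hybrid": ["short", "long"]
  }
}
""")


def failed_checks(metrics, thresholds):
    """Return names of metrics that are missing or below their minimum."""
    failures = []
    for name, minimum in thresholds.items():
        metric_name = name[len("min_"):]  # assumes every key starts with "min_"
        value = metrics.get(metric_name)
        if value is None or value < minimum:
            failures.append(metric_name)
    return failures


def evaluate_qc(analysis_type, metrics_by_read_type, config):
    """Return {read_type: [failed metric names]}; an empty dict means pass."""
    failures = {}
    for read_type in config["analysis_types"][analysis_type]:
        failed = failed_checks(
            metrics_by_read_type.get(read_type, {}),
            config["qc_thresholds"][read_type],
        )
        if failed:
            failures[read_type] = failed
    return failures
```

With this layout, a sample with adequate short reads but too few long-read bases would pass a short-read-only analysis and be excluded from hybrid analysis, with the failing metric reported per read type.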