PNNL-CompBio / Snekmer

Pipeline to apply encoded Kmer analysis to protein sequences
BSD 3-Clause "New" or "Revised" License

Enable background subtraction / file unzipping #118

Open christinehc opened 9 months ago

christinehc commented 9 months ago

Description

This PR (1) enables automatic detection and unzipping of gzipped files as part of the main Snekmer workflow, and (2) overhauls how background files are integrated into Snekmer workflows. Background files can now be supplied to Snekmer, and the kmer profile of the background sequences is used to inform the probability of a kmer appearing in a given family versus the background, which in turn affects downstream models. For (2), a parallel workflow processes the background files and sums the kmer profiles observed across the background for integration into the scoring and modeling steps. See the full changelog for details.
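For reviewers who want a concrete picture of the two pieces, here is a minimal standalone sketch. It is not the code in this PR: the function names (`is_gzipped`, `read_sequences`, `kmer_profile`, `sum_background_profiles`) are hypothetical, and it assumes FASTA input, gzip detection by magic bytes rather than file extension, and plain (unencoded) kmer counts. Snekmer's actual workflow rules and kmer encoding may differ.

```python
import gzip
from collections import Counter

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_sequences(path):
    """Yield sequences from a FASTA file, transparently handling gzip."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        chunks = []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
        if chunks:
            yield "".join(chunks)


def kmer_profile(sequences, k):
    """Count every overlapping kmer of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def sum_background_profiles(paths, k):
    """Sum the kmer profiles of several background files (assumed layout)."""
    total = Counter()
    for path in paths:
        total.update(kmer_profile(read_sequences(path), k))
    return total
```

Checking the magic bytes instead of the `.gz` suffix means a mislabeled file is still handled correctly. The summed `Counter` is the kind of aggregate background profile that the scoring and modeling steps could then compare against per-family counts.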

Issues

Full Changelog

christinehc commented 9 months ago

TODO: