Improvements to codify in the most recent version of eukrhythmic:
[x] MAD clustering parameters: shift default coverage/pid to 0.95/0.95 for MAD clustering
[x] MAD clustering parameters: add parameters to config.yaml for mmseqs clustering for pid and coverage
[x] Sample naming in MAD: add project prefix to config.yaml for renaming in MAD
[ ] Abundance filtering: default behavior to run salmon filtering on initial MAD (removal of any contigs with no reads recruited)
[ ] Abundance filtering: add option in config.yaml for user to pass a higher cutoff (e.g. genes with fewer than 10 reads recruiting)
[ ] Software output file: add final output file with version info / environment hashes used in the creation of MAD output. Perhaps add parameters as well? Call it methods-section.txt @shu251 :p
[ ] MAD-info file: add final output info file containing: number of contigs in MAD (and salmon-filtered MAD), CAGs etc. Add additional info like length distribution and other QUAST-type info?
[x] Merged annotation table: create final merged annotation table that combines emapper + eukulele outputs
[ ] File clean up: Try to reduce total folder size as much as possible. Maintain CAG final assemblies and some other things but remove intermediate mapping and assembly folders.
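
For the two abundance filtering items above, a minimal sketch of the filtering logic, assuming it operates on a single salmon `quant.sf` table (tab-separated, with `Name` and `NumReads` columns). The function name `contigs_passing_filter` and the `min_reads` parameter are hypothetical and not part of eukrhythmic; how the cutoff would be exposed in config.yaml is still open.

```python
import csv
import io


def contigs_passing_filter(quant_sf_text, min_reads=0.0):
    """Return the names of contigs to keep, given the text of a salmon quant.sf file.

    Hypothetical helper: contigs with no reads recruited are always dropped;
    a higher cutoff can be requested through min_reads.
    """
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    keep = []
    for row in reader:
        num_reads = float(row["NumReads"])
        # Default behavior removes zero-read contigs; min_reads raises the bar.
        if num_reads > 0 and num_reads >= min_reads:
            keep.append(row["Name"])
    return keep


# Made-up example table, for illustration only.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "contig_1\t1000\t850.0\t120.5\t250.0\n"
    "contig_2\t500\t350.0\t0.0\t0.0\n"
    "contig_3\t800\t650.0\t3.2\t4.0\n"
)

default_keep = contigs_passing_filter(example)
strict_keep = contigs_passing_filter(example, min_reads=10)
```

With the default, only `contig_2` (no reads recruited) is removed; with `min_reads=10`, `contig_3` is removed as well. Filtering across multiple samples (e.g. summing reads per contig first) would need a decision that this sketch does not make.
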
Additions to the readthedocs:
[ ] Impact of coverage/pid on MAD: include stats / plot from GO-SHIP and others on the impact of clustering parameters on the MAD
[ ] Discussion of salmon:transdecoder interactions?