KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Propagate contigs not recovered in blastp search to `unclassified.fna` #305

Closed evanroyrees closed 11 months ago

evanroyrees commented 1 year ago

https://github.com/KwanLab/Autometa/issues/296#issuecomment-1251684304

Ah, we should probably sort this out so that we can pass these (contigs without blastp hits) on to unclassified.fna.

This should be a relatively quick fix. I'll look into the code and get back to you.

Originally posted by @WiscEvan in https://github.com/KwanLab/Autometa/issues/296#issuecomment-1255527922

Here is the section that writes the kingdom fasta files:

https://github.com/KwanLab/Autometa/blob/baf61c04dddf5b33bb825dba2841de1e38dffefe/autometa/taxonomy/vote.py#L381-L394

Any contigs in `args.assembly` that are not recovered from `taxonomy.tsv` should be included in `unclassified.fna`.

This can be done in the `write_ranks` function:

https://github.com/KwanLab/Autometa/blob/baf61c04dddf5b33bb825dba2841de1e38dffefe/autometa/taxonomy/vote.py#L236-L293

Any contigs from the provided assembly (`assembly_records`) that are unaccounted for in the taxonomy dataframe should also be placed in `unclassified.fna`.
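
To illustrate the idea, here is a minimal, standard-library-only sketch of the set difference involved. It is not the actual `write_ranks` code: the helper names (`parse_fasta`, `find_unaccounted_contigs`, `format_fasta`) and the toy data are hypothetical, and it assumes the contig ID is the first whitespace-delimited token of each FASTA header and that the taxonomy table is indexed by those same IDs.

```python
from typing import Dict, Iterable, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {contig_id: sequence}.

    Assumption: the contig ID is the first token of the header line.
    """
    records: Dict[str, str] = {}
    contig_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            contig_id = line[1:].split()[0]
            records[contig_id] = ""
        elif contig_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[contig_id] += line
    return records


def find_unaccounted_contigs(
    assembly_ids: Iterable[str], taxonomy_ids: Iterable[str]
) -> List[str]:
    """Return assembly contig IDs absent from the taxonomy table, in assembly order."""
    seen = set(taxonomy_ids)
    return [contig_id for contig_id in assembly_ids if contig_id not in seen]


def format_fasta(records: Dict[str, str], contig_ids: Iterable[str]) -> str:
    """Format the selected contigs as FASTA text (one sequence line per record)."""
    return "".join(f">{cid}\n{records[cid]}\n" for cid in contig_ids)


# Toy stand-ins for the assembly and for the contig IDs found in the taxonomy table.
assembly_text = ">contig_1 len=8\nACGTACGT\n>contig_2\nGGCC\nTTAA\n>contig_3\nAAAA\n"
taxonomy_ids = ["contig_1"]  # only contig_1 received a blastp hit

records = parse_fasta(assembly_text)
missing = find_unaccounted_contigs(records, taxonomy_ids)

# These records would be appended to whatever is already destined for unclassified.fna.
unclassified_fasta = format_fasta(records, missing)
print(missing)
```

In the real function the taxonomy IDs would presumably come from the taxonomy dataframe's index and the sequences from `assembly_records`; the point of the sketch is only that the contigs to add are those in the assembly but not in the taxonomy table.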