Sidduppal closed this pull request 1 year ago.
nf-core lint overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit 7f86c81
| ✅ 62 tests passed |
| ❔ 34 tests were ignored |
| ❗ 9 tests had warnings |
One thing I noticed while testing the scripts: the NCBI class is instantiated with the variable `dbpath`, while the GTDB class is instantiated with `dbdir`. Nothing is breaking, but it is something to keep in mind as we go ahead; we may want to make the two consistent.
Base: 27.40% // Head: 27.35% // Decreases project coverage by -0.05% :warning:
Coverage data is based on head (54c168c) compared to base (86b9550). Patch coverage: 30.30% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
Looking at submitting a bioconda recipe for gtdb_to_taxdump. Its README suggests using TaxonKit for retrieving stable taxids across GTDB releases (rather than arbitrarily assigning them, as noted in the gtdb_to_taxdump summary section).
TaxonKit is available via bioconda, so it would be easy to include in autometa-env.yml, and the documentation for generating these taxdump files from the GTDB looks straightforward, as outlined in the GTDB-taxdump repo.
TaxonKit conda page: https://anaconda.org/bioconda/taxonkit
GTDB-taxdump page (using TaxonKit): https://github.com/shenwei356/gtdb-taxdump
NOTE: TaxonKit v0.12 or greater should be used. (ref)
The taxdump files may also be downloaded directly from the releases page (https://github.com/shenwei356/gtdb-taxdump/releases), which reduces the compute requirements.
That being said, if we would like to generate the GTDB taxdump files ourselves with TaxonKit, the steps are outlined here: https://github.com/shenwei356/gtdb-taxdump#steps
Looking into the future, this is probably the more appropriate route, as taxids can be relied upon across future GTDB releases.
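For reference, a rough sketch of both routes. This is a hedged sketch only: the release tag and asset name placeholders, the GTDB taxonomy file names, and the `taxonkit create-taxdump` flags are my assumptions and should be checked against the pages linked above.
# Option 1: download a pre-built taxdump from the gtdb-taxdump releases page.
# <tag> and <asset> are hypothetical placeholders; use an actual release listed there.
wget https://github.com/shenwei356/gtdb-taxdump/releases/download/<tag>/<asset>.tar.gz
tar -xzf <asset>.tar.gz
# Option 2: generate the taxdump locally with TaxonKit (>= v0.12, which introduced
# the create-taxdump subcommand), following the gtdb-taxdump "steps" section.
# Input file names assume the GTDB R207 taxonomy tables; flag names are assumptions.
taxonkit create-taxdump \
--gtdb \
--out-dir gtdb-taxdump/R207 \
ar53_taxonomy_r207.tsv.gz bac120_taxonomy_r207.tsv.gz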
P.S. there is also a related project linked (https://github.com/shenwei356/ictv-taxdump) that generates an NCBI-style taxdump for viruses and may be useful in the future (@kaw97, is this of any interest to you?)
Not sure if this affects what y'all are doing: https://github.com/shenwei356/gtdb-taxdump/issues/2
Only `merged.dmp` and `delnodes.dmp` are affected. If you just need to use the taxonomy data of the current version, say R207, don't worry.
Thanks @shenwei356
@WiscEvan The scripts have been updated to use taxonkit's already generated files. I decided to use `merged.dmp` and `delnodes.dmp` as they were provided with the taxdump, although if you think the issue mentioned above by @chasemc would significantly skew the results, we can remove their usage. Here are the commands for testing:
autometa-config \
--section databases \
--option gtdb \
--value /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test
autometa-update-databases --update-gtdb
autometa-taxonomy-lca \
--blast 78mbp_metagenome.blastp.gtdb.tsv \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--dbtype gtdb \
--sseqid2taxid-output 78Mbp_sseqid2taxid_test.tsv \
--lca-error-taxids 78Mbp_lcaErrorTaxids_test.tsv \
--verbose \
--lca-output 78Mbp_LCAout_test.tsv
autometa-taxonomy-majority-vote \
--lca 78Mbp_LCAout_test.tsv \
--output 78Mbp_gtdb_majority_vote.tsv \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--verbose \
--dbtype gtdb
autometa-taxonomy \
--votes 78Mbp_gtdb_majority_vote.tsv \
--assembly /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna \
--output testTaxonomy \
--split-rank-and-write superkingdom \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--dbtype gtdb
autometa-binning-summary \
--binning-main 78_binningMain_gtdb.tsv \
--markers /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.markers.tsv \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--dbtype gtdb \
--output-stats binningSummartStats.tsv \
--output-taxonomy binningTaxa.tsv \
--output-metabins metaBins \
--metagenome /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna
By using these files we don't need to add any additional dependency either. Let me know what you think.
The tests and CI/CD still need to be resolved, but I think we're almost there.
Addressed all the comments.
🐛 🛠️ I'm not sure how you encountered the error below. Sam and I implemented a `set -x` / `{ set +x; } 2>/dev/null` routine before and after running each module (https://github.com/KwanLab/Autometa/blob/gtdb_to_autometa/workflows/autometa_flagged.sh) that should allow easier inspection of a user's parameter configurations without them sending the entire submit file.
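For reference, a minimal sketch of that routine (the wrapped module and the shell variable names below are placeholders; the actual wrapper is in the autometa_flagged.sh script linked above):
# `set -x` echoes each command with its expanded parameters before it runs;
# `{ set +x; } 2>/dev/null` turns tracing back off without printing the `set +x` call itself.
set -x
autometa-taxonomy-lca \
--blast "${blast}" \
--dbdir "${dbdir}" \
--dbtype gtdb \
--lca-output "${lca_output}"
{ set +x; } 2>/dev/null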
Still getting the following error when running autometa-large-data-mode-gtdb.sh. I'm getting the error during the large-data-mode binning (lines).
The `set -x` / `{ set +x; } 2>/dev/null` routine was only added to autometa_flagged.sh and no other script; I'm running the large-data-mode workflow, which does not have the above-stated routine. The bug seems to be with the large-data-mode implementation, as all the other workflows are running fine.
[10/16/2022 04:41:27 PM DEBUG] autometa.common.kmers: umap: 10 data points and 10 dimensions
[10/16/2022 04:41:27 PM DEBUG] autometa.common.kmers: Performing embedding with umap (seed 42)
/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/umap/umap_.py:2344: UserWarning: n_neighbors is larger than the dataset size; truncating to X.shape[0] - 1
warn(
Traceback (most recent call last):
File "/home/sidd/miniconda3/envs/autometa_aims/bin/autometa-binning-ldm", line 33, in <module>
sys.exit(load_entry_point('Autometa==2.1.0', 'console_scripts', 'autometa-binning-ldm')())
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/Autometa-2.1.0-py3.9.egg/autometa/binning/large_data_mode.py", line 831, in main
main_out = cluster_by_taxon_partitioning(
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/Autometa-2.1.0-py3.9.egg/autometa/binning/large_data_mode.py", line 441, in cluster_by_taxon_partitioning
rank_embedding = get_kmer_embedding(
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/Autometa-2.1.0-py3.9.egg/autometa/binning/large_data_mode.py", line 112, in get_kmer_embedding
embedding.to_csv(cache_fpath, sep="\t", index=True, header=True)
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/core/generic.py", line 3551, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1180, in to_csv
csv_formatter.save()
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 241, in save
with get_handle(
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/common.py", line 694, in get_handle
check_parent_directory(str(handle))
File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/common.py", line 568, in check_parent_directory
raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: '/media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonkit/78mbp_metagenome_Autometa_Output3/78mbp_metagenome_bacteria_cache/species'
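Judging from the last frames of the traceback, get_kmer_embedding hands pandas' to_csv a cache path whose parent directory was never created, so pandas raises the OSError. A quick way to check whether that is the only problem (a workaround sketch, not a fix; the path is copied verbatim from the error above, and the real fix would be to create the cache directory inside large_data_mode.py before writing):
# Pre-create the missing per-rank cache directory, then re-run the
# large-data-mode workflow and see whether it proceeds past this step.
mkdir -p /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonkit/78mbp_metagenome_Autometa_Output3/78mbp_metagenome_bacteria_cache/species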
- `autometa-setup-gtdb` entrypoint for creating compatible GTDB databases
- `gtdb_to_taxdump` is now installed
- `TaxonomyDatabase` class
Commands to replicate:
Test GTDB
- Running `autometa-setup-gtdb` entrypoint
- Running `autometa-taxonomy-lca` entrypoint
- Running `autometa-taxonomy-majority-vote` entrypoint
- Running `autometa-taxonomy` entrypoint
- Running `autometa-summary` entrypoint
Test NCBI
- Running `autometa-taxonomy-lca` entrypoint
- Running `autometa-taxonomy-majority-vote` entrypoint
- Running `autometa-taxonomy` entrypoint
- Running `autometa-summary` entrypoint
TODO:
- `gtdb.py`
PR checklist