KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io
Other
40 stars 15 forks source link

⬆️ 🎨 Allow the use of gtdb taxonomy in Autometa #284

Closed Sidduppal closed 1 year ago

Sidduppal commented 2 years ago

Commands to replicate:

Test GTDB

Running autometa-setup-gtdb entrypoint

autometa-setup-gtdb --taxa-files /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonomy_gtdb_releases/R207/bac120_taxonomy_r207.tsv /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonomy_gtdb_releases/R207/ar53_taxonomy_r207.tsv --faa-tarball /media/bigdrive1/Databases/gtdb_proteins_aa_reps_r207.tar.gz --dbdir $HOME/gtdbTest --keep-temp

Running autometa-taxonomy-lca entrypoint

autometa-taxonomy-lca --blast /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/78Mbp/78Mbp_gtdb_blastP.tsv --dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/gtdb_to_diamond --dbtype gtdb --sseqid2taxid-output 78Mbp_sseqid2taxid_test.tsv --lca-error-taxids 78Mbp_lcaErrorTaxids_test.tsv --verbose --lca-output 78Mbp_LCAout_test.tsv

Running autometa-taxonomy-majority-vote entrypoint

autometa-taxonomy-majority-vote --lca 78Mbp_LCAout_test.tsv --output 78Mbp_gtdb_majority_vote.tsv --dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/gtdb_to_diamond --verbose --dbtype gtdb

Running autometa-taxonomy entrypoint

autometa-taxonomy --votes 78Mbp_gtdb_majority_vote.tsv --assembly /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna --output testTaxonomy --split-rank-and-write superkingdom --dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/gtdb_to_diamond --dbtype gtdb

Running autometa-summary entrypoint

autometa-binning-summary --binning-main /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.main.tsv --markers /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.markers.tsv --dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/gtdb_to_diamond --dbtype gtdb --output-stats binningSummartStats.tsv --output-taxonomy binningTaxa.tsv --output-metabins metaBins --metagenome /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna

Test NCBI

Running autometa-taxonomy-lca entrypoint

autometa-taxonomy-lca --blast /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.blastp.tsv --dbdir /media/bigdrive1/Databases/autometa_databases/ --lca-output lcaOut_ncbi.tsv --sseqid2taxid-output sseqid2taxid_ncbiTest.tsv --lca-error-taxids lcaError_ncbi.tsv --verbose

Running autometa-taxonomy-majority-vote entrypoint

autometa-taxonomy-majority-vote --lca lcaOut_ncbi.tsv --output ncbi_majority_vote.tsv --dbdir /media/bigdrive1/Databases/autometa_databases/ --verbose --dbtype ncbi

Running autometa-taxonomy entrypoint

autometa-taxonomy --votes ncbi_majority_vote.tsv --assembly /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna --output testNCBI --split-rank-and-write superkingdom --dbdir /media/bigdrive1/Databases/autometa_databases/

Running autometa-summary entrypoint

autometa-binning-summary --binning-main /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.main.tsv --markers /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.markers.tsv --metagenome /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna --dbdir  /media/bigdrive1/Databases/autometa_databases/ --output-stats binningSummartStats2.tsv --output-taxonomy binningTaxa2.tsv --output-metabins metaBins2

TODO:

PR checklist

github-actions[bot] commented 2 years ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 7f86c81

+| ✅  62 tests passed       |+
#| ❔  34 tests were ignored |#
!| ❗   9 tests had warnings |!
### :heavy_exclamation_mark: Test warnings: * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README did not have a Nextflow minimum version badge. * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README did not have a Nextflow minimum version mentioned in Quick Start section. * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema `$id` should be `https://raw.githubusercontent.com/autometa/master/nextflow_schema.json` Found `https://raw.githubusercontent.com/autometa/main/nextflow_schema.json` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `plaintext_email` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `custom_config_version` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `custom_config_base` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `hostnames` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `show_hidden_params` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `singularity_pull_docker_container` ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `assets/nf-core-autometa_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/usage.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/output.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-autometa_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-autometa_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-autometa_logo_dark.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/feature_request.md` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - nextflow_config * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/CONTRIBUTING.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/PULL_REQUEST_TEMPLATE.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/workflows/branch.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/workflows/linting_comment.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/workflows/linting.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/email_template.html` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/email_template.txt` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `assets/nf-core-autometa_logo_light.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-autometa_logo_light.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-autometa_logo_dark.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `docs/README.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `lib/NfcoreTemplate.groovy` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.gitignore` or `foo` * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - '.github/workflows/ci.yml' not found * [actions_awstest](https://nf-co.re/tools-docs/lint_tests/actions_awstest.html) - 'awstest.yml' workflow not found: `/home/runner/work/Autometa/Autometa/.github/workflows/awstest.yml` * [template_strings](https://nf-co.re/tools-docs/lint_tests/template_strings.html) - template_strings ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreSchema.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/Utils.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `main.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/multiqc_config.yaml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/igenomes.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowAutometa.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.travis.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitattributes` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.markdownlint.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `CODE_OF_CONDUCT.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/config.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/nfcore_external_java_deps.jar` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreSchema.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/multiqc_config.yaml` matches the template * [pipeline_name_conventions](https://nf-co.re/tools-docs/lint_tests/pipeline_name_conventions.html) - Name adheres to nf-core convention * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema lint passed * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema matched params returned from nextflow config * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: docker_autometa.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: pytest_codecov.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: docker_get_genomes_for_mock.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: docker_mock_data_reporter.yml * [merge_markers](https://nf-co.re/tools-docs/lint_tests/merge_markers.html) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools-docs/lint_tests/modules_json.html) - Only installed modules found in `modules.json` ### Run details * nf-core/tools version 2.2 * Run at `2022-11-12 16:49:23`
Sidduppal commented 2 years ago

One thing I noticed while testing the scripts: The NCBI class is instantiated with the variable dbpath while the GTDB class is instantiated with dbdir. Nothing is breaking, but something to keep in mind as we go ahead and maybe make them both consistent.

codecov[bot] commented 2 years ago

Codecov Report

Base: 27.40% // Head: 27.35% // Decreases project coverage by -0.05% :warning:

Coverage data is based on head (54c168c) compared to base (86b9550). Patch coverage: 30.30% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #284 +/- ## ========================================== - Coverage 27.40% 27.35% -0.06% ========================================== Files 48 50 +2 Lines 5469 5733 +264 ========================================== + Hits 1499 1568 +69 - Misses 3970 4165 +195 ``` | Flag | Coverage Δ | | |---|---|---| | unittests | `27.35% <30.30%> (-0.06%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/KwanLab/Autometa/pull/284?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab) | Coverage Δ | | |---|---|---| | [autometa/binning/large\_data\_mode.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvYmlubmluZy9sYXJnZV9kYXRhX21vZGUucHk=) | `0.00% <0.00%> (ø)` | | | [autometa/config/databases.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvY29uZmlnL2RhdGFiYXNlcy5weQ==) | `0.00% <0.00%> (ø)` | | | [autometa/config/utilities.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvY29uZmlnL3V0aWxpdGllcy5weQ==) | `25.00% <ø> (ø)` | | | [autometa/taxonomy/majority\_vote.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvdGF4b25vbXkvbWFqb3JpdHlfdm90ZS5weQ==) | `13.53% <11.53%> (+0.83%)` | :arrow_up: | | [autometa/binning/summary.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvYmlubmluZy9zdW1tYXJ5LnB5) | `41.48% <18.18%> (-0.38%)` | :arrow_down: | | [autometa/taxonomy/gtdb.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvdGF4b25vbXkvZ3RkYi5weQ==) | `21.01% <21.01%> (ø)` | | | [autometa/taxonomy/lca.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvdGF4b25vbXkvbGNhLnB5) | `9.37% <22.22%> (+0.57%)` | :arrow_up: | | [autometa/common/utilities.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvY29tbW9uL3V0aWxpdGllcy5weQ==) | `23.35% <33.33%> (+0.65%)` | :arrow_up: | | [autometa/taxonomy/ncbi.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvdGF4b25vbXkvbmNiaS5weQ==) | `49.51% <35.00%> (-6.92%)` | :arrow_down: | | [autometa/taxonomy/database.py](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab#diff-YXV0b21ldGEvdGF4b25vbXkvZGF0YWJhc2UucHk=) | `51.85% <51.85%> (ø)` | | | ... and [9 more](https://codecov.io/gh/KwanLab/Autometa/pull/284/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab) | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=KwanLab)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

evanroyrees commented 1 year ago

Looking at submitting a bioconda recipe for gtdb_to_taxdump and in the README it suggests using TaxonKit for retrieving stable taxids across GTDB releases (rather than arbitrarily assigning them as noted). Below is a screenshot from the gtdb_to_taxdump summary section:

Screen Shot 2022-08-09 at 09 34 30

Next Steps Towards Stability

TaxonKit is available via bioconda so would be easy to include in autometa-env.yml and documentation for generating these taxdump files from the GTDB looks straight-forward as outlined in the GTDB-taxdump repo

TaxonKit conda page: https://anaconda.org/bioconda/taxonkit GTDB-taxdump page (using TaxonKit): https://github.com/shenwei356/gtdb-taxdump

NOTE: TaxonKit v0.12 or greater should be used. (ref)

The taxdump files may also be downloaded directly from the releases page: https://github.com/shenwei356/gtdb-taxdump/releases which reduces the compute requirements.

That being said, if we would like to generate the GTDB taxdump files using this, the steps are outlined here: https://github.com/shenwei356/gtdb-taxdump#steps

Looking in to the future, this is probably the more appropriate route as taxids may be relied up across future GTDB releases.

P.S there is also a related project linked (https://github.com/shenwei356/ictv-taxdump) that generates an NCBI taxdump for viruses and may be useful in the future (@kaw97, is this of any interest to you?)

chasemc commented 1 year ago

Not sure if this affects what y'all are doing: https://github.com/shenwei356/gtdb-taxdump/issues/2

shenwei356 commented 1 year ago

Not sure if this affects what y'all are doing: shenwei356/gtdb-taxdump#2

Only merged.dmp and delnodes.dmp are affected. If you just need to use the taxonomy data of the current version, say R207, don't worry.

chasemc commented 1 year ago

Thanks @shenwei356

Sidduppal commented 1 year ago

@WiscEvan The scripts have been updated to use taxonkit's already generated files. I decided to use merged.dmp and delnodes.dmp as they were provided with the taxdump. Although if you think the issue mentioned above by @chasemc would skew the results a LOT we can remove their usage. I'm mentioning the commands for testing:

Setting up the database

autometa-config     \
--section databases     \
--option gtdb     \
--value /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test
autometa-update-databases --update-gtdb

Run LCA

autometa-taxonomy-lca \
--blast 78mbp_metagenome.blastp.gtdb.tsv --dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--dbtype gtdb \
--sseqid2taxid-output 78Mbp_sseqid2taxid_test.tsv \
--lca-error-taxids 78Mbp_lcaErrorTaxids_test.tsv \
--verbose \
--lca-output 78Mbp_LCAout_test.tsv

Run Majority vote

autometa-taxonomy-majority-vote \
--lca 78Mbp_LCAout_test.tsv \
--output 78Mbp_gtdb_majority_vote.tsv \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \ 
--verbose \
--dbtype gtdb

Run taxonomy

autometa-taxonomy \
--votes 78Mbp_gtdb_majority_vote.tsv \
--assembly /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna \
--output testTaxonomy \
--split-rank-and-write superkingdom \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--dbtype gtdb

Run summary

autometa-binning-summary \
--binning-main 78_binningMain_gtdb.tsv \
--markers /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.markers.tsv \
--dbdir /media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonKit/gtdb-taxdump/R207/test \
--dbtype gtdb \
--output-stats binningSummartStats.tsv --output-taxonomy binningTaxa.tsv \
--output-metabins metaBins \
--metagenome /media/bigdrive1/sidd/nextflow_trial/autometa_runs/78mbp_manual/interim/78mbp_metagenome.filtered.fna

By using this we don't need to add any additional dependency as well. Let me know what you think.

evanroyrees commented 1 year ago

The tests and CI/CD still need to be resolved, but I think we're almost there.

Sidduppal commented 1 year ago

Addressed all the comments.

🐛 🛠️ I'm not sure how you encountered the error below. Sam and I implemented a set -x / { set +x; } 2>/dev/null routine before and after running each module (https://github.com/KwanLab/Autometa/blob/gtdb_to_autometa/workflows/autometa_flagged.sh) that should allow easier inspection of a user's parameter configurations without them sending the entire submit file.

Still getting the following error when running autometa-large-data-mode-gtdb.sh. I'm getting the error during the large-data-mode binning (lines).

The set -x / { set +x; } 2>/dev/null routine has only been set for autometa_flagged.sh and no other script. I'm running the large-data-mode workflow which does not have the above stated routine. The bug seems to be with the large-data-mode implementation as all the other workflows are running fine.

[10/16/2022 04:41:27 PM DEBUG] autometa.common.kmers: umap: 10 data points and 10 dimensions
[10/16/2022 04:41:27 PM DEBUG] autometa.common.kmers: Performing embedding with umap (seed 42)
/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/umap/umap_.py:2344: UserWarning: n_neighbors is larger than the dataset size; truncating to X.shape[0] - 1
  warn(
Traceback (most recent call last):
  File "/home/sidd/miniconda3/envs/autometa_aims/bin/autometa-binning-ldm", line 33, in <module>
    sys.exit(load_entry_point('Autometa==2.1.0', 'console_scripts', 'autometa-binning-ldm')())
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/Autometa-2.1.0-py3.9.egg/autometa/binning/large_data_mode.py", line 831, in main
    main_out = cluster_by_taxon_partitioning(
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/Autometa-2.1.0-py3.9.egg/autometa/binning/large_data_mode.py", line 441, in cluster_by_taxon_partitioning
    rank_embedding = get_kmer_embedding(
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/Autometa-2.1.0-py3.9.egg/autometa/binning/large_data_mode.py", line 112, in get_kmer_embedding
    embedding.to_csv(cache_fpath, sep="\t", index=True, header=True)
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/core/generic.py", line 3551, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1180, in to_csv
    csv_formatter.save()
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 241, in save
    with get_handle(
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/common.py", line 694, in get_handle
    check_parent_directory(str(handle))
  File "/home/sidd/miniconda3/envs/autometa_aims/lib/python3.9/site-packages/pandas/io/common.py", line 568, in check_parent_directory
    raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: '/media/bigdrive1/sidd/autometa_aim1_1/data/external/gtdbData_r207v2/test1/taxonkit/78mbp_metagenome_Autometa_Output3/78mbp_metagenome_bacteria_cache/species'