KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io
Other
40 stars 15 forks source link

:art::bug::snake: Convert 'unclassified' taxids of 0 to root taxid of 1 #254

Closed evanroyrees closed 2 years ago

evanroyrees commented 2 years ago

:art::snake: Some taxon-profilers assign unclassified contigs to a taxid of 0 where others assign root taxid of 1.

The NCBI class method of converting the taxid datatype expects a positive integer, so we are converting the unclassified taxid of 0 to the root taxid of 1. All input predictions will be converted in this same way. The contigs with their converted taxids as well as the number of 'unclassified' taxids are now emitted in the logs during conversion.

github-actions[bot] commented 2 years ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 7d942e3

+| ✅  62 tests passed       |+
#| ❔  34 tests were ignored |#
!| ❗   9 tests had warnings |!
### :heavy_exclamation_mark: Test warnings: * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README did not have a Nextflow minimum version badge. * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README did not have a Nextflow minimum version mentioned in Quick Start section. * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema `$id` should be `https://raw.githubusercontent.com/autometa/master/nextflow_schema.json` Found `https://raw.githubusercontent.com/autometa/main/nextflow_schema.json` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `plaintext_email` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `custom_config_version` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `custom_config_base` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `hostnames` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `show_hidden_params` * [schema_description](https://nf-co.re/tools-docs/lint_tests/schema_description.html) - No description provided in schema for parameter: `singularity_pull_docker_container` ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `assets/nf-core-autometa_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/usage.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/output.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-autometa_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-autometa_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-autometa_logo_dark.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/feature_request.md` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - nextflow_config * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/CONTRIBUTING.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/PULL_REQUEST_TEMPLATE.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/workflows/branch.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/workflows/linting_comment.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/workflows/linting.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/email_template.html` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `assets/email_template.txt` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `assets/nf-core-autometa_logo_light.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-autometa_logo_light.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-autometa_logo_dark.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `docs/README.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `lib/NfcoreTemplate.groovy` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.gitignore` or `foo` * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - '.github/workflows/ci.yml' not found * [actions_awstest](https://nf-co.re/tools-docs/lint_tests/actions_awstest.html) - 'awstest.yml' workflow not found: `/home/runner/work/Autometa/Autometa/.github/workflows/awstest.yml` * [template_strings](https://nf-co.re/tools-docs/lint_tests/template_strings.html) - template_strings ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreSchema.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/Utils.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `main.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/multiqc_config.yaml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/igenomes.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowAutometa.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.travis.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitattributes` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.markdownlint.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `CODE_OF_CONDUCT.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/config.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/nfcore_external_java_deps.jar` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreSchema.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/multiqc_config.yaml` matches the template * [pipeline_name_conventions](https://nf-co.re/tools-docs/lint_tests/pipeline_name_conventions.html) - Name adheres to nf-core convention * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema lint passed * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema matched params returned from nextflow config * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: pytest_codecov.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: docker_mock_data_reporter.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: docker_autometa.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: docker_get_genomes_for_mock.yml * [merge_markers](https://nf-co.re/tools-docs/lint_tests/merge_markers.html) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools-docs/lint_tests/modules_json.html) - Only installed modules found in `modules.json` ### Run details * nf-core/tools version 2.2 * Run at `2022-04-05 15:53:30`
codecov[bot] commented 2 years ago

Codecov Report

Merging #254 (7d942e3) into dev (025b099) will decrease coverage by 0.01%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##              dev     #254      +/-   ##
==========================================
- Coverage   27.30%   27.28%   -0.02%     
==========================================
  Files          45       45              
  Lines        5336     5339       +3     
==========================================
  Hits         1457     1457              
- Misses       3879     3882       +3     
Flag Coverage Δ
unittests 27.28% <0.00%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
autometa/validation/benchmark.py 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5442af2...7d942e3. Read the comment docs.

chasemc commented 2 years ago

May be cleaner: 'pred_df.taxid = pred_df.taxid.clip(lower=1)'

could also be done inplace

evanroyrees commented 2 years ago

May be cleaner: pred_df.taxid = pred_df.taxid.clip(lower=1)

could also be done inplace

Nice, I was unaware of this method.

I'm not sure if this is preferred or more understandable 🤷

Or if inplace modification vs. reassignment is preferred.

pred_df.taxid.clip(lower=1, inplace=True) feels nice and clean though 😄

Sidduppal commented 2 years ago

Both ways look good to me. You can chose and I can approve it.

evanroyrees commented 2 years ago

The only concern I have is whether this will work if any of the taxids are strings rather than a float or int?

chasemc commented 2 years ago

The only concern I have is whether this will work if any of the taxids are strings rather than a float or int?

it would error

evanroyrees commented 2 years ago

Ah, yeah.

df.taxid.clip(lower=1)
TypeError: '>=' not supported between instances of 'str' and 'int'

The current mapping method will leave any 'str' types to be converted with ncbi.convert_taxid_dtype(...)

evanroyrees commented 2 years ago

@Sidduppal, then I think this is ready for review. Albeit maybe not as clean, it leaves the exceptions for ncbi.convert_taxid_dtype