galaxyproject / iwc

Galaxy Workflows maintained by the Intergalactic Workflow Commission
https://dockstore.org/organizations/iwc

VGP9 - Bug fixed and changing tests #506

Closed Delphine-L closed 4 weeks ago

Delphine-L commented 4 weeks ago

The bug was a missing regex, which caused the file listing the contaminants to be erased, so the contaminants were never removed.

I fixed the problem and updated the tests to check for it.

I also added a tool report listing the contaminant and mitochondrial scaffolds.
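The failure mode described above is easy to miss because it is silent: once the contaminant list is empty, the removal step still runs but keeps every scaffold. A minimal Python sketch of that behaviour, and of a guard a test could use, assuming the assembly is held as (name, sequence) pairs and the list as a set of scaffold names. `remove_scaffolds` and `check_contaminant_list` are hypothetical helpers, not the workflow's tools (in the workflow, Step 16 does the real removal from an exclusion BED file).

```python
from typing import Iterable


def check_contaminant_list(names: set[str]) -> None:
    """Hypothetical guard: fail loudly instead of silently removing nothing."""
    if not names:
        raise ValueError("contaminant list is empty; no scaffolds would be removed")


def remove_scaffolds(
    records: Iterable[tuple[str, str]], exclude: set[str]
) -> list[tuple[str, str]]:
    """Keep only the (name, sequence) records whose name is not in `exclude`."""
    return [(name, seq) for name, seq in records if name not in exclude]


assembly = [("scaffold_1", "ACGT"), ("scaffold_2", "GGCC"), ("scaffold_3", "TTAA")]

# With an erased (empty) list the filter is a silent no-op: every scaffold survives.
assert remove_scaffolds(assembly, set()) == assembly

contaminants = {"scaffold_2"}
check_contaminant_list(contaminants)
cleaned = remove_scaffolds(assembly, contaminants)
print([name for name, _ in cleaned])  # ['scaffold_1', 'scaffold_3']
```

A test that both checks the list is non-empty and checks that the listed names are absent from the output would catch this class of bug.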

github-actions[bot] commented 4 weeks ago

Test Results (powered by Planemo)

Test Summary

| Test State | Count |
| ---------- | ----- |
| Total      | 1     |
| Passed     | 0     |
| Error      | 1     |
| Failure    | 0     |
| Skipped    | 0     |
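The single error counted in the summary comes from Step 6: `kraken2` exited with code 71 after failing to allocate 53753200640 bytes for the `k2_pluspf_20210517` database, and Steps 8 and 9 were left paused, which matches the invocation's "at least one job is in [paused] state" message. A quick Python conversion of that allocation size; this is plain arithmetic and says nothing about how much RAM the test runner actually had.

```python
# Allocation size reported in Step 6's standard error.
requested_bytes = 53_753_200_640

gib = requested_bytes / 2**30  # binary gibibytes
gb = requested_bytes / 10**9   # decimal gigabytes

print(f"{gib:.2f} GiB ({gb:.2f} GB)")  # 50.06 GiB (53.75 GB)
```

So the database alone needs roughly 50 GiB of free memory before classification can start.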
Errored Tests

❌ Assembly-decontamination-VGP9.ga_0

**Execution Problem:**

* ```
  Failed to run workflow, at least one job is in [paused] state.
  ```

#### Workflow invocation details

* <details><summary>Invocation Messages</summary>

  </details>

* <details><summary>Steps</summary>

 - **Step 1: Scaffolded assembly (fasta)**:

    * step_state: scheduled

 - **Step 2: Database for Kraken2**:

    * step_state: scheduled

 - **Step 11: Filter1**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "tabular" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | cond | ` "c1!='U'" ` |
             | dbkey | ` "?" ` |
             | header\_lines | ` "0" ` |

      </details>

 - **Step 12: remove > + lowercase**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | adv\_opts | ` {"__current_case__": 0, "adv_opts_selector": "basic"} ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | code | ` "s/>//g\ns/[A-Z]/\\L&/g" ` |
             | dbkey | ` "?" ` |

      </details>

 - **Step 13: to lowercase**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | adv\_opts | ` {"__current_case__": 0, "adv_opts_selector": "basic"} ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | code | ` "s/[A-Z]/\\L&/g" ` |
             | dbkey | ` "?" ` |

      </details>

 - **Step 14: isolate scaffolds names**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | dbkey | ` "?" ` |
             | find\_and\_replace | ` [{"__index__": 0, "caseinsensitive": false, "find_pattern": " kraken:taxid\\|[0-9]+", "global": true, "is_regex": true, "replace_pattern": null, "searchwhere": {"__current_case__": 0, "searchwhere_select": "line"}, "skip_first_line": false, "wholewords": true}] ` |

      </details>

 - **Step 15: concatenate scaffold lists**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | dbkey | ` "?" ` |
             | queries | ` [] ` |

      </details>

 - **Step 16: removing scaffolds**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | dbkey | ` "?" ` |
             | mode\_condition | ` {"__current_case__": 0, "discover_paths": false, "homopolymer_compress": null, "output_condition": {"__current_case__": 1, "line_length": null, "out_format": "fasta.gz"}, "selector": "manipulation", "sort": "", "swiss_army_knife": null} ` |
             | target\_condition | ` {"__current_case__": 1, "exclude_bed": {"values": [{"id": 17, "src": "hda"}]}, "include_bed": null, "target_option": "true", "target_sequence": ""} ` |

      </details>

 - **Step 3: lower to uppercase**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is ok

        **Command Line:**

         * ```console
           sed --sandbox -r -f '/tmp/tmpaszsmvng/job_working_directory/000/2/configs/tmpjon4h1xx' '/tmp/tmpaszsmvng/files/4/e/7/dataset_4e73e93e-be8b-4535-a2b0-f3682b24f87e.dat' > '/tmp/tmpaszsmvng/job_working_directory/000/2/outputs/dataset_1ab622f2-ba1c-4959-928a-04ca3ce0b182.dat'
           ```
        **Exit Code:**

         * ```console
           0
           ```
        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | adv\_opts | ` {"__current_case__": 0, "adv_opts_selector": "basic"} ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | code | ` "s/.*/\\U&/" ` |
             | dbkey | ` "?" ` |

      </details>

 - **Step 4: soft-masking**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is ok

        **Command Line:**

         * ```console
           dustmasker -in '/tmp/tmpaszsmvng/files/1/a/b/dataset_1ab622f2-ba1c-4959-928a-04ca3ce0b182.dat' -infmt fasta -out '/tmp/tmpaszsmvng/job_working_directory/000/3/outputs/dataset_eb6cb399-9064-492f-a0a8-12191b1d7697.dat' -window 64 -level 40 -linker 1 -outfmt fasta
           ```
        **Exit Code:**

         * ```console
           0
           ```
        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "fasta" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | db\_opts | ` {"__current_case__": 2, "database": "", "db_opts_selector": "file", "histdb": "", "subject": {"values": [{"id": 2, "src": "hda"}]}} ` |
             | dbkey | ` "?" ` |
             | level | ` "40" ` |
             | linker | ` "1" ` |
             | outformat | ` "fasta" ` |
             | window | ` "64" ` |

      </details>

 - **Step 5: hard-masking**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is ok

        **Command Line:**

         * ```console
           sed --sandbox -r -f '/tmp/tmpaszsmvng/job_working_directory/000/4/configs/tmpenino8g0' '/tmp/tmpaszsmvng/files/e/b/6/dataset_eb6cb399-9064-492f-a0a8-12191b1d7697.dat' > '/tmp/tmpaszsmvng/job_working_directory/000/4/outputs/dataset_bd99e1b6-9ab2-4c39-b551-668487d5558e.dat'
           ```
        **Exit Code:**

         * ```console
           0
           ```
        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | adv\_opts | ` {"__current_case__": 0, "adv_opts_selector": "basic"} ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | code | ` "s/[a-z]/N/g" ` |
             | dbkey | ` "?" ` |

      </details>

 - **Step 6: ID non-target contaminants**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is error

        **Command Line:**

         * ```console
           kraken2 --threads ${GALAXY_SLOTS:-1} --db '/cvmfs/data.galaxyproject.org/managed/kraken2_databases/k2_pluspf_20210517'    '/tmp/tmpaszsmvng/files/b/d/9/dataset_bd99e1b6-9ab2-4c39-b551-668487d5558e.dat'  --classified-out '/tmp/tmpaszsmvng/job_working_directory/000/5/outputs/dataset_a312c999-2116-4d0d-9f16-aaa6b95e7095.dat' --unclassified-out '/tmp/tmpaszsmvng/job_working_directory/000/5/outputs/dataset_f7351d71-cae7-48b1-9974-9b1a49e6ce57.dat'  --confidence '0.3' --minimum-base-quality '0' --minimum-hit-groups '2'  --use-names   > '/tmp/tmpaszsmvng/job_working_directory/000/5/outputs/dataset_c677dc61-37b9-4f0b-9d9b-d1f16fcdb68f.dat'
           ```
        **Exit Code:**

         * ```console
           71
           ```
        **Standard Error:**

         * ```console
           Loading database information...Failed attempt to allocate 53753200640bytes;
           you may not have enough free memory to load this database.
           If your computer has enough RAM, perhaps reducing memory usage from
           other programs could help you load this database?
           classify: unable to allocate hash table memory

           ```
        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "fasta" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | confidence | ` "0.3" ` |
             | dbkey | ` "?" ` |
             | kraken2\_database | ` "k2_pluspf_20210517" ` |
             | min\_base\_quality | ` "0" ` |
             | minimum\_hit\_groups | ` "2" ` |
             | quick | ` false ` |
             | report | ` {"create_report": false, "report_minimizer_data": false, "report_zero_counts": false, "use_mpa_style": false} ` |
             | single\_paired | ` {"__current_case__": 2, "input_sequences": {"values": [{"id": 4, "src": "hda"}]}, "single_paired_selector": "no"} ` |
             | split\_reads | ` true ` |
             | use\_names | ` true ` |

      </details>

 - **Step 7: blast mitochondria DB**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is running

        **Command Line:**

         * ```console
           blastn  -query '/tmp/tmpaszsmvng/files/b/d/9/dataset_bd99e1b6-9ab2-4c39-b551-668487d5558e.dat'   -db '"/cvmfs/data.galaxyproject.org/byhand/refseq/mitochondrion/genomic/2022-03-10/mitochondrion"'  -task 'blastn' -evalue '0.001' -out '/tmp/tmpaszsmvng/job_working_directory/000/6/outputs/dataset_612f8bf2-ce2d-4143-a820-e1c99cd39c77.dat' -outfmt '6 qseqid sseqid length qstart qend evalue qlen qcovs qcovhsp'  -num_threads "${GALAXY_SLOTS:-8}"
           ```
        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "fasta" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | adv\_opts | ` {"__current_case__": 0, "adv_opts_selector": "basic"} ` |
             | blast\_type | ` "blastn" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | db\_opts | ` {"__current_case__": 0, "database": ["refseq_mitochondrion"], "db_opts_selector": "db", "histdb": "", "subject": ""} ` |
             | dbkey | ` "?" ` |
             | evalue\_cutoff | ` "0.001" ` |
             | output | ` {"__current_case__": 2, "ext_cols": ["qlen"], "ids_cols": null, "misc_cols": ["qcovs", "qcovhsp"], "out_format": "cols", "std_cols": ["qseqid", "sseqid", "length", "qstart", "qend", "evalue"], "tax_cols": null} ` |

      </details>

 - **Step 8: Cut1**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is paused

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "tabular" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | columnList | ` "c1,c2,c3" ` |
             | dbkey | ` "?" ` |
             | delimiter | ` "T" ` |

      </details>

 - **Step 9: contaminant scaffolds**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is paused

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | case\_sensitive | ` "-i" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | color | ` "NOCOLOR" ` |
             | dbkey | ` "?" ` |
             | invert | ` "" ` |
             | lines\_after | ` "0" ` |
             | lines\_before | ` "0" ` |
             | regex\_type | ` "-G" ` |
             | url\_paste | ` "scaffold" ` |

      </details>

 - **Step 10: parsing blast output**:

    * step_state: scheduled

    * <details><summary>Jobs</summary>

      - **Job 1:**

        * Job state is new

        **Traceback:**

         * ```console

           ```
        **Job Parameters:**

         *   | Job parameter | Parameter value |
             | ------------- | --------------- |
             | \_\_input\_ext | ` "input" ` |
             | \_\_workflow\_invocation\_uuid\_\_ | ` "ec142db05b0c11ef87a389aea72632df" ` |
             | chromInfo | ` "/tmp/tmpaszsmvng/galaxy-dev/tool-data/shared/ucsc/chrom/?.len" ` |
             | dbkey | ` "?" ` |

      </details>
  </details>
* <details><summary>Other invocation details</summary>

  - **error_message**
    * Failed to run workflow, at least one job is in [paused] state.
  - **history_id**
    * e54bd8701e8a9176
  - **history_state**
    * paused
  - **invocation_id**
    * e54bd8701e8a9176
  - **invocation_state**
    * scheduled
  - **workflow_id**
    * e54bd8701e8a9176

  </details>
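Steps 3 to 5 form a soft-mask-then-hard-mask pass. Step 3 uppercases every line with the sed script `s/.*/\U&/`. Step 4 runs `dustmasker` with `-outfmt fasta`, which writes masked low-complexity stretches in lowercase. Step 5 then turns every lowercase letter into `N` with `s/[a-z]/N/g`. The Python sketch below mirrors only the two sed scripts; `dustmasker` is not reproduced, so its output is stood in for by a hand-written soft-masked line, and the function names are hypothetical.

```python
import re


def uppercase_lines(lines: list[str]) -> list[str]:
    """Mirror Step 3's sed script: uppercase every line, headers included."""
    return [line.upper() for line in lines]


def hard_mask(lines: list[str]) -> list[str]:
    """Mirror Step 5's sed script: replace every lowercase ASCII letter with N."""
    return [re.sub(r"[a-z]", "N", line) for line in lines]


raw = [">scaffold_1", "acgtACGTacgt"]
print(uppercase_lines(raw))  # ['>SCAFFOLD_1', 'ACGTACGTACGT']

# Hypothetical stand-in for dustmasker's soft-masked FASTA (Step 4): the middle
# run is pretended to be low-complexity. Real masking depends on -window/-level/-linker.
soft_masked = [">SCAFFOLD_1", "ACGTacgtACGT"]
print(hard_mask(soft_masked))  # ['>SCAFFOLD_1', 'ACGTNNNNACGT']
```

Like the sed scripts, both functions touch header lines as well as sequence lines.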
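Step 11 (`Filter1`, condition `c1!='U'`) and Step 8 (`Cut1`, columns `c1,c2,c3`) are plain column operations on tabular data. The report does not show which dataset feeds each step; assuming it is the Kraken2 per-sequence output from Step 6 (tab-separated: `C`/`U` classification flag, sequence ID, taxon, length, k-mer LCA list), a Python sketch of the two operations could look like this. Galaxy's `c1` is index 0 in Python, and the sample rows are made up for illustration.

```python
# Toy Kraken2-style per-sequence output (made-up rows, --use-names style taxa).
KRAKEN_OUTPUT = (
    "C\tscaffold_7\tEscherichia coli (taxid 562)\t15000\t562:120 0:30\n"
    "U\tscaffold_1\tunclassified (taxid 0)\t900000\t0:899965\n"
)


def read_tabular(text: str) -> list[list[str]]:
    """Split tab-separated text into rows of fields, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line]


def filter_classified(rows: list[list[str]]) -> list[list[str]]:
    """Step 11 equivalent: keep rows whose first column is not 'U'."""
    return [row for row in rows if row[0] != "U"]


def cut_columns(rows: list[list[str]], columns: tuple[int, ...] = (1, 2, 3)) -> list[list[str]]:
    """Step 8 equivalent: keep the listed 1-based columns (Galaxy's c1, c2, c3)."""
    return [[row[c - 1] for c in columns] for row in rows]


rows = read_tabular(KRAKEN_OUTPUT)
print(cut_columns(filter_classified(rows)))
# [['C', 'scaffold_7', 'Escherichia coli (taxid 562)']]
```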
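Steps 14 and 12 tidy scaffold names. Step 14 deletes the regex ` kraken:taxid\|[0-9]+` (empty replacement, regex mode on, applied globally), and Step 12 applies the sed script `s/>//g` followed by `s/[A-Z]/\L&/g`, dropping `>` and lowercasing capitals. A Python sketch of both, assuming Kraken2's classified-sequence headers carry a suffix such as ` kraken:taxid|562`. The order in which the workflow chains these steps is not shown in this report, so the composition below is illustrative, and Step 14's `wholewords` option is not modelled.

```python
import re

# Step 14's pattern: a space, the literal text "kraken:taxid|", then the taxid digits.
TAXID_SUFFIX = re.compile(r" kraken:taxid\|[0-9]+")


def strip_taxid(line: str) -> str:
    """Delete every taxid annotation (empty replacement, applied globally)."""
    return TAXID_SUFFIX.sub("", line)


def remove_gt_and_lowercase(line: str) -> str:
    """Mirror Step 12: drop every '>' then lowercase ASCII capitals A-Z."""
    no_gt = line.replace(">", "")
    return re.sub(r"[A-Z]", lambda m: m.group(0).lower(), no_gt)


header = ">SCAFFOLD_12 kraken:taxid|562"
print(remove_gt_and_lowercase(strip_taxid(header)))  # scaffold_12
```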
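Step 7 asks `blastn` for tabular output with the columns `qseqid sseqid length qstart qend evalue qlen qcovs qcovhsp`. The report does not show how Step 10 parses that file, so this Python sketch only illustrates reading those nine columns into typed records and selecting query scaffolds by `qcovs` (query coverage per subject). The `min_qcovs` threshold and the sample rows are hypothetical, not the workflow's criterion.

```python
from dataclasses import dataclass

# Column order of the Step 7 blastn call:
# -outfmt '6 qseqid sseqid length qstart qend evalue qlen qcovs qcovhsp'
N_COLUMNS = 9


@dataclass(frozen=True)
class Hit:
    qseqid: str
    sseqid: str
    length: int
    qstart: int
    qend: int
    evalue: float
    qlen: int
    qcovs: float
    qcovhsp: float


def parse_hits(text: str) -> list[Hit]:
    """Parse tab-separated BLAST rows in the column order above."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        if len(f) != N_COLUMNS:
            raise ValueError(f"expected {N_COLUMNS} columns, got {len(f)}: {line!r}")
        hits.append(Hit(f[0], f[1], int(f[2]), int(f[3]), int(f[4]),
                        float(f[5]), int(f[6]), float(f[7]), float(f[8])))
    return hits


def candidate_queries(hits: list[Hit], min_qcovs: float) -> list[str]:
    """Query IDs with at least one hit whose per-subject query coverage >= min_qcovs."""
    return sorted({h.qseqid for h in hits if h.qcovs >= min_qcovs})


# Toy rows for illustration only.
BLAST_TSV = (
    "scaffold_40\tNC_012920.1\t16569\t1\t16569\t0.0\t16600\t100\t100\n"
    "scaffold_3\tNC_012920.1\t312\t5000\t5311\t1e-50\t2000000\t0\t0\n"
)

hits = parse_hits(BLAST_TSV)
print(candidate_queries(hits, min_qcovs=50))  # ['scaffold_40']
```

Failing on an unexpected column count, rather than guessing, keeps a change to the `-outfmt` string from silently shifting every field.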