MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Error collecting output for parameter #274

Closed leetamm2 closed 1 year ago

leetamm2 commented 1 year ago

Hello,

I've run into an error, "Error collecting output for parameter", while running tinyRNA on my Mac.

Basic description of the error

I have two small RNA-seq sequencing files that I would like to process using tinyRNA. After completing samples.csv and features.csv and updating paths.yml, I ran run_config.yml as instructed. My first fastq file was processed, but the second one failed to get past the fastp step. Maybe it's an issue with my files?

Environment

Operating System: Mac OS Monterey
tinyrna version: v1.2.1

Reproducible example

tiny run --config run_config.yml

My fastq files are too big to attach, but I can email them.

Input configuration

######----------------------------- tinyRNA Configuration -----------------------------######
#
# In this file you can specify your configuration preferences for the workflow and
# each workflow step.
#
# If you want to use DEFAULT settings for the workflow, all you need to do is provide the path
# to your Samples Sheet and Features Sheet in your Paths File, then make sure that the
# 'paths_config' setting below points to your Paths File.
#
# We suggest that you also:
#   1. Add a username to identify the person performing runs, if desired for record keeping
#   2. Add a run directory name in your Paths File. If not provided, "run_directory" is used
#   3. Add a run name to label your run directory and run-specific summary reports.
#      If not provided, user_tinyrna will be used.
#
# This file will be further processed at run time to generate the appropriate pipeline
# settings for each workflow step. A copy of this processed configuration will be stored
# in your run directory.
#
######-------------------------------------------------------------------------------######

user: NewUser
run_date: ~
run_time: ~
paths_config: ./paths.yml

##-- The label for final outputs --##
##-- If none provided, the default of user_tinyrna will be used --##
run_name: tinyrna

##-- Number of threads to use when a step supports multi-threading --##
##-- For best performance, this should be equal to your computer's processor core count --##
threads: 4

##-- Control the amount of information printed to terminal: debug, normal, quiet --##
verbosity: normal

##-- If True: process fastp, tiny-collapse, and bowtie in parallel per-library --##
run_parallel: true

##-- (EXPERIMENTAL) If True: execute the pipeline using native cwltool Python --##
run_native: false

######------------------------- BOWTIE INDEX BUILD OPTIONS --------------------------######
#
# If you do not already have bowtie indexes, they can be built for you by setting
# run_bowtie_build (above) to true and adding your reference genome file(s) to your
# paths_config file.
#
# We have specified default parameters for small RNA data based on our own "best practices".
# You can change the parameters here.
#
######-------------------------------------------------------------------------------######

##-- SA is sampled every 2^offRate BWT chars (default: 5)
offrate: ~

##-- Convert Ns in reference to As --##
ntoa: false

##-- Don't build .3/.4.ebwt (packed reference) portion --##
noref: false

##-- Number of chars consumed in initial lookup (default: 10) --##
ftabchars: ~

######---------------------TRIMMING AND QUALITY FILTER OPTIONS ----------------------######
#
# We use the program fastp to perform: adapter trimming (req), quality filtering (on),
# and QC analysis for an output QC report. See https://github.com/OpenGene/fastp for more
# information on the fastp tool. We have limited the options available to those appropriate
# for small RNA sequencing data. If you require an additional option, create an issue on the
# pipeline github: https://github.com/MontgomeryLab/tinyrna
#
# We have specified default parameters for small RNA data based on our own "best practices".
# You can change the parameters here.
#
######-------------------------------------------------------------------------------######

##-- Adapter sequence to trim --##
adapter_sequence: 'auto'

##-- Minimum & maximum accepted lengths after trimming --##
length_required: 15
length_limit: 35

##-- Minimum average score for a read to pass quality filter --##
average_qual: 25

##-- Minimum phred score for a base to pass quality filter --##
qualified_quality_phred: 20

##-- Maximum % of bases that can be below minimum phred score (above) --##
unqualified_percent_limit: 10

##-- Maximum allowed number of N bases --##
n_base_limit: 1

##-- Compression level for gzip output --##
compression: 4

###-- Unused optional inputs: Remove '#' in front to use --###
##-- Trim poly x tails of a given length --##
# trim_poly_x: false
# poly_x_min_len: 0

##-- Trim n bases from the front/tail of a read --##
# trim_front1: 0
# trim_tail1: 0

##-- Is the data phred 64? --##
# fp_phred64: False

##-- Turn on overrepresentation sampling analysis --##
# overrepresentation_sampling: 0
# overrepresentation_analysis: false

##-- If true: don't overwrite the files --##
# dont_overwrite: false

##-- If true: disable these options --##
# disable_quality_filtering: false
# disable_length_filtering: false
# disable_adapter_trimming: false

######--------------------------- READ COLLAPSER OPTIONS ----------------------------######
#
# We use a custom Python utility for collapsing duplicate reads.
# We recommend using the default (keep all reads, or threshold: 0).
# Sequences <= threshold will not be included in downstream steps.
# Trimming takes place prior to counting/collapsing.
#
# We have specified default parameters for small RNA data based on our own "best practices".
# You can change the parameters here.
#
######-------------------------------------------------------------------------------######

##-- Trim the specified number of bases from the 5' end of each sequence --##
5p_trim: 0

##-- Trim the specified number of bases from the 3' end of each sequence --##
3p_trim: 0

##-- Sequences with count <= threshold will be placed in a separate low_counts fasta --##
threshold: 0

##-- If True: outputs will be gzip compressed --##
compress: False

######-------------------------- BOWTIE ALIGNMENT OPTIONS ---------------------------######
#
# We use bowtie for read alignment to a genome.
#
# We have specified default parameters for small RNA data based on our own "best practices".
# You can change the parameters here.
#
######-------------------------------------------------------------------------------######

##-- Report end-to-end hits w/ <=v mismatches; ignore qualities --##
end_to_end: 0

##-- Report all alignments per read (much slower than low -k) --##
all_aln: True

##-- Seed for random number generator --##
seed: 0

##-- Suppress SAM records for unaligned reads --##
no_unal: True

##-- Use shared mem for index; many bowtie's can share --##
##-- Note: this requires further configuration of your OS --##
##-- http://bowtie-bio.sourceforge.net/manual.shtml#bowtie-options-shmem --##
shared_memory: False

###-- Unused option inputs: Remove '#' in front to use --###
##-- Hits are guaranteed best stratum, sorted; ties broken by quality --##
#best: False

##-- Hits in sub-optimal strata aren't reported (requires best, ^^^^) --##
#strata: False

##-- Max mismatches in seed (can be 0-3, default: -n 2) --##
#seedmms: 2

##-- Seed length for seedmms (default: 28) --##
#seedlen: 28

##-- Do not align to reverse-complement reference --##
# norc: False

##-- Do not align to forward reference --##
# nofw: False

##-- Input quals are Phred+64 (same as --solexa1.3-quals) --##
# bt_phred64: False

##-- Report up to <int> good alignments per read (default: 1) --##
# k_aln

##-- Number of bases to trim from 5' or 3' end of reads --##
# trim5: 0
# trim3: 0

##-- Input quals are from GA Pipeline ver. < 1.3 --##
# solexa: false

##-- Input quals are from GA Pipeline ver. >= 1.3 --##
# solexa13: false

######--------------------------- FEATURE COUNTER OPTIONS ---------------------------######
#
# We use a custom Python utility that utilizes HTSeq's Genomic Array of Sets and GFF reader
# to count small RNA reads. Selection rules are defined in your Features Sheet.
#
######-------------------------------------------------------------------------------######

##-- If True: show all parsed features in the counts csv, regardless of count/identity --##
counter_all_features: False

##-- If True: counts will be normalized by genomic hits AND selected feature count --##
##-- If False: counts will only be normalized by genomic hits --##
counter_normalize_by_hits: True

##-- If True: a decollapsed copy of each SAM file will be produced (useful for IGV) --##
counter_decollapse: False

##-- Select the StepVector implementation that is used. Options: HTSeq or Cython --##
counter_stepvector: 'Cython'

##-- If True: produce diagnostic logs to indicate what was eliminated and why --##
counter_diags: False

######--------------------------- DIFFERENTIAL EXPRESSION ---------------------------######
#
# Differential expression analysis is performed using the DESeq2 R library.
#
######-------------------------------------------------------------------------------######

##-- If True: produce a principal component analysis plot from the input dataset --##
dge_pca_plot: True

##-- If True: before analysis, drop features which have a zero count across all samples --##
dge_drop_zero: False

######-------------------------------- PLOTTING OPTIONS -----------------------------######
#
# We use a custom Python script for creating all plots. If you wish to use another matplotlib
# stylesheet you can specify that in the Paths File.
#
# We have specified default parameters for small RNA data based on our own "best practices".
# You can change the parameters here.
#
######-------------------------------------------------------------------------------######

##-- Enable plots by uncommenting (removing the '#') for the desired plot type --##
##-- Disable plots by commenting (adding a '#') for the undesired plot type --##
plot_requests:
  - 'len_dist'
  - 'rule_charts'
  - 'class_charts'
  - 'replicate_scatter'
  - 'sample_avg_scatter_by_dge'
  - 'sample_avg_scatter_by_dge_class'

##-- You can set a custom P value to use in DGE scatter plots. Default: 0.05 --##
plot_pval: ~

##-- If True: scatter plot points will be vectorized. If False, only points are raster --##
plot_vector_points: False

##-- Optionally set the min and/or max lengths for len_dist plots; auto if unset --##
plot_len_dist_min:
plot_len_dist_max:

##-- Use this label in class plots for counts assigned by rules lacking a classifier --##
plot_unknown_class: "_UNKNOWN_"

##-- Use this label in class plots for unassigned counts --##
plot_unassigned_class: "_UNASSIGNED_"

##-- Optionally filter the classes in class scatter plots --##
plot_class_scatter_filter:
  style: include  # Choose: include or exclude
  classes: []     # Add classes between [ and ], separated by comma

######----------------------------- OUTPUT DIRECTORIES ------------------------------######
#
# Outputs for each step are organized into their own subdirectories in your run
# directory. You can set these folder names here.
#
######-------------------------------------------------------------------------------######

dir_name_bt_build: bowtie-build
dir_name_fastp: fastp
dir_name_collapser: collapser
dir_name_bowtie: bowtie
dir_name_counter: counter
dir_name_dge: DGE
dir_name_plotter: plots

#########################  AUTOMATICALLY GENERATED CONFIGURATIONS #########################
#
# Do not make any changes to the following sections. These options are automatically
# generated using your Paths File, your Samples and Features sheets, and the above
# settings in this file.
#
###########################################################################################

version: 1.2.1

######--------------------------- DERIVED FROM PATHS FILE ---------------------------######
#
# The following configuration settings are automatically derived from the Paths File
#
######-------------------------------------------------------------------------------######

run_directory: ~
tmp_directory: ~
features_csv: { }
samples_csv: { }
paths_file: { }
gff_files: [ ]
run_bowtie_build: false
reference_genome_files: [ ]
plot_style_sheet: ~
adapter_fasta: ~
ebwt: ~

######------------------------- DERIVED FROM SAMPLES SHEET --------------------------######
#
# The following configuration settings are automatically derived from the Samples Sheet
#
######-------------------------------------------------------------------------------######

##-- Utilized by fastp, tiny-collapse, and bowtie --##
sample_basenames: [ ]

##-- Utilized by fastp --##
# input fastq files
in_fq: [ ]
# output reports
fastp_report_titles: [ ]

###-- Utilized by bowtie --###
# bowtie index files
bt_index_files: [ ]

##-- Utilized by tiny-deseq.r --##
# The control for comparison. If unspecified, all comparisons are made
control_condition:
# If the experiment design yields less than one degree of freedom, tiny-deseq.r is skipped
run_deseq: True

######------------------------- DERIVED FROM FEATURES SHEET -------------------------######
#
# The following configuration settings are automatically derived from the Features Sheet
#
######-------------------------------------------------------------------------------######

######--------------------------- DERIVED FROM RUN CONFIG ---------------------------######
#
# The following configuration settings are automatically derived from this file
#
######-------------------------------------------------------------------------------######

##-- Utilized by tiny-plot --##
# Filters for class scatter plots
plot_class_scatter_filter_include: []
plot_class_scatter_filter_exclude: []

Error message and traceback

Validating annotation files...
[2023-01-12 14:09:20] INFO /Users/Tammy/miniconda3/envs/tinyrna/bin/cwltool 3.1.20220628170238
[2023-01-12 14:09:20] INFO Resolved '/Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tinyrna_wf.cwl' to 'file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tinyrna_wf.cwl'
[2023-01-12 14:09:23] INFO [workflow ] starting step bt_build_optional
[2023-01-12 14:09:23] INFO [step bt_build_optional] start
[2023-01-12 14:09:23] INFO [workflow ] start
[2023-01-12 14:09:23] INFO [workflow ] starting step preprocessing
[2023-01-12 14:09:23] INFO [step preprocessing] start
[2023-01-12 14:09:23] INFO [workflow preprocessing] starting step fastp
[2023-01-12 14:09:23] INFO [workflow preprocessing] start
[2023-01-12 14:09:23] INFO [job bt_build_optional] /private/tmp/docker_tmppqn0xj2w$ bowtie-build \
    -f \
    /private/tmp/docker_tmpby9k6gzi/stgd704691e-bf73-4e68-854a-6228a219ddd5/ce11.fa \
    ce11 \
    --offrate \
    5 \
    --ftabchars \
    10 \
    --threads \
    4 > /private/tmp/docker_tmppqn0xj2w/console_output.log 2> /private/tmp/docker_tmppqn0xj2w/console_output.log
[2023-01-12 14:09:23] INFO [step fastp] start
[2023-01-12 14:09:24] INFO [step preprocessing] start
[2023-01-12 14:09:24] INFO [workflow preprocessing_2] start
[2023-01-12 14:09:24] INFO [workflow preprocessing_2] starting step fastp_2
[2023-01-12 14:09:24] INFO [step fastp_2] start
[2023-01-12 14:10:35] INFO [job bt_build_optional] Max memory used: 3MiB
[2023-01-12 14:10:35] INFO [job bt_build_optional] completed success
[2023-01-12 14:10:35] INFO [step bt_build_optional] completed success
[2023-01-12 14:10:35] INFO [workflow ] starting step organize_bt_indexes
[2023-01-12 14:10:35] INFO [step organize_bt_indexes] start
[2023-01-12 14:10:35] INFO [job fastp] /private/tmp/docker_tmpz36c_xfr$ fastp \
    --adapter_sequence \
    auto \
    --average_qual \
    30 \
    --compression \
    4 \
    --html \
    Batch1SAMP_S1_L001_R1_001.fastq_qc.html \
    --json \
    Batch1SAMP_S1_L001_R1_001.fastq_qc.json \
    --length_limit \
    35 \
    --length_required \
    15 \
    --n_base_limit \
    1 \
    --qualified_quality_phred \
    20 \
    --report_title \
    condition1_rep_1 \
    --thread \
    2 \
    --unqualified_percent_limit \
    10 \
    --in1 \
    /private/tmp/docker_tmpyudfd_n3/stga1b9b85c-99bf-446a-a9e7-bfd46cf88eb8/Batch1SAMP_S1_L001_R1_001.fastq.gz \
    --out1 \
    Batch1SAMP_S1_L001_R1_001.fastq_cleaned.fastq > /private/tmp/docker_tmpz36c_xfr/Batch1SAMP_S1_L001_R1_001.fastq.gz_console_output.log 2> /private/tmp/docker_tmpz36c_xfr/Batch1SAMP_S1_L001_R1_001.fastq.gz_console_output.log
[2023-01-12 14:10:35] WARNING [job fastp] exited with status: 255
[2023-01-12 14:10:35] INFO [step organize_bt_indexes] completed success
[2023-01-12 14:10:36] ERROR [job fastp] Job error:
("Error collecting output for parameter 'fastq1': ../../miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/tools/fastp.cwl:247:7: Did not find output file with glob pattern: '['Batch1SAMP_S1_L001_R1_001.fastq_cleaned.fastq']'.", {})
[2023-01-12 14:10:36] WARNING [job fastp] completed permanentFail
[2023-01-12 14:10:36] ERROR [step fastp] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/preprocessing.cwl#fastp/fastq1
[2023-01-12 14:10:36] ERROR [step fastp] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/preprocessing.cwl#fastp/report_json
[2023-01-12 14:10:36] ERROR [step fastp] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/preprocessing.cwl#fastp/report_html
[2023-01-12 14:10:36] ERROR [step fastp] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/preprocessing.cwl#fastp/console_output
[2023-01-12 14:10:36] WARNING [step fastp] completed permanentFail
[2023-01-12 14:10:36] INFO [workflow preprocessing] completed permanentFail

Thank you!

leetamm2 commented 1 year ago

Hello again,

I tried running tinyRNA with just the file that I thought had gotten through fastp, as mentioned above. It got through the bowtie steps, but the same error message came up when it reached tiny-count (I also tried resuming the run with tiny recount):

Error message and traceback

(tinyrna) Tammy@Tammys-MacBook tinyrna_2023-01-13_11-27-35_run_directory % tiny recount --config run_config.yml 
Resuming pipeline execution at the tiny-count step...
[2023-01-13 11:32:41] INFO /Users/Tammy/miniconda3/envs/tinyrna/bin/cwltool 3.1.20220628170238
[2023-01-13 11:32:41] INFO Resolved '/Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl' to 'file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl'
[2023-01-13 11:32:43] INFO [workflow ] starting step counter
[2023-01-13 11:32:43] INFO [workflow ] start
[2023-01-13 11:32:43] INFO [step counter] start
[2023-01-13 11:32:43] INFO [job counter] /private/tmp/docker_tmpb3j87co4$ tiny-count \
    -p \
    -nh \
    true \
    -o \
    tinyrna_2023-01-13_11-32-40 \
    -pf \
    /private/tmp/docker_tmpcod7gxea/stgd4d8bd31-0c68-4170-8cf1-b00779ea7133/paths.yml \
    -sv \
    Cython > /private/tmp/docker_tmpb3j87co4/console_output.log
Traceback (most recent call last):
  File "/Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/rna/counter/counter.py", line 249, in main
    counter = FeatureCounter(gff_files, selection, **args)
  File "/Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/rna/counter/features.py", line 36, in __init__
    Features(*reference_tables.get())
  File "/Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/rna/util.py", line 26, in wrapper
    return_val = func(*args, **kwargs)
  File "/Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/rna/counter/hts_parsing.py", line 490, in get
    raise ValueError("No features were retained while parsing your GFF file.\n"
ValueError: No features were retained while parsing your GFF file.
This may be due to a lack of features matching 'Select for...with value...'

tiny-count encountered an error. Don't worry! You don't have to start over.
You can resume the pipeline at tiny-count. To do so:
    1. cd into your Run Directory
    2. Run "tiny recount --config your_run_config.yml"
       (that's the processed run config) ^^^

[2023-01-13 11:33:36] INFO [job counter] Max memory used: 61MiB
[2023-01-13 11:33:36] ERROR [job counter] Job error:
("Error collecting output for parameter 'alignment_stats': ../../../miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/tools/tiny-count.cwl:116:7: Did not find output file with glob pattern: '['tinyrna_2023-01-13_11-32-40_alignment_stats.csv']'.", {})
[2023-01-13 11:33:36] WARNING [job counter] completed permanentFail
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/feature_counts
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/rule_counts
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/norm_counts
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/mapped_nt_len_dist
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/assigned_nt_len_dist
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/alignment_stats
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/summary_stats
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/console_output
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/decollapsed_sams
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/intermed_out_files
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/alignment_diags
[2023-01-13 11:33:36] ERROR [step counter] Output is missing expected field file:///Users/Tammy/miniconda3/envs/tinyrna/lib/python3.9/site-packages/tiny/cwl/workflows/tiny-resume.cwl#counter/selection_diags
[2023-01-13 11:33:36] WARNING [step counter] completed permanentFail
[2023-01-13 11:33:36] INFO [workflow ] completed permanentFail
{
    "counter_out_dir": null,
    "dge_out_dir": null,
    "plotter_out_dir": null
}
[2023-01-13 11:33:36] WARNING Final process status is permanentFail

Thanks again!

AlexTate commented 1 year ago

Hey @leetamm2, thank you for reaching out.

Some background

The majority of the terminal output produced by the tiny command comes from the workflow runner, cwltool, which doesn't have any special knowledge about the tools it runs in each step so the errors it produces are fairly generic.

Most tools have their terminal output redirected to a log file for auto-documentation. This output is fully captured for third party tools and partially captured, allowing errors through, for tinyRNA tools. The CWL specification doesn't currently support redirecting output to a log file and to the terminal, but I have contacted the CWL team proposing a new capture method that would make this situation more user-friendly.

Re: your first post

The terminal output you included does not indicate that any fastq files completed quality filtering. The workflow runner indicated that it was preparing a second fastp job, but the first job immediately exited so the workflow runner terminated the workflow before the second job could begin. fastp returns 255 for all errors so we'll need to see what it said:

  1. In the first section of the Run Config, set verbosity: debug and save it.
  2. Ensure that your Samples Sheet still includes the problematic fastq file.
  3. Execute tiny run again using this Run Config. This will generate a lot of terminal output.
  4. After the workflow exits, press cmd + f and search for 2>. The path to the fastp log file will be printed after this token.
  5. Copy the log file path and run open {path} in your terminal, where {path} is the path you copied.
  6. Finally, make sure to set verbosity: normal in your Run Config once you're done. Otherwise large intermediate files will be left in your temporary files directory.

This will allow you to see the specific error fastp encountered.
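
If the debug output is too long to search by eye, the lookup in step 4 can be scripted. The following is a minimal sketch, not part of tinyRNA: it assumes you have saved the terminal output as text, and the helper name find_log_paths is hypothetical. It simply collects whatever path follows each 2> redirect token, which is where cwltool prints the log file location in the output shown earlier in this thread.

```python
import re

# A standalone "2>" token followed by the path that stderr was redirected to.
# The lookbehind keeps tokens such as "12>" from matching.
REDIRECT = re.compile(r"(?<!\S)2>\s*(\S+)")


def find_log_paths(terminal_output):
    """Return each path that follows a '2>' token, in order, without duplicates."""
    paths = []
    for match in REDIRECT.finditer(terminal_output):
        path = match.group(1)
        if path not in paths:
            paths.append(path)
    return paths


# Tail of the fastp command line from the first post in this thread.
sample = (
    "    --out1 \\\n"
    "    Batch1SAMP_S1_L001_R1_001.fastq_cleaned.fastq > "
    "/private/tmp/docker_tmpz36c_xfr/Batch1SAMP_S1_L001_R1_001.fastq.gz_console_output.log 2> "
    "/private/tmp/docker_tmpz36c_xfr/Batch1SAMP_S1_L001_R1_001.fastq.gz_console_output.log\n"
)

for log_path in find_log_paths(sample):
    print(log_path)
```

Keep in mind that the log file only remains on disk when the run used verbosity: debug, as described in the steps above.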

Re: your second post

While "Error collecting output for parameter" was also included in this output, this isn't the same issue as your first post; it's a generic error from the workflow runner. Since this step is a tinyRNA tool, error messages are printed on the terminal and if you look a little further up in the output you'll see:

ValueError: No features were retained while parsing your GFF file.
This may be due to a lack of features matching 'Select for...with value...'

This means that there is an incompatibility between the selection rules in your Features Sheet and the features in your GFF file. You may be selecting for GFF column 9 attributes that aren't present, or you may be using filters for sources/types (GFF columns 2 and 3) that aren't present.
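
One way to see what your GFF file actually offers is to tally its sources, types, and attribute keys, then compare those against the values in your Features Sheet. The sketch below is an illustration only and is independent of tiny-count's own parser; the function name summarize_gff and the sample records are hypothetical. It splits column 9 on semicolons, which is a simplification that can miscount attributes whose quoted values contain a semicolon.

```python
import re
from collections import Counter


def summarize_gff(lines):
    """Tally sources (column 2), types (column 3), and attribute keys (column 9)."""
    sources, types, attr_keys = Counter(), Counter(), Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete feature record
        sources[cols[1]] += 1
        types[cols[2]] += 1
        for field in cols[8].split(";"):
            field = field.strip()
            if field:
                # GFF3 writes key=value, GTF writes key "value"
                attr_keys[re.split(r"[=\s]", field, maxsplit=1)[0]] += 1
    return sources, types, attr_keys


# Made-up records for illustration; pass open("your_file.gff3") instead.
example = [
    "##gff-version 3\n",
    "I\tWormBase\tgene\t1\t100\t.\t+\t.\tID=Gene:g1;biotype=miRNA\n",
    'I\tEnsembl\texon\t1\t50\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

sources, types, attr_keys = summarize_gff(example)
print("sources:", dict(sources))
print("types:", dict(types))
print("attribute keys:", dict(attr_keys))
```

If none of the attribute keys, sources, or types named in your Features Sheet show up in these tallies, that would be consistent with the "No features were retained" error.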

Closing remarks

We intend to streamline this debugging process in a future release to make it more user-friendly.

leetamm2 commented 1 year ago

Hi @AlexTate ,

Thanks for the prompt response.

Re: fastp error

I followed the steps you suggested and was able to open the log files.

For my first file, there seems to be a decompression error:

Detecting adapter sequence for read1...
ERROR: igzip: encountered while decompressing file: /private/tmp/docker_tmpru2e279a/stg685defcb-b2ed-4c9f-bcc6-01d4abf0111f/Batch1SAMP_S1_L001_R1_001.fastq.gz

Does this suggest that my file is corrupted?

Re: second post

You're right, the exact ValueError appeared for my second file. I'm quite new to sequencing analysis, so my apologies in advance. Is there an example of how features.csv should be set up? I kept everything at the defaults since I wasn't sure what to select for.

Thanks again for your help!

taimontgomery commented 1 year ago

Hi @leetamm2, if you email us your GFF/GTF file and your features.csv, we would be happy to help troubleshoot the counting issue. See the tinyRNA preprint on bioRxiv for my email address. Regarding the fastq file, you may want to try decompressing it with the command "gzip -d file_name" and then counting the number of lines with "wc -l file_name". Dividing the number returned by wc by 4 gives the number of reads. The file was likely corrupted if the number of reads is not what you expect based on the sequencing report (assuming you have access to it), if the result is not a whole number (which would indicate the file is truncated, assuming it doesn't have any header lines), or if the "gzip -d" command throws an error. Let us know and we can help you with that as well.
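
The same check can be done without writing a decompressed copy to disk. This is a rough sketch using only the Python standard library; the function names are hypothetical, and it assumes a plain 4-line-per-read fastq with no header lines. It streams the archive to the end, so a truncated or corrupt file raises an error instead of silently yielding fewer lines.

```python
import gzip
import zlib


def count_fastq_reads(path):
    """Stream a .fastq.gz to the end and return (complete_reads, leftover_lines)."""
    n_lines = 0
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            n_lines += 1
    # Four lines per read; a non-zero remainder suggests a truncated file
    return divmod(n_lines, 4)


def check_fastq_gz(path):
    """Return a short human-readable verdict for the file at path."""
    try:
        reads, leftover = count_fastq_reads(path)
    except (EOFError, gzip.BadGzipFile, zlib.error) as err:
        return f"corrupt or truncated archive: {err}"
    if leftover:
        return f"{reads} complete reads plus {leftover} stray lines (possibly truncated)"
    return f"{reads} reads"
```

For example, print(check_fastq_gz("Batch1SAMP_S1_L001_R1_001.fastq.gz")) would report either a read count to compare against the sequencing report or the decompression error.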

AlexTate commented 1 year ago

Closing issue due to inactivity.