databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Clarify error messages for missing assets #186

Closed nsheff closed 3 years ago

nsheff commented 3 years ago
/usr/local/lib/python3.8/site-packages/refgenconf/refgenconf.py:773: RuntimeWarning: For genome 'rCRSd' path to the asset 'bowtie2_index/None:default' doesn't exist: /alias/rCRSd/bowtie2_index/default/rCRSd
warnings.warn(msg, RuntimeWarning)

Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:

Pipeline failed at: (06-01 21:28:20) elapsed: 0.0 TIME
Total time: 0:00:01
Failure reason: Required assets not existing: fasta.chrom_sizes:default (--None), fasta.None:default (--None), bowtie2_index.None:default (--None)
Traceback (most recent call last):
File "/bulker/pepatac/pipelines/pepatac.py", line 2817, in <module>
sys.exit(main())
File "/bulker/pepatac/pipelines/pepatac.py", line 709, in main
res, rgc = _add_resources(args, res, check_list)
File "/bulker/pepatac/pipelines/pepatac.py", line 547, in _add_resources
pm.fail_pipeline(IOError(err_msg.format(", ".join(["{asset_name}.{seek_key}:{tag_name} (--{user_arg})".format(**x) for x in required_list]))))
File "/usr/local/lib/python3.8/site-packages/pypiper/manager.py", line 1660, in fail_pipeline
raise exc
OSError: Required assets not existing: fasta.chrom_sizes:default (--None), fasta.None:default (--None), bowtie2_index.None:default (--None)

There are a few issues here:

  1. The warning about the bowtie2 index is misleading. That path is not expected to exist, because it is an index prefix rather than a file. So the warning needs to be changed somehow.
  2. The note that says "Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:" does not name the genome. Because of the preceding warning I assumed it was rCRSd, but it was actually hg38. This error message should specify exactly which genome and assets are missing, and maybe say how to get them.
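As a rough illustration of point 2, here is a minimal sketch of a message builder that names the genome and each missing asset. It is not the pipeline's actual code: `format_missing_assets` is a hypothetical helper, and the dictionary keys are assumed from the format string visible in the traceback above. It also leaves out `None` placeholders instead of printing them, and suggests `refgenie pull` when no command-line argument is available.

```python
def format_missing_assets(genome, missing):
    """Build an error message naming the genome and each missing asset.

    Hypothetical helper. `missing` is a list of dicts with the keys seen in
    the traceback (asset_name, seek_key, tag_name, user_arg); any value
    except asset_name may be None.
    """
    lines = ["Required assets not found for genome '{}':".format(genome)]
    for asset in missing:
        tag = asset.get("tag_name") or "default"
        name = asset["asset_name"]
        # Omit the seek key when it is None instead of printing 'None'.
        if asset.get("seek_key"):
            name += "." + asset["seek_key"]
        name += ":" + tag
        # Only point at a command-line argument when one is defined.
        if asset.get("user_arg"):
            hint = "pass --{}".format(asset["user_arg"])
        else:
            hint = "run: refgenie pull {}/{}:{}".format(
                genome, asset["asset_name"], tag
            )
        lines.append("  - {} ({})".format(name, hint))
    return "\n".join(lines)


# Example using two of the assets reported in the failure above.
print(format_missing_assets("hg38", [
    {"asset_name": "fasta", "seek_key": "chrom_sizes",
     "tag_name": "default", "user_arg": None},
    {"asset_name": "bowtie2_index", "seek_key": None,
     "tag_name": "default", "user_arg": None},
]))
```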

But this raises a related issue: should we remove the reliance on refgenie and go back to specifying paths directly? I will raise that as a separate issue.

nsheff commented 3 years ago

To fix the warning, just change the argument passed to seek so that a missing path does not trigger a warning.
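A minimal sketch of that idea follows. The `seek` function here is a hypothetical stand-in, not refgenconf's code. It assumes the lookup takes a three-way `strict_exists` switch (True raises, False warns, None skips the check); I believe refgenconf's own `seek` behaves along these lines, but check the installed version before relying on it.

```python
import os
import warnings


def seek(path, strict_exists=False):
    """Hypothetical stand-in for an asset lookup; not refgenconf's code.

    strict_exists: True raises, False warns, None skips the check.
    """
    if strict_exists is not None and not os.path.exists(path):
        msg = "path to the asset doesn't exist: {}".format(path)
        if strict_exists:
            raise OSError(msg)
        warnings.warn(msg, RuntimeWarning)
    return path


# A bowtie2 index path is a prefix, not a file, so skip the existence check.
prefix = seek("/alias/rCRSd/bowtie2_index/default/rCRSd", strict_exists=None)
```

Assets that are real files would keep `strict_exists=False` or `True`, so only prefix-style assets bypass the check.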

jpsmith5 commented 3 years ago

Path-based asset specification and independence from refgenie are now built in.