databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

TypeError #216

Open namzoo99 opened 2 years ago

namzoo99 commented 2 years ago

Hi, I wanted to make consensus peaks, so I followed the test tutorial (http://pepatac.databio.org/en/latest/consensus_peaks/). I ran looper run before running looper runp, but I got the error below. This is my first time running the pipeline, so I'm having a hard time solving it.

 %  looper run examples/test_project/test_config.yaml

Looper version: 1.3.2
Command: run
## [1 of 1] sample: test1; pipeline: PEPATAC
Calling pre-submit function: refgenconf.looper_refgenie_populate
Traceback (most recent call last):
  File "/Users/nam-yunju/miniconda3/bin/looper", line 8, in <module>
    sys.exit(main())
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/looper/looper.py", line 745, in main
    run(args, rerun=(args.command == "rerun"), **compute_kwargs)
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/looper/looper.py", line 353, in __call__
    curr_pl_fails = cndtr.add_sample(sample, rerun=rerun)
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/looper/conductor.py", line 365, in add_sample
    self.submit()
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/looper/conductor.py", line 402, in submit
    script = self.write_script(self._pool, self._curr_size)
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/looper/conductor.py", line 556, in write_script
    namespaces = _exec_pre_submit(self.pl_iface, namespaces)
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/looper/conductor.py", line 660, in _exec_pre_submit
    _update_namespaces(namespaces, func(namespaces))
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/refgenconf/populator.py", line 47, in looper_refgenie_populate
    complete_sk_dict = rgc.list_seek_keys_values()
  File "/Users/nam-yunju/miniconda3/lib/python3.9/site-packages/refgenconf/refgenconf.py", line 1130, in list_seek_keys_values
    for seek_key_name in get_tag_seek_keys(tag_mapping):
TypeError: 'NoneType' object is not iterable
jordanc17 commented 2 years ago

I'm running into the same issue. I did the installation in a conda environment, which I think was successful, but I get an identical error when I try to run test_config.yaml following the installation instructions.

2lore commented 2 years ago

I get an identical error when I try to run tutorial_refgenie.yaml with looper.

nsheff commented 2 years ago

@jpsmith5 do you have any insight here? At first glance, could this be related to a missing asset? We may need to improve the error message.

Are you sure you have the refgenie assets available for the genome you're trying to work with?

2lore commented 2 years ago

I think so. I'm working with hg38 and pulled the refgenie assets as in the extended tutorial. If I use 'refgenie seek' for certain files, it works. But I don't know which specific asset the pre-submit function needs.

nsheff commented 2 years ago

@jpsmith5 have you seen this before?

jpsmith5 commented 2 years ago

Hey,

I'm looking at this now. I think I have seen this before. Let me run some tests to see what it is and whether it's a poor error-messaging issue. Will update shortly.

jpsmith5 commented 2 years ago

The default settings run successfully for me, but I think this could be a missing rCRSd FASTA asset issue. I've caught this elsewhere after updates to various required packages, and I don't think the docs reflect it yet (I'm updating them now). I'm going to test on my side whether not having that asset gives me the same error, and if so, update from there. In the meantime, @jordanc17 or @2lore, you can test this too: grab that specific asset and see if the NoneType error is resolved. I'll update here again after I get a chance to test with and without that asset.

Using Refgenie

refgenie pull rCRSd/fasta

No Refgenie

wget -O rCRSd.fasta.tgz http://refgenomes.databio.org/v3/assets/archive/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4/fasta?tag=default
tar xvf rCRSd.fasta.tgz 
jpsmith5 commented 2 years ago

I can confirm that I can reproduce the NoneType error when I do not have the rCRSd FASTA asset. So once you grab that asset, you should be good to go. I'm making sure the docs are updated to reflect that.

nsheff commented 2 years ago

Any ideas for how we can make the error message better? I have two ideas:

  1. Update refgenconf so that, instead of "TypeError: 'NoneType' object is not iterable" at that line, we actually report the seek_key_name it was trying to resolve.
  2. Do a quick prerequisite check in pepatac to confirm that the assets exist, and report any that are missing before running.
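A minimal sketch of idea 1, in plain Python. Everything here is hypothetical: `get_tag_seek_keys` and `list_seek_keys_values` are simplified imitations of the names in the traceback, not refgenconf's actual code, and the nested-dict config layout (genome -> asset -> tag -> seek keys, with `None` standing in for an unresolved tag) is an assumption. The point is only to check for the missing mapping and raise an error naming the genome, asset, and tag instead of iterating over `None`.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified layout: genome -> asset -> tag -> {seek_key: path},
# with None standing in for an asset tag that has no seek keys recorded.
# This is NOT refgenconf's real data structure; it only illustrates the idea.
TagMapping = Optional[Dict[str, str]]
Config = Dict[str, Dict[str, Dict[str, TagMapping]]]


class MissingSeekKeysError(LookupError):
    """Raised when an asset tag has no seek keys (e.g. the asset was never pulled)."""


def get_tag_seek_keys(tag_mapping: TagMapping) -> Optional[List[str]]:
    # Hypothetical stand-in for the helper in the traceback: it returns None
    # when there is no mapping, which is what makes a bare for-loop fail.
    if tag_mapping is None:
        return None
    return list(tag_mapping)


def list_seek_keys_values(cfg: Config) -> Config:
    result: Config = {}
    for genome, assets in cfg.items():
        for asset, tags in assets.items():
            for tag, tag_mapping in tags.items():
                seek_keys = get_tag_seek_keys(tag_mapping)
                if seek_keys is None or tag_mapping is None:
                    # Name the unresolved asset instead of iterating over None
                    # and raising "TypeError: 'NoneType' object is not iterable".
                    raise MissingSeekKeysError(
                        f"No seek keys found for {genome}/{asset}:{tag}. "
                        f"Is the asset pulled? Try: refgenie pull {genome}/{asset}:{tag}"
                    )
                result.setdefault(genome, {}).setdefault(asset, {})[tag] = {
                    key: tag_mapping[key] for key in seek_keys
                }
    return result


if __name__ == "__main__":
    cfg: Config = {
        "hg38": {"fasta": {"default": {"fasta": "hg38.fa", "fai": "hg38.fa.fai"}}},
        "rCRSd": {"fasta": {"default": None}},  # simulated missing asset
    }
    try:
        list_seek_keys_values(cfg)
    except MissingSeekKeysError as err:
        print(err)
```

With the simulated missing rCRSd asset, the user sees which asset to pull rather than a bare TypeError. Using a `LookupError` subclass is one design choice; the essential part is that the message carries the registry path.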

Maybe best to do both.

jpsmith5 commented 2 years ago

Yes, I lean toward both. It is specifically a refgenconf message, so having it specify what it was trying to populate, and failed to, would go a long way, and I can add the asset check to the pipeline. Right now I report the assets used, but only after they've essentially been approved, since the command-line arguments received an appropriate value. Checking the assets, however, would require the pipeline to be initiated, which it can't do if it relies on refgenie to populate one of the required command-line arguments, as is the case here. So, if I'm thinking about the order of effects properly, an in-pipeline check would never flag an asset that refgenconf had already failed to populate.
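Given the ordering problem described in the comment above, one option is a standalone check run before `looper run`, so it does not depend on the pipeline starting. A minimal sketch, assuming you can list the assets you have as refgenie registry paths (`genome/asset:tag`). The helpers `normalize`, `find_missing_assets`, and `preflight` are hypothetical, not part of PEPATAC, looper, or refgenconf, and the `required` list is illustrative only (this thread confirms just `rCRSd/fasta`).

```python
from typing import Iterable, List


def normalize(registry_path: str, default_tag: str = "default") -> str:
    """Add the default tag when none is given: 'rCRSd/fasta' -> 'rCRSd/fasta:default'."""
    return registry_path if ":" in registry_path else f"{registry_path}:{default_tag}"


def find_missing_assets(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the required genome/asset[:tag] paths that are absent from `available`."""
    have = {normalize(p) for p in available}
    return sorted({normalize(p) for p in required} - have)


def preflight(required: Iterable[str], available: Iterable[str]) -> None:
    """Stop before launching looper if any required asset is missing."""
    missing = find_missing_assets(required, available)
    if missing:
        commands = "\n".join(f"  refgenie pull {p}" for p in missing)
        raise SystemExit(f"Missing refgenie assets; pull them before running looper:\n{commands}")


if __name__ == "__main__":
    # Illustrative values only; this is not PEPATAC's authoritative asset list,
    # and in practice `available` would come from your refgenie configuration.
    required = ["hg38/fasta", "rCRSd/fasta"]
    available = {"hg38/fasta:default"}
    print(find_missing_assets(required, available))  # ['rCRSd/fasta:default']
    preflight(required, available)  # exits with the refgenie pull command to run
```

Because this runs outside the pipeline, it can report every missing asset at once, including ones that refgenconf would otherwise fail on during the pre-submit step.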

2lore commented 2 years ago

@jpsmith5 I can confirm that adding the rCRSd/fasta asset solved the error for me as well. Thanks a lot for helping me out here!