adoebley / Griffin

A flexible framework for nucleosome profiling of cell-free DNA

Griffin Nucleosome Profiling Throws TypeError with Specific Configuration #10

Closed yueyaog closed 1 year ago

yueyaog commented 1 year ago

I am experimenting with different configurations of Griffin for nucleosome profiling, and I noticed that when I set mappability_correction = 'True' and GC_map_corrected_bw = 'path/to/GC_map_corrected.bw' in the griffin_nucleosome_profiling.snakefile, the pipeline throws the following error:

CTCF_demo processing all 1000 sites
CTCF_demo (fw/rv/undirected/total): 0/0/1000/1000
CTCF_demo uncorrected starting fetch 0 min 0 sec
CTCF_demo_uncorrected_fetch_complete: 1000 of 1000 intervals done in 0 min 1 sec, 0 min 0 sec remaining , size 0.07 GB
CTCF_demo GC_corrected starting fetch 0 min 2 sec
CTCF_demo_GC_corrected_fetch_complete: 1000 of 1000 intervals done in 0 min 1 sec, 0 min 0 sec remaining , size 0.07 GB
CTCF_demo GC_map_corrected starting fetch 0 min 4 sec
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/gaoyueya/miniconda3/envs/griffin_demo/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/gaoyueya/miniconda3/envs/griffin_demo/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "../../scripts/griffin_merge_sites.py", line 521, in merge_sites
    results_dict[key]['coverage'] = fetch_bw_values(results_dict[key]['input_path'],current_sites,site_name,key)
  File "../../scripts/griffin_merge_sites.py", line 356, in fetch_bw_values
    if len(values)<(norm_window[1]-norm_window[0]):
TypeError: object of type 'numpy.float32' has no len()
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "../../scripts/griffin_merge_sites.py", line 612, in <module>
    results = p.map(merge_sites, to_do_list, 1) #Send only one interval to each processor at a time.
  File "/home/gaoyueya/miniconda3/envs/griffin_demo/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/gaoyueya/miniconda3/envs/griffin_demo/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: object of type 'numpy.float32' has no len()

The error appears to be a TypeError raised in fetch_bw_values, which calls len() on a value it expects to be a list or array but which is actually a single numpy.float32. It seems to be specific to the combination of the mappability_correction and GC_map_corrected_bw options. You should be able to reproduce it by setting mappability_correction = 'True' and GC_map_corrected_bw = 'path/to/GC_map_corrected.bw' in the griffin_nucleosome_profiling.snakefile and running the snakefile.
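
To illustrate the failure mode outside the pipeline, here is a minimal sketch. It is not Griffin's code: the helper as_1d_values and the norm_window bounds are hypothetical, and I am only assuming that the fetched value arrives as a bare numpy.float32. It shows that len() fails on a NumPy scalar and that wrapping the value with np.atleast_1d makes the length check from the traceback safe to evaluate.

```python
import numpy as np


def as_1d_values(values):
    """Hypothetical helper: return a 1-D float32 array even for a scalar input."""
    return np.atleast_1d(np.asarray(values, dtype=np.float32))


# A bare NumPy scalar, like the one the traceback complains about.
scalar = np.float32(0.5)

# Calling len() on it raises the same TypeError seen in the log.
try:
    len(scalar)
    message = ""
except TypeError as err:
    message = str(err)

# Hypothetical normalization window; the real bounds come from the config.
norm_window = (-5000, 5000)

# After coercion, the length comparison runs and flags the interval as too short.
values = as_1d_values(scalar)
too_short = len(values) < (norm_window[1] - norm_window[0])
```

A guard like this would only avoid the crash. Whether a single-value result should be skipped or treated as missing coverage is a decision for the pipeline.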

Thank you for your attention to this matter.

adoebley commented 1 year ago

Hi Gao,

Thanks for bringing this error to my attention! I think the easiest fix would be to not use mappability correction. We tried it in response to a reviewer suggestion, but found that it created a lot of noise in the coverage profiles, so we decided not to use it going forward. I haven't completely removed it from the Python scripts yet, but I set up the snakefile to always skip the mappability correction. I think this may be related to your error.

If you absolutely need to use mappability correction, you'll probably need a different version of the snakefile. There is one here (config.yaml and snakefile) that is set up to run mappability correction:

https://github.com/adoebley/Griffin_analyses/tree/main/delfi_data_cancer_detection/TFBS_nucleosome_profiling_unfiltered_v2

Let me know if you have more questions.

-Anna-Lisa

yueyaog commented 1 year ago

Thanks, Anna-Lisa. It makes a lot of sense. I will not use mappability correction in my analysis.