ggirelli / gpseqc

Nuclear centrality estimation from GPSeq experiment.
MIT License

Empty bed file triggers issue. #3

Closed: joaquincr closed this issue 6 years ago

joaquincr commented 6 years ago

What did you do (e.g., steps to reproduce)

gpseqc_estimate \
  '/home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK94_5min_GG__cutsiteLoc-umiCount.bed' \
  '/home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK95_10min_GG__cutsiteLoc-umiCount.bed' \
  '/home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK96_15min_GG__cutsiteLoc-umiCount.bed' \
  '/home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK97_30min_GG__cutsiteLoc-umiCount.bed' \
  '/home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK98_on_GG__cutsiteLoc-umiCount.bed' \
  -o '/home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/output' \
  -s 1000000 \
  -r BICRO55_excluding_1min_1MB_allMetrics \
  -t 7

What did you expect to happen?

The run should complete without errors.

What happened instead?

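The run failed at the "Assigning to bins..." step with `AssertionError: missing score column, run with 'noValues = True'.` The full output is under "Additional information".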

Additional information

    Threads : 7
 Output dir : /home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/output
  Bed files : 
   (1) /home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK94_5min_GG__cutsiteLoc-umiCount.bed
   (2) /home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK95_10min_GG__cutsiteLoc-umiCount.bed
   (3) /home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK96_15min_GG__cutsiteLoc-umiCount.bed
   (4) /home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK97_30min_GG__cutsiteLoc-umiCount.bed
   (5) /home/bicro/Desktop/user_folders_HD2/Quim/1min_exclusion/1Mb/BICRO55/input/TK98_on_GG__cutsiteLoc-umiCount.bed

Confirm settings and proceed? (y/n)
y

Parsing bedfiles and counting reads...
Identifying chromosomes...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.14it/s]
Generating bins...
Preparing cutsites...
Removing empty sites...
Assigning to bins...
[Parallel(n_jobs=7)]: Done   1 tasks      | elapsed:   18.3s
[Parallel(n_jobs=7)]: Done   2 out of   5 | elapsed:   24.1s remaining:   36.2s
[Parallel(n_jobs=7)]: Done   3 out of   5 | elapsed:   27.1s remaining:   18.1s
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/bin/gpseqc_estimate", line 453, in do_assign
    bedfiles[i] = bed.to_bins(bins, bedfiles[i])
  File "/usr/local/lib/python3.5/dist-packages/gpseqc/bed.py", line 164, in to_bins
    assert bed.field_count() >= 5, assert_msg
AssertionError: missing score column, run with 'noValues = True'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.5/dist-packages/joblib/_parallel_backends.py", line 359, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
AssertionError                                     Tue Apr 24 13:18:21 2018
PID: 28363                                   Python 3.5.2: /usr/bin/python3
...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function do_assign>, (2, [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, 'bins.size1000000.step1000000.csm3', Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7)), {})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function do_assign>
        args = (2, [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, 'bins.size1000000.step1000000.csm3', Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7))
        kwargs = {}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/usr/local/bin/gpseqc_estimate in do_assign(i=2, bedfiles=[<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], bins=<BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, descr='bins.size1000000.step1000000.csm3', args=Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7))
    448 
    449 # (6) Assign reads to bins (intersect) -----------------------------------------
    450 print("Assigning to bins...")
    451 
    452 def do_assign(i, bedfiles, bins, descr, args):
--> 453     bedfiles[i] = bed.to_bins(bins, bedfiles[i])
        bedfiles = [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>]
        i = 2
        bins = <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>
    454 
    455     # Save if debugging
    456     if args.debug_mode: bed_saveas(bedfiles[i], "intersected.%s.%s.tsv" % (
    457         descr, os.path.basename(args.bedfile[i])), args)

...........................................................................
/usr/local/lib/python3.5/dist-packages/gpseqc/bed.py in to_bins(bins=<BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, bed=<BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, noValues=False, skipEmpty=True)
    159         pbt.BedTool: grouped bed.
    160     '''
    161 
    162     if not noValues:
    163         assert_msg = "missing score column, run with 'noValues = True'."
--> 164         assert bed.field_count() >= 5, assert_msg
        bed.field_count = <bound method BedTool.field_count of <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>>
        assert_msg = "missing score column, run with 'noValues = True'."
    165         bed = bed.cut(range(5)).sort() # Force to BED5
    166 
    167     # Enforce bins to BED3
    168     bins = bins.cut(range(3)).sort()

AssertionError: missing score column, run with 'noValues = True'.
___________________________________________________________________________
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
AssertionError                                     Tue Apr 24 13:18:21 2018
PID: 28363                                   Python 3.5.2: /usr/bin/python3
...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function do_assign>, (2, [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, 'bins.size1000000.step1000000.csm3', Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7)), {})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function do_assign>
        args = (2, [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, 'bins.size1000000.step1000000.csm3', Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7))
        kwargs = {}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/usr/local/bin/gpseqc_estimate in do_assign(i=2, bedfiles=[<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], bins=<BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, descr='bins.size1000000.step1000000.csm3', args=Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7))
    448 
    449 # (6) Assign reads to bins (intersect) -----------------------------------------
    450 print("Assigning to bins...")
    451 
    452 def do_assign(i, bedfiles, bins, descr, args):
--> 453     bedfiles[i] = bed.to_bins(bins, bedfiles[i])
        bedfiles = [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>]
        i = 2
        bins = <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>
    454 
    455     # Save if debugging
    456     if args.debug_mode: bed_saveas(bedfiles[i], "intersected.%s.%s.tsv" % (
    457         descr, os.path.basename(args.bedfile[i])), args)

...........................................................................
/usr/local/lib/python3.5/dist-packages/gpseqc/bed.py in to_bins(bins=<BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, bed=<BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, noValues=False, skipEmpty=True)
    159         pbt.BedTool: grouped bed.
    160     '''
    161 
    162     if not noValues:
    163         assert_msg = "missing score column, run with 'noValues = True'."
--> 164         assert bed.field_count() >= 5, assert_msg
        bed.field_count = <bound method BedTool.field_count of <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>>
        assert_msg = "missing score column, run with 'noValues = True'."
    165         bed = bed.cut(range(5)).sort() # Force to BED5
    166 
    167     # Enforce bins to BED3
    168     bins = bins.cut(range(3)).sort()

AssertionError: missing score column, run with 'noValues = True'.
___________________________________________________________________________

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/gpseqc_estimate", line 466, in <module>
    for i in range(len(bedfiles)))
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 789, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 740, in retrieve
    raise exception
joblib.my_exceptions.JoblibAssertionError: JoblibAssertionError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/usr/local/bin/gpseqc_estimate in <module>()
    461     for i in tqdm(range(len(bedfiles))):
    462         do_assign(i, bedfiles, bins, descr, args)
    463 else:
    464     bedfiles = Parallel(n_jobs = args.threads, verbose = 11)(
    465         delayed(do_assign)(i, bedfiles, bins, descr, args)
--> 466         for i in range(len(bedfiles)))
    467 
    468 # (7) Calculate bin statistics -------------------------------------------------
    469 print("Calculating bin statistics...")
    470 

...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=7), iterable=<generator object <genexpr>>)
    784             if pre_dispatch == "all" or n_jobs == 1:
    785                 # The iterable was consumed all at once by the above for loop.
    786                 # No need to wait for async callbacks to trigger to
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=7)>
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time
    792             self._print('Done %3i out of %3i | elapsed: %s finished',
    793                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
AssertionError                                     Tue Apr 24 13:18:21 2018
PID: 28363                                   Python 3.5.2: /usr/bin/python3
...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function do_assign>, (2, [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, 'bins.size1000000.step1000000.csm3', Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7)), {})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/usr/local/lib/python3.5/dist-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function do_assign>
        args = (2, [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, 'bins.size1000000.step1000000.csm3', Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7))
        kwargs = {}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/usr/local/bin/gpseqc_estimate in do_assign(i=2, bedfiles=[<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>], bins=<BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, descr='bins.size1000000.step1000000.csm3', args=Namespace(T='/tmp', bedfile=['/home/bicro/Deskto...ding_1min_1MB_allMetrics.', suffix='', threads=7))
    448 
    449 # (6) Assign reads to bins (intersect) -----------------------------------------
    450 print("Assigning to bins...")
    451 
    452 def do_assign(i, bedfiles, bins, descr, args):
--> 453     bedfiles[i] = bed.to_bins(bins, bedfiles[i])
        bedfiles = [<BedTool(/tmp/pybedtools.fgszti2x.tmp)>, <BedTool(/tmp/pybedtools.jq2ugpxj.tmp)>, <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, <BedTool(/tmp/pybedtools.30h383xh.tmp)>, <BedTool(/tmp/pybedtools.m4ufhh7z.tmp)>]
        i = 2
        bins = <BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>
    454 
    455     # Save if debugging
    456     if args.debug_mode: bed_saveas(bedfiles[i], "intersected.%s.%s.tsv" % (
    457         descr, os.path.basename(args.bedfile[i])), args)

...........................................................................
/usr/local/lib/python3.5/dist-packages/gpseqc/bed.py in to_bins(bins=<BedTool(/tmp/pybedtools.4k4r_8p9.tmp)>, bed=<BedTool(/tmp/pybedtools.4o5myg6a.tmp)>, noValues=False, skipEmpty=True)
    159         pbt.BedTool: grouped bed.
    160     '''
    161 
    162     if not noValues:
    163         assert_msg = "missing score column, run with 'noValues = True'."
--> 164         assert bed.field_count() >= 5, assert_msg
        bed.field_count = <bound method BedTool.field_count of <BedTool(/tmp/pybedtools.4o5myg6a.tmp)>>
        assert_msg = "missing score column, run with 'noValues = True'."
    165         bed = bed.cut(range(5)).sort() # Force to BED5
    166 
    167     # Enforce bins to BED3
    168     bins = bins.cut(range(3)).sort()

AssertionError: missing score column, run with 'noValues = True'.
ggirelli commented 6 years ago

Hi, this happens because the TK96 bed file is actually empty. I fixed it in pygpseq-v.2.0.1 by adding an assert at the beginning of the main script that checks for empty bed files. If it happens again, the script will show the following readable error message:

Traceback (most recent call last):
  File "/usr/local/bin/gpseqc_estimate", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/gire/ownCloud/BiCro/Code/repos/gpseqc/bin/gpseqc_estimate", line 361, in <module>
    assert 0 != conds_nreads[i], "empty bedfile found: %s" % args.bedfile[i]
AssertionError: empty bedfile found: /home/gire/Desktop/test/TK96_15min_GG__cutsiteLoc-umiCount.bed
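
For anyone who wants to catch this before launching a long run, here is a minimal sketch of the same kind of early check, written as a standalone script. It is not the code from `gpseqc_estimate`; the helper names `count_bed_records` and `find_empty_bedfiles` are made up for illustration, and it assumes that blank lines and lines starting with `#`, `track`, or `browser` do not count as records.

```python
import os
import sys


def count_bed_records(path):
    """Count data lines in a BED file.

    Assumption for this sketch: blank lines and header lines starting
    with '#', 'track', or 'browser' are not counted as records.
    """
    n_records = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith(("#", "track", "browser")):
                continue
            n_records += 1
    return n_records


def find_empty_bedfiles(paths):
    """Return the subset of paths that contain no data lines."""
    return [p for p in paths if count_bed_records(p) == 0]


if __name__ == "__main__":
    # Hypothetical usage: pass the bed files as command-line arguments.
    empty = find_empty_bedfiles(sys.argv[1:])
    if empty:
        # Exit with a readable message instead of failing later in a worker.
        names = ", ".join(os.path.basename(p) for p in empty)
        sys.exit("empty bedfile found: %s" % names)
    print("all bed files contain at least one record")
```

The sketch exits with an explicit message rather than using `assert`, since assertions are skipped when Python runs with `-O`. Checking up front also avoids the misleading "missing score column" message, which shows up here because the empty file fails the `bed.field_count() >= 5` check in `to_bins`.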