YeoLab / outrigger

Create a *de novo* alternative splicing database, validate splicing events, and quantify percent spliced-in (Psi) from RNA seq data
http://yeolab.github.io/outrigger/
BSD 3-Clause "New" or "Revised" License
61 stars 22 forks source link

outrigger index fails processing bam files with UnboundLocalError: local variable 'start' referenced before assignment #83

Closed alaindomissy closed 4 years ago

alaindomissy commented 7 years ago

Description

outrigger index fails processing bam files with error:

UnboundLocalError: local variable 'start' referenced before assignment

while executing step:

outrigger/io/bam.py in _report_read_positions(read=<pysam.libcalignedsegment.AlignedSegment object>, counter=Counter({('chr1', 17056, 17232, '-'): 104, ('chr...2556, '+'): 1, ('chr1', 112805, 129054, '-'): 1}))

Steps to Reproduce

  1. use outrigger 1.0.0 on TSCC:
    $ module load outrigger/1.0.0
  2. get bams from RBFOX2 HPG2 encode project
  3. sort, index these bams, and place them in a folder "bams"
  4. run this:
    $  outrigger index \
           --bam bams/*.sorted.bam \
           --gtf /projects/ps-yeolab/genomes/hg19/gencode_v19/gencode.v19.annotation.gtf 

Expected behavior:

Expecting outrigger index to proceed successfully and produce the junction index database

Actual behavior:

Getting this log and stack trace:

2017-04-27 08:36:15     Creating folder ./outrigger_output ...
2017-04-27 08:36:15             Done.
2017-04-27 08:36:15     Creating folder ./outrigger_output/index ...
2017-04-27 08:36:15             Done.
2017-04-27 08:36:15     Creating folder ./outrigger_output/index/gtf ...
2017-04-27 08:36:15             Done.
2017-04-27 08:36:15     Creating folder ./outrigger_output/junctions ...
2017-04-27 08:36:15             Done.
2017-04-27 08:36:15     Reading bam files and creating a big splice junction table of reads spanning exon-exon junctions
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 130, in __call__
    return self.func(*args, **kwargs)
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 72, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py", line 133, in bam_to_junction_reads_table
    uniquely, multi = _get_junction_reads(bam_filename)
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py", line 126, in _get_junction_reads
    _report_read_positions(read, counter)
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py", line 26, in _report_read_positions
    counter[(chrom, start, stop, strand)] += 1
UnboundLocalError: local variable 'start' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 140, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
UnboundLocalError                                  Thu Apr 27 08:36:25 2017
PID: 26508Python 3.5.3: /projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/bin/python
...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function bam_to_junction_reads_table>, ('bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', False), {})]
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function bam_to_junction_reads_table>
        args = ('bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', False)
        kwargs = {}
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in bam_to_junction_reads_table(bam_filename='bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', ignore_multimapping=False)
    128     return uniquely, multi
    129 
    130 
    131 def bam_to_junction_reads_table(bam_filename, ignore_multimapping=False):
    132     """Create a table of reads for this bam file"""
--> 133     uniquely, multi = _get_junction_reads(bam_filename)
        uniquely = undefined
        multi = undefined
        bam_filename = 'bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam'
    134     reads = _combine_uniquely_multi(uniquely, multi, ignore_multimapping)
    135 
    136     # Remove "junctions" with same start and stop
    137     reads = reads.loc[reads[JUNCTION_START] != reads[JUNCTION_STOP]]

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in _get_junction_reads(filename='bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam')
    121             if read.mapping_quality < 255:
    122                 counter = multi
    123             else:
    124                 counter = uniquely
    125 
--> 126             _report_read_positions(read, counter)
        read = <pysam.libcalignedsegment.AlignedSegment object>
        counter = Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1})
    127     samfile.close()
    128     return uniquely, multi
    129 
    130 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in _report_read_positions(read=<pysam.libcalignedsegment.AlignedSegment object>, counter=Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1}))
     21             # Add one to be compatible with STAR output and show the
     22             # start of the intron (not the end of the exon)
     23             start = genome_loc + 1
     24         elif read_loc and last_read_pos is None:
     25             stop = genome_loc  # we are right exclusive ,so this is correct
---> 26             counter[(chrom, start, stop, strand)] += 1
        counter = Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1})
        chrom = 'chr1'
        start = undefined
        stop = 319967
        strand = '-'
     27             del start
     28             del stop
     29         last_read_pos = read_loc
     30 

UnboundLocalError: local variable 'start' referenced before assignment
___________________________________________________________________________
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 727, in retrieve
    self._output.extend(job.get())
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
UnboundLocalError                                  Thu Apr 27 08:36:25 2017
PID: 26508Python 3.5.3: /projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/bin/python
...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function bam_to_junction_reads_table>, ('bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', False), {})]
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function bam_to_junction_reads_table>
        args = ('bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', False)
        kwargs = {}
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in bam_to_junction_reads_table(bam_filename='bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', ignore_multimapping=False)
    128     return uniquely, multi
    129 
    130 
    131 def bam_to_junction_reads_table(bam_filename, ignore_multimapping=False):
    132     """Create a table of reads for this bam file"""
--> 133     uniquely, multi = _get_junction_reads(bam_filename)
        uniquely = undefined
        multi = undefined
        bam_filename = 'bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam'
    134     reads = _combine_uniquely_multi(uniquely, multi, ignore_multimapping)
    135 
    136     # Remove "junctions" with same start and stop
    137     reads = reads.loc[reads[JUNCTION_START] != reads[JUNCTION_STOP]]

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in _get_junction_reads(filename='bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam')
    121             if read.mapping_quality < 255:
    122                 counter = multi
    123             else:
    124                 counter = uniquely
    125 
--> 126             _report_read_positions(read, counter)
        read = <pysam.libcalignedsegment.AlignedSegment object>
        counter = Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1})
    127     samfile.close()
    128     return uniquely, multi
    129 
    130 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in _report_read_positions(read=<pysam.libcalignedsegment.AlignedSegment object>, counter=Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1}))
     21             # Add one to be compatible with STAR output and show the
     22             # start of the intron (not the end of the exon)
     23             start = genome_loc + 1
     24         elif read_loc and last_read_pos is None:
     25             stop = genome_loc  # we are right exclusive ,so this is correct
---> 26             counter[(chrom, start, stop, strand)] += 1
        counter = Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1})
        chrom = 'chr1'
        start = undefined
        stop = 319967
        strand = '-'
     27             del start
     28             del stop
     29         last_read_pos = read_loc
     30 

UnboundLocalError: local variable 'start' referenced before assignment
___________________________________________________________________________

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/bin/outrigger", line 11, in <module>
    sys.exit(main())
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py", line 1072, in main
    cl = CommandLine(sys.argv[1:])
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py", line 344, in __init__
    self.args.func()
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py", line 348, in index
    index.execute()
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py", line 746, in execute
    spliced_reads = self.csv()
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py", line 475, in csv
    splice_junctions = self.make_junction_reads_file()
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py", line 463, in make_junction_reads_file
    self.bam, self.ignore_multimapping, self.n_jobs)
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py", line 149, in read_multiple_bams
    for filename in bam_filenames)
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 810, in __call__
    self.retrieve()
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py", line 757, in retrieve
    raise exception
joblib.my_exceptions.JoblibUnboundLocalError: JoblibUnboundLocalError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/bin/outrigger in <module>()
      6 
      7 from outrigger.commandline import main
      8 
      9 if __name__ == '__main__':
     10     sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
---> 11     sys.exit(main())
     12 
     13 
     14 
     15 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py in main()
   1067         util.done()
   1068 
   1069 
   1070 def main():
   1071     try:
-> 1072         cl = CommandLine(sys.argv[1:])
        cl = undefined
   1073     except Usage:
   1074         cl.do_usage_and_die()
   1075 
   1076 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py in __init__(self=<outrigger.commandline.CommandLine object>, input_options=['index', '--bam', 'bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF511KFD.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF686VNL.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF880HKN.bam.sorted.bam', '--gtf', '/projects/ps-yeolab/genomes/hg19/gencode_v19/gencode.v19.annotation.gtf'])
    339         if self.args is not None:
    340             if self.args.debug:
    341                 print(self.args)
    342                 print(input_options)
    343 
--> 344             self.args.func()
        self.args.func = <bound method CommandLine.index of <outrigger.commandline.CommandLine object>>
    345 
    346     def index(self):
    347         index = Index(**vars(self.args))
    348         index.execute()

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py in index(self=<outrigger.commandline.CommandLine object>)
    343 
    344             self.args.func()
    345 
    346     def index(self):
    347         index = Index(**vars(self.args))
--> 348         index.execute()
        index.execute = <bound method Index.execute of <outrigger.commandline.Index object>>
    349 
    350     def validate(self):
    351         validate = Validate(**vars(self.args))
    352         validate.execute()

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py in execute(self=<outrigger.commandline.Index object>)
    741         logger = logging.getLogger('outrigger.index')
    742 
    743         if self.debug:
    744             logger.setLevel(10)
    745 
--> 746         spliced_reads = self.csv()
        spliced_reads = undefined
        self.csv = <bound method Subcommand.csv of <outrigger.commandline.Index object>>
    747 
    748         spliced_reads = self.filter_junctions_on_reads(spliced_reads)
    749         metadata_csv = os.path.join(self.junctions_folder, METADATA_CSV)
    750         metadata = self.junction_metadata(spliced_reads, metadata_csv)

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py in csv(self=<outrigger.commandline.Index object>)
    470         return splice_junctions
    471 
    472     def csv(self):
    473         """Create a csv file of compiled splice junctions"""
    474         if not os.path.exists(self.junction_reads_filename):
--> 475             splice_junctions = self.make_junction_reads_file()
        splice_junctions = undefined
        self.make_junction_reads_file = <bound method Subcommand.make_junction_reads_file of <outrigger.commandline.Index object>>
    476         else:
    477             util.progress('Found compiled junction reads file in {} and '
    478                           'reading it in '
    479                           '...'.format(self.junction_reads_filename))

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/commandline.py in make_junction_reads_file(self=<outrigger.commandline.Index object>)
    458         else:
    459             util.progress('Reading bam files and creating a big splice '
    460                           'junction table of reads spanning exon-exon '
    461                           'junctions')
    462             splice_junctions = bam.read_multiple_bams(
--> 463                 self.bam, self.ignore_multimapping, self.n_jobs)
        self.bam = ['bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF511KFD.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF686VNL.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF880HKN.bam.sorted.bam']
        self.ignore_multimapping = False
        self.n_jobs = -1
    464         dirname = os.path.dirname(self.junction_reads_filename)
    465         if not os.path.exists(dirname):
    466             os.makedirs(dirname)
    467         util.progress('Writing {} ...\n'.format(self.junction_reads_filename))

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in read_multiple_bams(bam_filenames=['bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF511KFD.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF686VNL.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF880HKN.bam.sorted.bam'], ignore_multimapping=False, n_jobs=-1)
    144 
    145 def read_multiple_bams(bam_filenames, ignore_multimapping=False, n_jobs=-1):
    146     dfs = joblib.Parallel(n_jobs=n_jobs)(
    147         joblib.delayed(
    148             bam_to_junction_reads_table)(filename, ignore_multimapping)
--> 149         for filename in bam_filenames)
        bam_filenames = ['bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF511KFD.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF686VNL.bam.sorted.bam', 'bams_RBFOX2_HPG2/ENCFF880HKN.bam.sorted.bam']
    150     reads = pd.concat(dfs, ignore_index=True)
    151     return reads
    152 
    153 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object read_multiple_bams.<locals>.<genexpr>>)
    805             if pre_dispatch == "all" or n_jobs == 1:
    806                 # The iterable was consumed all at once by the above for loop.
    807                 # No need to wait for async callbacks to trigger to
    808                 # consumption.
    809                 self._iterating = False
--> 810             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
    811             # Make sure that we get a last message telling us we are done
    812             elapsed_time = time.time() - self._start_time
    813             self._print('Done %3i out of %3i | elapsed: %s finished',
    814                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
UnboundLocalError                                  Thu Apr 27 08:36:25 2017
PID: 26508Python 3.5.3: /projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/bin/python
...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function bam_to_junction_reads_table>, ('bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', False), {})]
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function bam_to_junction_reads_table>
        args = ('bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', False)
        kwargs = {}
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in bam_to_junction_reads_table(bam_filename='bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam', ignore_multimapping=False)
    128     return uniquely, multi
    129 
    130 
    131 def bam_to_junction_reads_table(bam_filename, ignore_multimapping=False):
    132     """Create a table of reads for this bam file"""
--> 133     uniquely, multi = _get_junction_reads(bam_filename)
        uniquely = undefined
        multi = undefined
        bam_filename = 'bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam'
    134     reads = _combine_uniquely_multi(uniquely, multi, ignore_multimapping)
    135 
    136     # Remove "junctions" with same start and stop
    137     reads = reads.loc[reads[JUNCTION_START] != reads[JUNCTION_STOP]]

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in _get_junction_reads(filename='bams_RBFOX2_HPG2/ENCFF430GWF.bam.sorted.bam')
    121             if read.mapping_quality < 255:
    122                 counter = multi
    123             else:
    124                 counter = uniquely
    125 
--> 126             _report_read_positions(read, counter)
        read = <pysam.libcalignedsegment.AlignedSegment object>
        counter = Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1})
    127     samfile.close()
    128     return uniquely, multi
    129 
    130 

...........................................................................
/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/outrigger/io/bam.py in _report_read_positions(read=<pysam.libcalignedsegment.AlignedSegment object>, counter=Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1}))
     21             # Add one to be compatible with STAR output and show the
     22             # start of the intron (not the end of the exon)
     23             start = genome_loc + 1
     24         elif read_loc and last_read_pos is None:
     25             stop = genome_loc  # we are right exclusive ,so this is correct
---> 26             counter[(chrom, start, stop, strand)] += 1
        counter = Counter({('chr1', 17056, 17232, '-'): 104, ('chr...9320, '+'): 1, ('chr1', 123715, 129054, '-'): 1})
        chrom = 'chr1'
        start = undefined
        stop = 319967
        strand = '-'
     27             del start
     28             del stop
     29         last_read_pos = read_loc
     30 

UnboundLocalError: local variable 'start' referenced before assignment
___________________________________________________________________________

Versions

$ outrigger --version
outrigger 1.0.0
olgabot commented 7 years ago

As you mentioned when we talked about this, this bug is likely because the .bams were generated by TopHat rather than STAR, and creates a slightly different output so outrigger gets an error

alaindomissy commented 7 years ago

I ran both versions of bam files provided by ENCODE for RBFOX2: the TopHat version, as well as the STAR version. Both cases produce the same error. I'll document shortly steps to reproduce with the STAR produced bam files.

alaindomissy commented 7 years ago

Description

outrigger index fails processing bam files with error:

KeyError: 'junction_start'

while executing step:

 File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)

Steps to Reproduce

use outrigger 1.0.0 on TSCC:

    $ module load outrigger/1.0.0

get bams from RBFOX2 HPG2 encode project for HepG2 (hg19v19 version) :

     $ wget  https://www.encodeproject.org/files/ENCFF809YGE/
     $ wget  https://www.encodeproject.org/files/ENCFF419JEJ/

sort, index these bams run this script :

   $cat ./outrigger_index.sh
   outrigger index \
         --bam *.sorted.bam \
         --gtf /projects/ps-yeolab/genomes/hg19/gencode_v19/gencode.v19.annotation.gtf 
    $./outrigger_index.sh

Expected behavior:

Expecting outrigger index to proceed successfully and produce the junction index database Actual behavior:

Getting this log and stack trace:

$cd bams_RBFOX2_K562_ENCODE_hg19v29/
$ls -l
total 21G
-rw-r--r-- 1 adomissy yeo-group 4.3G Aug 26  2015 ENCFF148PKR.bam
-rw-r--r-- 1 adomissy yeo-group 7.2G May 31 14:22 ENCFF148PKR.bam.sorted.bam
-rw-r--r-- 1 adomissy yeo-group  11M May 31 16:59 ENCFF148PKR.bam.sorted.bam.bai
-rw-r--r-- 1 adomissy yeo-group 3.5G Aug 26  2015 ENCFF751HVW.bam
-rw-r--r-- 1 adomissy yeo-group 5.5G May 31 16:31 ENCFF751HVW.bam.sorted.bam
-rw-r--r-- 1 adomissy yeo-group  10M May 31 16:48 ENCFF751HVW.bam.sorted.bam.bai
-rwxr-xr-x 1 adomissy yeo-group  155 Jun  2 08:51 outrigger_index.sh*
$./outrigger_index.sh
2017-06-03 21:47:41     Creating folder ./outrigger_output ...
2017-06-03 21:47:41             Done.
2017-06-03 21:47:41     Creating folder ./outrigger_output/index ...
2017-06-03 21:47:41             Done.
2017-06-03 21:47:41     Creating folder ./outrigger_output/index/gtf ...
2017-06-03 21:47:41             Done.
2017-06-03 21:47:41     Creating folder ./outrigger_output/junctions ...
2017-06-03 21:47:41             Done.
2017-06-03 21:47:41     Reading bam files and creating a big splice junction table of reads spanning exon-exon junctions
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/pandas/indexes/base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'junction_start'

Versions

$ outrigger --version
outrigger 1.0.0
alaindomissy commented 7 years ago

Identical steps, error and stack trace are also the case with the bams for the other cell line (K562) In this case the files to download from encode are the following:

get bams from RBFOX2 HPG2 encode project for HepG2 (hg19v19 version) :

     $ wget https://www.encodeproject.org/files/ENCFF927XKW/
     $ wget https://www.encodeproject.org/files/ENCFF527PPL/
haoger commented 6 years ago

Hi, Alain Domissy. Have you solved this problem? I have the same bug with you.

flophys commented 6 years ago

Hi @alaindomissy and @olgabot I have the very same problem, getting the error "UnboundLocalError: local variable 'start' referenced before assignment" - have you got to the root of this/found a solution?

Thanks.

diazlab commented 6 years ago

same problem here, files aligned with HISAT2

giuseppedelnapalle commented 5 years ago

I got the same problem with the bam file created by STAR. Hope to get it solved soon.

jfass commented 4 years ago

I'll echo the above; I have STAR-aligned BAMs (STAR 2.5.3a) and I'm still getting this error with outrigger 1.1.1.

[...]
...........................................................................
/home/ubuntu/outrigger/miniconda2/envs/outrigger-env/lib/python2.7/site-packages/outrigger/io/bam.py in _report_read_positions(read=<pysam.l
ibcalignedsegment.AlignedSegment object>, counter=Counter({('chr11', 1017567, 1017731, '+'): 11072..., '+'): 1, ('chr3', 29323245, 29323245,
 '+'): 1}))
     21             # Add one to be compatible with STAR output and show the
     22             # start of the intron (not the end of the exon)
     23             start = genome_loc + 1
     24         elif read_loc and last_read_pos is None:
     25             stop = genome_loc  # we are right exclusive ,so this is correct
---> 26             counter[(chrom, start, stop, strand)] += 1
        counter = Counter({('chr11', 1017567, 1017731, '+'): 11072..., '+'): 1, ('chr3', 29323245, 29323245, '+'): 1})
        chrom = 'chr11'
        start = undefined
        stop = 1212806
        strand = '+'
     27             del start
     28             del stop
     29         last_read_pos = read_loc
     30 

UnboundLocalError: local variable 'start' referenced before assignment
___________________________________________________________________________

Could this be a problem with gene definitions in the GTF file? I'm using GENCODE:

outrigger index --bam /home/ubuntu/data/dedups/[1P]*.dedup.bam   --gtf gencode.v19.annotation.gtf
gokceneraslan commented 4 years ago

You can try and check if https://github.com/YeoLab/outrigger/pull/97 fixes the issue, cheers.