Closed alaindomissy closed 4 years ago
As you mentioned when we talked about this, this bug is likely because the .bams were generated by TopHat rather than STAR, which creates slightly different output, so outrigger gets an error.
I ran both versions of the bam files provided by ENCODE for RBFOX2: the TopHat version, as well as the STAR version. Both cases produce the same error. I'll shortly document steps to reproduce with the STAR-produced bam files.
outrigger index fails processing bam files with error:
KeyError: 'junction_start'
while executing step:
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
use outrigger 1.0.0 on TSCC:
$ module load outrigger/1.0.0
get bams from the RBFOX2 HepG2 ENCODE project (hg19v19 version):
$ wget https://www.encodeproject.org/files/ENCFF809YGE/
$ wget https://www.encodeproject.org/files/ENCFF419JEJ/
sort and index these bams, then run this script:
$ cat ./outrigger_index.sh
outrigger index \
--bam *.sorted.bam \
--gtf /projects/ps-yeolab/genomes/hg19/gencode_v19/gencode.v19.annotation.gtf
$ ./outrigger_index.sh
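The sort and index step is not spelled out in this report. A minimal sketch of how it could be scripted, assuming `samtools` is on the PATH (the report does not say which tool was used) and reusing the `<input>.sorted.bam` naming seen in the directory listing of this report. The helper names are hypothetical:

```python
import subprocess
from pathlib import Path


def sort_and_index_commands(bam):
    """Build samtools commands to coordinate-sort and index one BAM file.

    Assumption: samtools is the tool used; the report does not say so.
    """
    bam = Path(bam)
    # Mirror the naming in the report: ENCFF148PKR.bam -> ENCFF148PKR.bam.sorted.bam
    sorted_bam = bam.with_name(bam.name + ".sorted.bam")
    return [
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        ["samtools", "index", str(sorted_bam)],
    ]


def sort_and_index(bam):
    """Run the commands in order; raises CalledProcessError if samtools fails."""
    for cmd in sort_and_index_commands(bam):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call sort_and_index() to actually run them.
    for cmd in sort_and_index_commands("ENCFF148PKR.bam"):
        print(" ".join(cmd))
```

Building the command lists separately from running them makes it possible to inspect what will be executed before touching multi-gigabyte files.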
Expecting outrigger index to proceed successfully and produce the junction index database.
Actual behavior:
$ cd bams_RBFOX2_K562_ENCODE_hg19v29/
$ ls -l
total 21G
-rw-r--r-- 1 adomissy yeo-group 4.3G Aug 26 2015 ENCFF148PKR.bam
-rw-r--r-- 1 adomissy yeo-group 7.2G May 31 14:22 ENCFF148PKR.bam.sorted.bam
-rw-r--r-- 1 adomissy yeo-group 11M May 31 16:59 ENCFF148PKR.bam.sorted.bam.bai
-rw-r--r-- 1 adomissy yeo-group 3.5G Aug 26 2015 ENCFF751HVW.bam
-rw-r--r-- 1 adomissy yeo-group 5.5G May 31 16:31 ENCFF751HVW.bam.sorted.bam
-rw-r--r-- 1 adomissy yeo-group 10M May 31 16:48 ENCFF751HVW.bam.sorted.bam.bai
-rwxr-xr-x 1 adomissy yeo-group 155 Jun 2 08:51 outrigger_index.sh*
$ ./outrigger_index.sh
2017-06-03 21:47:41 Creating folder ./outrigger_output ...
2017-06-03 21:47:41 Done.
2017-06-03 21:47:41 Creating folder ./outrigger_output/index ...
2017-06-03 21:47:41 Done.
2017-06-03 21:47:41 Creating folder ./outrigger_output/index/gtf ...
2017-06-03 21:47:41 Done.
2017-06-03 21:47:41 Creating folder ./outrigger_output/junctions ...
2017-06-03 21:47:41 Done.
2017-06-03 21:47:41 Reading bam files and creating a big splice junction table of reads spanning exon-exon junctions
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/projects/ps-yeolab/software/eclipconda/envs/outrigger-1.0.0/lib/python3.5/site-packages/pandas/indexes/base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'junction_start'
$ outrigger --version
outrigger 1.0.0
The identical steps, error and stack trace also occur with the bams for the other cell line (K562). In this case the files to download from ENCODE are the following:
get bams from the RBFOX2 K562 ENCODE project (hg19v19 version):
$ wget https://www.encodeproject.org/files/ENCFF927XKW/
$ wget https://www.encodeproject.org/files/ENCFF527PPL/
Hi, Alain Domissy. Have you solved this problem? I have the same bug as you.
Hi @alaindomissy and @olgabot I have the very same problem, getting the error "UnboundLocalError: local variable 'start' referenced before assignment" - have you got to the root of this/found a solution?
Thanks.
same problem here, files aligned with HISAT2
I got the same problem with the bam file created by STAR. Hope to get it solved soon.
I'll echo the above; I have STAR-aligned BAMs (STAR 2.5.3a) and I'm still getting this error with outrigger 1.1.1.
[...]
...........................................................................
/home/ubuntu/outrigger/miniconda2/envs/outrigger-env/lib/python2.7/site-packages/outrigger/io/bam.py in _report_read_positions(read=<pysam.libcalignedsegment.AlignedSegment object>, counter=Counter({('chr11', 1017567, 1017731, '+'): 11072..., '+'): 1, ('chr3', 29323245, 29323245, '+'): 1}))
21 # Add one to be compatible with STAR output and show the
22 # start of the intron (not the end of the exon)
23 start = genome_loc + 1
24 elif read_loc and last_read_pos is None:
25 stop = genome_loc # we are right exclusive ,so this is correct
---> 26 counter[(chrom, start, stop, strand)] += 1
counter = Counter({('chr11', 1017567, 1017731, '+'): 11072..., '+'): 1, ('chr3', 29323245, 29323245, '+'): 1})
chrom = 'chr11'
start = undefined
stop = 1212806
strand = '+'
27 del start
28 del stop
29 last_read_pos = read_loc
30
UnboundLocalError: local variable 'start' referenced before assignment
___________________________________________________________________________
Could this be a problem with gene definitions in the GTF file? I'm using GENCODE:
outrigger index --bam /home/ubuntu/data/dedups/[1P]*.dedup.bam --gtf gencode.v19.annotation.gtf
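The `UnboundLocalError` in the traceback above means that line 26 ran for a read while `start` was not bound, i.e. the `stop` branch fired without a preceding `start` branch for that read. A minimal sketch of a more defensive loop follows. It is an illustration only and not the code in the pull request mentioned in this thread: the function name `count_junctions` is hypothetical, the first branch condition is assumed (only the `elif` is visible in the traceback), and the input is a plain list of `(read_loc, genome_loc)` tuples shaped like the output of pysam's `get_aligned_pairs()`.

```python
from collections import Counter


def count_junctions(aligned_pairs, chrom, strand, counter):
    """Count gap (junction) intervals in one read's aligned pairs.

    Hypothetical variant of the loop shown in the traceback: `start` is
    tracked explicitly as None instead of being deleted, so a gap end
    without a recorded gap start is skipped rather than raising.
    """
    start = None
    last_read_pos = None
    for read_loc, genome_loc in aligned_pairs:
        if read_loc is None and last_read_pos is not None:
            # First gap position after an aligned base. Add one, as in the
            # original comment, to be compatible with STAR output.
            start = genome_loc + 1
        elif read_loc is not None and last_read_pos is None:
            # First aligned base after a gap, or the first pair of the read.
            # Count only if a gap start was actually recorded.
            if start is not None and genome_loc is not None:
                counter[(chrom, start, genome_loc, strand)] += 1
            start = None
        last_read_pos = read_loc
    return counter


if __name__ == "__main__":
    # Two aligned bases, a three-position gap, then two more aligned bases.
    pairs = [(0, 100), (1, 101), (None, 102), (None, 103), (None, 104),
             (2, 105), (3, 106)]
    print(count_junctions(pairs, "chr11", "+", Counter()))
```

With this shape, a read whose first aligned pair already has a non-zero read position is skipped instead of raising. Whether such reads are what triggers the error in these BAM files is not established in this thread.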
You can try and check if https://github.com/YeoLab/outrigger/pull/97 fixes the issue, cheers.