bam2ec bam_utils.py array/list issue

I am just getting started with alntools and ran into an issue with bam2ec that I was able to remedy by modifying one line of code; I thought I'd let you know in case this is an issue others might have! (I don't feel confident enough to try to contribute to the codebase.)

Call:

Using Anaconda 2.1 (I couldn't get the EMASE dependencies to resolve with newer versions of conda) on Linux (a high-performance compute cluster), with the alntools environment created and loaded as specified in https://churchill-lab.github.io/alntools/. The BAM was aligned with bowtie2 and has ~3M reads.

alntools bam2ec -t /storage/home/hcoda1/2/abell65/scratch/testdiptranscripts/firsttest/N2ws263_JU1088/emase.pooled.transcripts.info -c 1 --verbose AP01-88_alignedN2ws263_JU1088.bam AP01-88.bin

Output including error:

[alntools] [12/02/2020 12:19:08 PM] Sample not supplied, using filename: AP01-88_alignedN2ws263_JU1088.bam
[alntools] [12/02/2020 12:19:08 PM] Parsing file information ...
[alntools] [12/02/2020 12:19:09 PM] File parsed in 00:00:00.82, total time: 00:00:00.82
[alntools] [12/02/2020 12:19:09 PM] Calculating 1 chunks
[alntools] [12/02/2020 12:19:09 PM] 1 chunks calculated in 00:00:00.03, total time: 00:00:00.84
[alntools] [12/02/2020 12:19:09 PM] Starting 1 processes ...
[alntools] [12/02/2020 12:20:08 PM] DONE Process ID: 0, File: /storage/scratch1/2/abell65/testemase/work/d3/03f099aa022605de44db6fc7a20a38/_bam2ec.0.bam, 19,626,295 valid alignments processed out of 19,658,429, with 28,333 equivalence classes
[alntools] [12/02/2020 12:20:11 PM] Process 1 done out of 1, combining result
[alntools] [12/02/2020 12:20:11 PM] All results combined in 00:01:01.78, total time: 00:01:02.62
[alntools] [12/02/2020 12:20:11 PM] # Valid Alignments: 19,626,295
[alntools] [12/02/2020 12:20:11 PM] # Main Targets: 183,501
[alntools] [12/02/2020 12:20:11 PM] # Haplotypes: 2
[alntools] [12/02/2020 12:20:11 PM] # Equivalence Classes: 28,333
[alntools] [12/02/2020 12:20:11 PM] # Unique Reads: 3,224,924
[alntools] [12/02/2020 12:20:11 PM] Constructing temp APM structure...
Traceback (most recent call last):
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/bin/alntools", line 4, in <module>
    __import__('pkg_resources').run_script('alntools==0.1.1', 'alntools')
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1462, in run_script
    exec(code, namespace, namespace)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/alntools-0.1.1-py2.7.egg/EGG-INFO/scripts/alntools", line 29, in <module>
    cli()
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/alntools-0.1.1-py2.7.egg/alntools/cli.py", line 66, in bam2ec
    methods.bam2ec(bam_file, ec_file, chunks, directory, number_processes, rangefile, sample, targets)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/alntools-0.1.1-py2.7.egg/alntools/methods.py", line 33, in bam2ec
    bam_utils.convert(bam_filename, ec_filename, None, num_chunks=chunks, number_processes=number_processes, temp_dir=directory, range_filename=range_filename, sample=sample, target_filename=target_filename)
  File "/storage/coda1/p-apaaby3/0/abell65/software/anaconda2.1.0/envs/alntools/lib/python2.7/site-packages/alntools-0.1.1-py2.7.egg/alntools/bam_utils.py", line 828, in convert
    read_names=ec_ids.astype(str),

Fix: I modified line 828 of bam_utils.py to wrap ec_ids in a numpy array before calling astype. As far as I can tell, ec_ids is a plain Python list at that point, and lists have no astype method. The modified line:

read_names=np.array(ec_ids).astype(str),
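To illustrate what I think is going on, here is a minimal sketch outside of alntools. The contents of ec_ids below are made up for illustration (I have not checked what bam_utils.py actually stores in it); the only assumption is that it is a Python list rather than a numpy array.

```python
import numpy as np

# Hypothetical stand-in for ec_ids; the real values come from alntools.
ec_ids = [0, 1, 2]

# A plain list has no astype method, so the original call fails.
try:
    ec_ids.astype(str)
except AttributeError as err:
    print("list fails: {}".format(err))

# Wrapping the list in a numpy array first makes the conversion work.
read_names = np.array(ec_ids).astype(str)
print(read_names.tolist())
```

np.asarray(ec_ids) should work equally well here and avoids a copy when the input is already an array, so it may be a safer choice if ec_ids is sometimes an array and sometimes a list.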

After this change, bam2ec completes:

[alntools] [12/02/2020 12:33:05 PM] Sample not supplied, using filename: AP01-88_alignedN2ws263_JU1088.bam
[alntools] [12/02/2020 12:33:05 PM] Parsing file information ...
[alntools] [12/02/2020 12:33:06 PM] File parsed in 00:00:00.92, total time: 00:00:00.92
[alntools] [12/02/2020 12:33:06 PM] Calculating 1 chunks
[alntools] [12/02/2020 12:33:06 PM] 1 chunks calculated in 00:00:00.03, total time: 00:00:00.94
[alntools] [12/02/2020 12:33:06 PM] Starting 1 processes ...
[alntools] [12/02/2020 12:34:03 PM] DONE Process ID: 0, File: /storage/scratch1/2/abell65/testemase/work/d3/03f099aa022605de44db6fc7a20a38/_bam2ec.0.bam, 19,626,295 valid alignments processed out of 19,658,429, with 28,333 equivalence classes
[alntools] [12/02/2020 12:34:06 PM] Process 1 done out of 1, combining result
[alntools] [12/02/2020 12:34:06 PM] All results combined in 00:01:00.03, total time: 00:01:00.97
[alntools] [12/02/2020 12:34:06 PM] # Valid Alignments: 19,626,295
[alntools] [12/02/2020 12:34:06 PM] # Main Targets: 183,501
[alntools] [12/02/2020 12:34:06 PM] # Haplotypes: 2
[alntools] [12/02/2020 12:34:06 PM] # Equivalence Classes: 28,333
[alntools] [12/02/2020 12:34:06 PM] # Unique Reads: 3,224,924
[alntools] [12/02/2020 12:34:06 PM] Constructing temp APM structure...
[alntools] [12/02/2020 12:34:07 PM] APM Created in 00:00:00.95, total time: 00:01:01.92
[alntools] [12/02/2020 12:34:07 PM] Matrix created in 00:00:00.08, total time: 00:01:02.00
[alntools] [12/02/2020 12:34:07 PM] Generating BIN file...
[alntools] [12/02/2020 12:34:07 PM] FORMAT: 2
[alntools] [12/02/2020 12:34:07 PM] NUMBER OF HAPLOTYPES: 2
[alntools] [12/02/2020 12:34:07 PM] NUMBER OF TARGETS: 183,501
[alntools] [12/02/2020 12:34:07 PM] FILTERED CRS: 1
[alntools] [12/02/2020 12:34:07 PM] Determining mappings...
[alntools] [12/02/2020 12:34:07 PM] A MATRIX: INDPTR LENGTH 28,334
[alntools] [12/02/2020 12:34:07 PM] A MATRIX: NUMBER OF NON ZERO: 115,786
[alntools] [12/02/2020 12:34:07 PM] A MATRIX: LENGTH INDPTR: 28,334
[alntools] [12/02/2020 12:34:07 PM] A MATRIX: LENGTH INDICES: 115,786
[alntools] [12/02/2020 12:34:07 PM] A MATRIX: LENGTH DATA: 115,786
[alntools] [12/02/2020 12:34:07 PM] N MATRIX: NUMBER OF EQUIVALENCE CLASSES: 28,333
[alntools] [12/02/2020 12:34:07 PM] N MATRIX: LENGTH INDPTR: 2
[alntools] [12/02/2020 12:34:07 PM] N MATRIX: NUMBER OF NON ZERO: 28,333
[alntools] [12/02/2020 12:34:07 PM] N MATRIX: LENGTH INDPTR: 2
[alntools] [12/02/2020 12:34:07 PM] N MATRIX: LENGTH INDICES: 28,333
[alntools] [12/02/2020 12:34:07 PM] N MATRIX: LENGTH DATA: 28,333
[alntools] [12/02/2020 12:34:07 PM] /storage/scratch1/2/abell65/testemase/work/d3/03f099aa022605de44db6fc7a20a38/AP01-88.bin created in 00:00:00.78, total time: 00:01:02.78
