cram2ont sometimes fails if the byte array being decoded contains string objects:
Traceback (most recent call last):
  File "./cram2ont", line 5, in <module>
    sys.exit(cram2ont.main())
  File "/home/ubuntu/ont2cram/cram2ont.py", line 178, in main
    run( args.inputfile, args.outputdir )
  File "/home/ubuntu/ont2cram/cram2ont.py", line 171, in run
    cram_to_fast5( input_file, output_dir )
  File "/home/ubuntu/ont2cram/cram2ont.py", line 137, in cram_to_fast5
    dtype=[(col_name, a.type)]
TypeError: a bytes-like object is required, not 'str'
In this case the relevant code is:
if col_name=="noname":
    #print(f"path={a.path}, val={tag_val[:11]}")
    dset.append(tag_val)
else:
    dset.append(
        np.array(
            list(tag_val.split('\x03')) if a.type.startswith(('S','U')) else tag_val,
            dtype=[(col_name, a.type)]
        )
    )
Here a.type is 'S5', so the list(tag_val.split('\x03')) branch is taken, but this produces a list of str objects rather than bytes.
It appears that np.array requires bytes for this dtype, i.e. a list like [b'CTTTC', b'GTTTC' ...]. This patch appears to solve the problem, as the h5dump output now contains the same data in the Events column as before, but I'm rather thrashing in the dark when it comes to Python.
The input data here was test_data/single-read-1/line_br.fast5. Without this change a round-trip fails with the error above.
diff --git a/cram2ont.py b/cram2ont.py
index 0acf964..f5c5092 100755
--- a/cram2ont.py
+++ b/cram2ont.py
@@ -131,12 +131,13 @@ def cram_to_fast5(cram_filename, output_dir):
                     #print(f"path={a.path}, val={tag_val[:11]}")
                     dset.append(tag_val)
                 else:
-                    dset.append(
-                        np.array(
-                            list(tag_val.split('\x03')) if a.type.startswith(('S','U')) else tag_val,
-                            dtype=[(col_name, a.type)]
-                        )
-                    )
+                    if a.type.startswith(('S', 'U')):
+                        tag_split=tag_val.split('\x03')
+                        for i, x in enumerate(tag_split):
+                            tag_split[i]=x.encode('utf-8')
+                        dset.append(np.array(tag_split, dtype=[(col_name, a.type)]))
+                    else:
+                        dset.append(np.array(tag_val, dtype=[(col_name, a.type)]))
             for dset_name,columns in DSETS.items():
                 d = columns[0] if len(columns)==1 else rfn.merge_arrays(columns, flatten=True, usemask=False)
                 f.create_dataset( dset_name, data=d )
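For what it's worth, the encoding loop in the patch could probably also be written as a list comprehension. This is only a minimal sketch of that step in isolation, assuming tag_val is a str whose fields are joined by '\x03' as in the code above; split_to_bytes is a hypothetical helper name and not part of cram2ont.

```python
def split_to_bytes(tag_val, sep='\x03', encoding='utf-8'):
    """Split a separator-joined str and encode each piece to bytes.

    Hypothetical helper for illustration only; the patch above does the
    same thing in place with an enumerate() loop.
    """
    return [piece.encode(encoding) for piece in tag_val.split(sep)]


# str.split() on a str always yields str pieces, hence the TypeError.
as_str = 'CTTTC\x03GTTTC'.split('\x03')
print(as_str)    # ['CTTTC', 'GTTTC']

# Encoding each piece gives the bytes objects that the 'S5' column expects.
as_bytes = split_to_bytes('CTTTC\x03GTTTC')
print(as_bytes)  # [b'CTTTC', b'GTTTC']
```

The returned list would then be handed to np.array with dtype=[(col_name, a.type)], exactly as the patched branch does.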