There is a potential bug in the MTSv summary step (MTSv_summary.py) that causes an unexpected crash. The crash is reproducible.
Here is the command line:
srun --mem=64000 python /home/vyf2/MTSv/scripts/MTSv_summary.py --threads 1 -o /scratch/vyf2/CR2/metaSlava/YP/ YERPE_Yp4027 /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.clp /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.sig &
The taxdump I used is here (a newer one produces the same issue):
/scratch/vyf2/NCBI_012417/treeBuild/taxdump.tar.gz
The error message produced is below (an int overflow?):
Traceback (most recent call last):
  File "/home/vyf2/MTSv/scripts/MTSv_summary.py", line 241, in
    get_summary(ARGS.all, ARGS.sig, outfile, ARGS.threads, ARGS.verbose)
  File "/home/vyf2/MTSv/scripts/MTSv_summary.py", line 148, in get_summary
    data_dict = parse_all_hits(all_file, data_dict, sig_reads, threads, verbose)
  File "/home/vyf2/MTSv/scripts/MTSv_summary.py", line 140, in parse_all_hits
    all_file, sep=":", header=None, chunksize=n_rows//threads))
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
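For context, the last frame packs the length of the pickled task into a 4-byte signed integer (struct.pack("!i", n)), so any single task whose pickled size exceeds 2**31 - 1 bytes (about 2 GiB) fails with exactly this error. A minimal stdlib check of that limit, assuming the Python 3.5 multiprocessing behaviour shown in the traceback (fits_in_header is a hypothetical helper, not part of MTSv):

```python
import struct

# Largest value a signed 32-bit big-endian header ("!i") can hold.
# Python 3.5's multiprocessing.connection writes this header before every
# pickled message, so it is also the largest task it can send to a worker.
MAX_HEADER_VALUE = 2**31 - 1


def fits_in_header(n_bytes):
    """Return True if a payload of n_bytes can be described by a '!i' header."""
    try:
        struct.pack("!i", n_bytes)
    except struct.error:
        # Same exception as the last line of the traceback.
        return False
    return True


print(fits_in_header(MAX_HEADER_VALUE))      # True
print(fits_in_header(MAX_HEADER_VALUE + 1))  # False
```

Newer CPython releases (3.8 onward) send a 64-bit length for payloads above this size, so upgrading Python may also avoid the crash, although I have not tested that here.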
The issue seems to lie squarely in the data from this collapsed file:
/scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.clp
Taking the .clp file out of the equation removes the issue. For example, passing the .sig file in its place:
srun --mem=64000 python /home/vyf2/MTSv/scripts/MTSv_summary.py --threads 1 -o /scratch/vyf2/CR2/metaSlava/YP/ YERPE_Yp4027 /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.sig /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.sig &
The above command works fine.
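If I read line 140 correctly, the rows are split with chunksize=n_rows//threads, so with --threads 1 the whole .clp file becomes a single chunk, which pool.map pickles and sends to the worker as one task. A large enough .clp then exceeds the 2 GiB limit, while the smaller .sig file does not. One possible workaround, sketched under that assumption, is to cap the number of rows per chunk. bounded_chunksize, n_chunks and MAX_ROWS_PER_CHUNK are hypothetical names, not part of MTSv. The code avoids f-strings and digit separators so it also runs on the Python 3.5 environment above.

```python
# Hypothetical cap on rows per chunk (not an existing MTSv option).
# The value is illustrative only; a safe cap depends on how many bytes
# each .clp row takes once the chunk is pickled.
MAX_ROWS_PER_CHUNK = 1000000


def bounded_chunksize(n_rows, threads, max_rows=MAX_ROWS_PER_CHUNK):
    """Rows per chunk: the n_rows // threads split from line 140, but capped.

    Without the cap, threads == 1 makes the chunk size equal to n_rows,
    so the whole file is pickled and sent to the worker as one task.
    The result is never below 1, which read_csv requires.
    """
    per_thread = n_rows // max(threads, 1)
    return max(1, min(per_thread, max_rows))


def n_chunks(n_rows, chunksize):
    """Number of chunks a reader would yield for n_rows rows (ceiling division)."""
    return -(-n_rows // chunksize)


print(bounded_chunksize(50000000, 1))                      # 1000000 instead of 50000000
print(n_chunks(50000000, bounded_chunksize(50000000, 1)))  # 50
print(bounded_chunksize(10, 4))                            # 2
```

The result would replace n_rows//threads as the chunksize= argument to read_csv. Note that Pool.map also batches items through its own chunksize argument (by default several items per task when there are many of them), so passing chunksize=1 to pool.map as well would keep each task to a single capped chunk.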