FofanovLab / MTSv

Metagenomic Analysis
MIT License

MTSv-summary bug #4

Closed vyfofanov closed 6 years ago

vyfofanov commented 6 years ago

There is a potential bug in the MTSv-summary pipeline that causes an unexpected crash. The bug is reproducible.

Here is the command line:

srun --mem=64000 python /home/vyf2/MTSv/scripts/MTSv_summary.py --threads 1 -o /scratch/vyf2/CR2/metaSlava/YP/ YERPE_Yp4027 /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.clp /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.sig &

The taxdump I used is here (a newer one results in the same issue): /scratch/vyf2/NCBI_012417/treeBuild/taxdump.tar.gz

The error message produced is (int overflow?):

Traceback (most recent call last):
  File "/home/vyf2/MTSv/scripts/MTSv_summary.py", line 241, in <module>
    get_summary(ARGS.all, ARGS.sig, outfile, ARGS.threads, ARGS.verbose)
  File "/home/vyf2/MTSv/scripts/MTSv_summary.py", line 148, in get_summary
    data_dict = parse_all_hits(all_file, data_dict, sig_reads, threads, verbose)
  File "/home/vyf2/MTSv/scripts/MTSv_summary.py", line 140, in parse_all_hits
    all_file, sep=":", header=None, chunksize=n_rows//threads))
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/vyf2/.conda/envs/biopy3/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
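The last frame of the traceback packs a message length with the "!i" format, which is a signed 32-bit big-endian integer. A minimal sketch of that limit, using only the standard library, is below; the helper name fits_in_length_header is hypothetical and is not part of MTSv or multiprocessing.

```python
import struct

# Largest value the "!i" (signed 32-bit, big-endian) format can hold.
INT32_MAX = 2**31 - 1


def fits_in_length_header(n_bytes):
    """Return True if n_bytes can be packed with the "!i" format.

    Hypothetical helper for illustration; it repeats the same
    struct.pack call that appears in the last traceback frame.
    """
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


print(fits_in_length_header(INT32_MAX))      # True
print(fits_in_length_header(INT32_MAX + 1))  # False
```

If this reading is right, the crash would be a message-size limit (roughly 2 GiB per pickled task in this Python 3.5 code path) rather than an arithmetic overflow in the summary logic itself.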

The issue seems to lie squarely in the data from the collapsed file /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.clp

Removing the .clp file from the equation removes the issue. For example:

srun --mem=64000 python /home/vyf2/MTSv/scripts/MTSv_summary.py --threads 1 -o /scratch/vyf2/CR2/metaSlava/YP/ YERPE_Yp4027 /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.sig /scratch/vyf2/CR2/metaSlava/YP/vedro/YERPE_Yp4027.sig &

The above command works fine.
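One plausible reading, which is an assumption based only on the traceback: with --threads 1 the expression chunksize=n_rows//threads makes the whole .clp file a single chunk, and that chunk is too large to send to a worker process. A rough sketch of capping the chunk size is shown below. The function capped_chunksize and the max_rows default are hypothetical, and this is not claimed to be the fix that went into MTSv.

```python
def capped_chunksize(n_rows, threads, max_rows=1000000):
    """Rows per chunk: an even split across threads, capped at max_rows.

    Hypothetical helper; max_rows is an arbitrary illustrative limit
    and would need tuning to the row width of the real data.
    """
    per_thread = n_rows // max(threads, 1)
    # Never return 0, since a chunk size must be at least one row.
    return max(1, min(per_thread, max_rows))


print(capped_chunksize(50000000, 1))  # 1000000
print(capped_chunksize(10, 4))        # 2
print(capped_chunksize(3, 8))         # 1
```

With a cap like this, a large input would be split into many smaller chunks even when only one thread is requested, so each pickled task stays well under the 32-bit length limit.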

tfursten commented 6 years ago

This should be fixed in the latest version.