Open pathogen-detection opened 2 months ago
@GopiGugan can you investigate this please?
We will probably be deprecating multiprocessing in favour of mpi4py to make bootscan consistent with the other methods, see #85
I tried running openrdp from branch iss85 without multiprocessing and got a different indexing error
IndexError: index 3 is out of bounds for axis 0 with size 3
Changes were made in bootscan.py when adding reference sequences as an option: https://github.com/PoonLab/OpenRDP/commit/a9949f624b64da2f4779be12fdfd0a85e5657339
Looks like there may be an indexing issue. I will review why we have a multi-dimensional array in this case
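To make the suspected failure mode concrete, here is a minimal sketch of how a flat index into a pairwise distance array can overshoot. It assumes, without confirmation from this thread, that dist_mat is a condensed (upper-triangle) distance vector and that the index is computed from a row count that differs from the number of sequences the distances were built from. The names condensed_index and condensed_distances are hypothetical helpers, not part of OpenRDP, and the toy sequences are made up for illustration.

```python
def condensed_index(i, j, n):
    """Position of pair (i, j) in a condensed distance vector for n items."""
    if i == j:
        raise ValueError("no entry for the diagonal")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def condensed_distances(seqs):
    """Proportion of mismatches for every pair, upper triangle in row order."""
    n = len(seqs)
    return [
        sum(a != b for a, b in zip(seqs[i], seqs[j])) / len(seqs[i])
        for i in range(n)
        for j in range(i + 1, n)
    ]


# Toy alignment: six rows in total, but distances computed for only four.
full_align = ["ACGTAC", "ACGTTC", "AGGTAC", "ACCTAC", "TCGTAC", "ACGTAG"]
subset = full_align[:4]
dist_vec = condensed_distances(subset)  # 4 * 3 / 2 = 6 entries

# Index for the pair (2, 3) using the matching row count.
good = condensed_index(2, 3, len(subset))

# Same pair, but using the row count of the full alignment.
bad = condensed_index(2, 3, len(full_align))

print(len(dist_vec), good, bad)
```

Here `good` lands inside the six-entry vector while `bad` points past its end, so `dist_vec[bad]` raises an IndexError. Whether this mismatch is what happens in bootscan.py still needs to be checked against the actual code.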
Fix in progress
@GopiGugan to create a PR with this fix to merge into dev
Error content:
openrdp -c default.ini query-sequences.fasta -r refer-sequences.fasta
/home/xxx/.local/bin/openrdp:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').run_script('OpenRDP==0.1.0', 'openrdp')
Loading configuration from default.ini
Starting 3Seq Analysis
Finished 3Seq Analysis
Starting GENECONV Analysis
Finished GENECONV Analysis
Setting up bootscan analysis...
Starting Scanning Phase of Bootscan/Recscan
Finished Scanning Phase of Bootscan/Recscan
Setting up maxchi analysis...
Setting up siscan analysis...
Setting up chimaera analysis...
Setting up rdp analysis...
Scanning triplet 1 / 2664
Scanning triplet 2 / 2664
Scanning triplet 3 / 2664
Scanning triplet 334 / 2664
/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/bootscan.py:277: RuntimeWarning: divide by zero encountered in log
(log_n_fact - (log_i_fact + log_ni_fact)) + np.log(p ** n) + np.log((1 - p) ** (n - i)))
Scanning triplet 4 / 2664
Scanning triplet 667 / 2664
Scanning triplet 5 / 2664
Scanning triplet 668 / 2664
Scanning triplet 6 / 2664
Scanning triplet 669 / 2664
Scanning triplet 7 / 2664
Scanning triplet 670 / 2664
Scanning triplet 1000 / 2664
Scanning triplet 671 / 2664
Scanning triplet 1333 / 2664
Scanning triplet 1666 / 2664
Scanning triplet 1334 / 2664
Scanning triplet 1999 / 2664
Scanning triplet 1335 / 2664
Scanning triplet 1336 / 2664
Scanning triplet 2000 / 2664
Scanning triplet 2001 / 2664
Scanning triplet 2332 / 2664
Scanning triplet 2002 / 2664
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/bootscan.py", line 194, in execute
ab_dist = dist_mat[int(triplet.idxs[0][0] * (self.align.shape[0] - 1) -
IndexError: index 9 is out of bounds for axis 0 with size 6
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/xxx/.local/bin/openrdp", line 4, in <module>
__import__('pkg_resources').run_script('OpenRDP==0.1.0', 'openrdp')
File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 722, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1561, in run_script
exec(code, namespace, namespace)
File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/EGG-INFO/scripts/openrdp", line 44, in <module>
results = scanner.run_scans(args.infile, args.ref)
File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/__init__.py", line 265, in run_scans
bootscan.execute_all(total_combinations=total_num_trps, seq_names=self.seq_names,
File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/bootscan.py", line 295, in execute_all
results = p.map(self.execute, enumerate(TripletGenerator(self.align, self.seq_names,
File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
IndexError: index 9 is out of bounds for axis 0 with size 6
I used the query sequences and reference sequences attached below: query-sequences.txt refer-sequences.txt. I aligned the query and reference sequences together first, then pasted them separately into two folders.