PoonLab / OpenRDP

An open-source re-implementation of the RDP4 recombination detection program
GNU General Public License v3.0

Indexing issue when passing in a reference sequence (Bootscan) #88

Open pathogen-detection opened 2 months ago

pathogen-detection commented 2 months ago

Error content:

openrdp -c default.ini query-sequences.fasta -r refer-sequences.fasta
/home/xxx/.local/bin/openrdp:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').run_script('OpenRDP==0.1.0', 'openrdp')
Loading configuration from default.ini
Starting 3Seq Analysis
Finished 3Seq Analysis
Starting GENECONV Analysis
Finished GENECONV Analysis
Setting up bootscan analysis...
Starting Scanning Phase of Bootscan/Recscan
Finished Scanning Phase of Bootscan/Recscan
Setting up maxchi analysis...
Setting up siscan analysis...
Setting up chimaera analysis...
Setting up rdp analysis...
Scanning triplet 1 / 2664
Scanning triplet 2 / 2664
Scanning triplet 3 / 2664
Scanning triplet 334 / 2664
/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/bootscan.py:277: RuntimeWarning: divide by zero encountered in log
  (log_n_fact - (log_i_fact + log_ni_fact)) + np.log(p n) + np.log((1 - p) (n - i)))
Scanning triplet 4 / 2664
Scanning triplet 667 / 2664
Scanning triplet 5 / 2664
Scanning triplet 668 / 2664
Scanning triplet 6 / 2664
Scanning triplet 669 / 2664
Scanning triplet 7 / 2664
Scanning triplet 670 / 2664
Scanning triplet 1000 / 2664
Scanning triplet 671 / 2664
Scanning triplet 1333 / 2664
Scanning triplet 1666 / 2664
Scanning triplet 1334 / 2664
Scanning triplet 1999 / 2664
Scanning triplet 1335 / 2664
Scanning triplet 1336 / 2664
Scanning triplet 2000 / 2664
Scanning triplet 2001 / 2664
Scanning triplet 2332 / 2664
Scanning triplet 2002 / 2664
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/bootscan.py", line 194, in execute
    ab_dist = dist_mat[int(triplet.idxs[0][0] * (self.align.shape[0] - 1) -
IndexError: index 9 is out of bounds for axis 0 with size 6
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/.local/bin/openrdp", line 4, in <module>
    __import__('pkg_resources').run_script('OpenRDP==0.1.0', 'openrdp')
  File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 722, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1561, in run_script
    exec(code, namespace, namespace)
  File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/EGG-INFO/scripts/openrdp", line 44, in <module>
    results = scanner.run_scans(args.infile, args.ref)
  File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/__init__.py", line 265, in run_scans
    bootscan.execute_all(total_combinations=total_num_trps, seq_names=self.seq_names,
  File "/home/xxx/.local/lib/python3.8/site-packages/OpenRDP-0.1.0-py3.8.egg/openrdp/bootscan.py", line 295, in execute_all
    results = p.map(self.execute, enumerate(TripletGenerator(self.align, self.seq_names,
  File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/xxx/bio_software/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: index 9 is out of bounds for axis 0 with size 6

I used the query sequences and reference sequences attached below: query-sequences.txt refer-sequences.txt. After aligning the query sequences and the reference sequences together, I pasted them separately into two folders.
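Since the two files were split out of one joint alignment, a quick sanity check before running the scan is to confirm that every sequence in both files still has the same aligned length. The following is a minimal sketch using only the standard library. It assumes (this is not confirmed anywhere in this thread) that the query and reference files are expected to share alignment columns. `read_fasta` and `alignment_lengths` are hypothetical helper names, and the toy records stand in for the attached files.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text from a file-like object into a {header: sequence} dict."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_lengths(*record_sets):
    """Return the set of distinct sequence lengths across all record sets."""
    return {len(seq) for records in record_sets for seq in records.values()}


# Toy stand-ins for the two files (hypothetical data, not the attached files)
query = read_fasta(StringIO(">q1\nACGT-A\n>q2\nACGTTA\n"))
ref = read_fasta(StringIO(">r1\nAC-TTA\n>r2\nACGTTA\n"))

lengths = alignment_lengths(query, ref)
if len(lengths) == 1:
    print("all sequences share one alignment length")
else:
    print(f"mismatched lengths: {sorted(lengths)}")
```

With real data, the `StringIO` objects would be replaced by `open("query-sequences.fasta")` and `open("refer-sequences.fasta")`.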

ArtPoon commented 2 months ago

@GopiGugan can you investigate this please?

ArtPoon commented 1 month ago

We will probably be deprecating multiprocessing in favour of mpi4py to make bootscan consistent with the other methods, see #85

WilliamZekaiWang commented 1 month ago

I tried running openrdp from branch iss85 without multiprocessing and got a different indexing error: IndexError: index 3 is out of bounds for axis 0 with size 3

GopiGugan commented 1 month ago

> I tried running openrdp from branch iss85 without multiprocessing and got a different indexing error IndexError: index 3 is out of bounds for axis 0 with size 3

Changes were made in bootscan.py when adding reference sequences as an option - https://github.com/PoonLab/OpenRDP/commit/a9949f624b64da2f4779be12fdfd0a85e5657339

It looks like there may be an indexing issue. I will review why we have a multi-dimensional array in this case.
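For anyone following along, the class of error reported in this issue can be reproduced in isolation with NumPy. This is only a sketch and not OpenRDP's code. It assumes, hypothetically, that triplet indices are numbered over the combined query and reference alignment while the distance matrix covers only a subset of those rows. The actual cause has not been confirmed in this thread. `n_query`, `n_ref`, and `safe_lookup` are illustrative names, and the counts are chosen only to mirror the "size 6" in the report.

```python
import numpy as np

n_query, n_ref = 6, 4  # hypothetical counts, not taken from the attached files
n_total = n_query + n_ref

# Distance matrix sized for the query sequences only (the assumed mismatch)
dist_mat = np.zeros((n_query, n_query))

# An index that is valid for the combined alignment but not for dist_mat
idx = n_total - 1  # 9

try:
    dist_mat[idx]
    message = None
except IndexError as err:
    message = str(err)
print(message)


def safe_lookup(mat, i, j):
    """Look up mat[i, j], failing early with a descriptive error."""
    n = mat.shape[0]
    if not (0 <= i < n and 0 <= j < n):
        raise IndexError(f"indices ({i}, {j}) fall outside a {n}x{n} distance matrix")
    return mat[i, j]
```

A guard like `safe_lookup` does not fix the underlying numbering mismatch, but it reports both offending indices and the matrix size at the point of failure, which is easier to trace than the bare NumPy message surfacing through a worker pool.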

ArtPoon commented 1 month ago

Fix in progress

ArtPoon commented 4 weeks ago
ArtPoon commented 3 weeks ago

@GopiGugan to create a PR with this fix to merge into dev