Closed ArtPoon closed 1 year ago
The current Python interface is also not ideal:
>>> from openrdp.main import openrdp
>>> res = openrdp("openrdp/tests/test_neisseria.fasta", "test.csv", "openrdp/tests/test_cfg.ini")
Starting 3Seq Analysis
...
Siscan 2 45 X64866 X64869 X64873 0.7654518134939761
>>> res
>>>
openrdp should return some object that makes the outputs available to the user for downstream processing.
Python interface should not depend on an external file, e.g., test_cfg.ini
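A minimal sketch of what such a return value could look like. All names here (ScanResult, ScanResults, DEFAULT_SETTINGS, scan_sketch) are hypothetical and not part of OpenRDP, and the default value is a placeholder. The point is only that defaults live in code, so no .ini file is needed, and that the call hands back an object instead of None:

```python
from dataclasses import dataclass, field

# Placeholder default; OpenRDP's real settings and values will differ.
DEFAULT_SETTINGS = {"max_pvalue": 0.05}


@dataclass
class ScanResult:
    """One detected event, mirroring the columns printed to the console."""
    method: str
    start: int
    end: int
    recombinant: str
    parents: tuple
    pvalue: float


@dataclass
class ScanResults:
    """Container handed back to the caller for downstream processing."""
    settings: dict
    hits: list = field(default_factory=list)

    def by_method(self, method):
        """Return only the hits reported by one detection method."""
        return [h for h in self.hits if h.method == method]


def scan_sketch(sequences, settings=None):
    """Stand-in for the real analysis; no config file is required."""
    chosen = dict(DEFAULT_SETTINGS if settings is None else settings)
    results = ScanResults(settings=chosen)
    # The real detection methods would append ScanResult objects here.
    return results


res = scan_sketch(["ACGT", "ACGA"])
# Row copied from the console output above, stored as a structured record.
res.hits.append(ScanResult("Siscan", 2, 45, "X64866",
                           ("X64869", "X64873"), 0.7654518134939761))
```

With something like this, typing res at the prompt would show a populated object, and res.by_method("Siscan") would give the caller the Siscan rows directly.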
Started ripping things out in new branch interface:
Moved command-line interface code to executable script under new /bin folder.
Moved core code from main.py and run_scans.py into openrdp/__init__.py.
default_config.ini into default settings, i.e., user does not have to explicitly specify these.
Scanner object.
Scanner.
Of course refactoring broke stuff:
art@Wernstrom OpenRDP % openrdp tests/test_neisseria.fasta test
Starting 3Seq Analysis
[Errno 2] No such file or directory: '/Users/art/git/OpenRDP/test_neisseria.fasta.3s.rec'
Finished 3Seq Analysis
Starting GENECONV Analysis
Finished GENECONV Analysis
Setting up geneconv analysis...
Setting up bootscan analysis...
Starting Scanning Phase of Bootscan/Recscan
Traceback (most recent call last):
File "/usr/local/bin/openrdp", line 4, in <module>
__import__('pkg_resources').run_script('OpenRDP==0.0.1', 'openrdp')
File "/usr/local/lib/python3.10/site-packages/pkg_resources/__init__.py", line 672, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1472, in run_script
exec(code, namespace, namespace)
File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/EGG-INFO/scripts/openrdp", line 30, in <module>
results = openrdp.openrdp(args.infile, args.outfile, cfg=args.cfg,
File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/__init__.py", line 293, in openrdp
results = scanner.run_scans(aln)
File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/__init__.py", line 174, in run_scans
tmethods.append(a['method'](alignment, settings=settings, quiet=self.quiet))
File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/bootscan.py", line 33, in __init__
self.dists = self.do_scanning_phase(alignment)
File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/bootscan.py", line 128, in do_scanning_phase
with multiprocessing.Pool() as p:
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
self._repopulate_pool()
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
w.start()
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 183, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
The first error:
Starting 3Seq Analysis
[Errno 2] No such file or directory: '/Users/art/git/OpenRDP/test_neisseria.fasta.3s.rec'
is probably associated with these lines: https://github.com/PoonLab/OpenRDP/blob/1863aca41432151c6121ddf336ebd02408ed0c74/openrdp/threeseq.py#L60-L63
I confirmed that master branch runs fine in this setup. Very strange.
Ah, but the "file not found" error is still there:
Starting 3Seq Analysis
[Errno 2] No such file or directory: '/Users/art/git/OpenRDP/test_neisseria.fasta.3s.rec'
Finished 3Seq Analysis
I'm going to spin these two errors off into separate issues.
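For the second error, the last frame of the traceback shows multiprocessing's spawn code evaluating getattr(main_module.__spec__, "name", None), which fails when the __main__ module has no __spec__ attribute at all. A minimal sketch of one possible workaround, assuming (not confirmed here) that the installed launcher script leaves __main__ without that attribute: make sure the attribute exists, even if it is only None, before any Pool is created. The helper name ensure_main_spec is hypothetical.

```python
import sys
import types


def ensure_main_spec(module=None):
    """Give the module a __spec__ attribute (set to None) if it lacks one.

    getattr(None, "name", None) is fine, so a __spec__ of None is
    tolerated by the failing line; a missing attribute is not.
    """
    if module is None:
        module = sys.modules["__main__"]
    if not hasattr(module, "__spec__"):
        module.__spec__ = None
    return module


# Simulate a __main__ module that has lost its __spec__ attribute.
fake_main = types.ModuleType("__main__")
del fake_main.__spec__

ensure_main_spec(fake_main)
# The expression from the traceback no longer raises AttributeError.
main_mod_name = getattr(fake_main.__spec__, "name", None)
```

In the executable script this would amount to calling ensure_main_spec() once near the top, before the analysis starts. Whether that is the right fix, as opposed to changing how the script gets installed, is a separate question.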
dev branch and merged into interface.
Successful run from interface branch on Linux (Ubuntu 20):
(venv) art@orolo:~/git/OpenRDP$ openrdp tests/test_neisseria.fasta test.out -c tests/test_cfg.ini
Starting 3Seq Analysis
Finished 3Seq Analysis
Starting GENECONV Analysis
Finished GENECONV Analysis
Setting up bootscan analysis...
Starting Scanning Phase of Bootscan/Recscan
Finished Scanning Phase of Bootscan/Recscan
Setting up maxchi analysis...
Setting up siscan analysis...
Setting up chimaera analysis...
Setting up rdp analysis...
Scanning triplet 1 / 4
Scanning triplet 2 / 4
Scanning triplet 3 / 4
Scanning triplet 4 / 4
Method    StartLocation  EndLocation  Recombinant  Parent1  Parent2  Pvalue
Geneconv  1              204          X64866       X64869   -        0.00002
Geneconv  151            195          X64860       X64869   -        0.00210
Geneconv  203            507          X64860       X64866   -        0.00829
Geneconv  539            759          X64860       X64866   -        0.15378
Geneconv  151            193          X64873       -        -        0.02202
Geneconv  56             170          X64860       -        -        0.02728
Bootscan  760            765          X64869       X64860   X64866   0.06513627245570731
MaxChi    475            518          X64860       X64866   X64869   0.04042768199451279
MaxChi    439            482          X64860       X64866   X64873   0.04042768199451279
MaxChi    475            518          X64866       X64869   X64873   0.04042768199451279
Siscan    2              45           X64860       X64866   X64869   0.7663004734577327
Siscan    2              45           X64860       X64866   X64873   0.7593011319150192
Siscan    2              45           X64860       X64869   X64873   0.7629651782860931
Siscan    2              45           X64866       X64869   X64873   0.7624489264586414
Chimaera  179            265          X64860       X64869   X64873   0.004701217146256585
Chimaera  170            213          X64869       X64866   X64873   0.0018132288986577026
Chimaera  177            220          X64873       X64860   X64866   0.02047438504938101
3Seq      202            787          X64869       X64860   X64866   5.982096e-10
3Seq      181            787          X64866       X64869   X64873   5.294757e-06
RDP       6              496          X64860       X64866   X64869   6.450462737744835e-06
RDP       6              504          X64860       X64866   X64873   0.002454937279797601
RDP       36             481          X64860       X64869   X64873   0.00044032555125241474
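Until the Python interface returns something structured, the table above can be parsed for downstream processing. A minimal sketch, assuming whitespace-delimited columns in the order given by the header row and "-" for a missing parent. Hit and parse_hits are hypothetical names; the sample rows are copied from the run above.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    method: str
    start: int
    end: int
    recombinant: str
    parent1: Optional[str]
    parent2: Optional[str]
    pvalue: float


def parse_hits(text):
    """Parse result rows; the header and non-result lines are skipped."""
    hits = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] == "Method":
            continue  # header row, or a line that is not a 7-column row
        method, start, end, rec, p1, p2, pval = fields
        try:
            hit = Hit(method, int(start), int(end), rec,
                      None if p1 == "-" else p1,
                      None if p2 == "-" else p2,
                      float(pval))
        except ValueError:
            continue  # seven words, but not numeric where expected
        hits.append(hit)
    return hits


sample = """\
Method StartLocation EndLocation Recombinant Parent1 Parent2 Pvalue
Geneconv 151 193 X64873 - - 0.02202
Bootscan 760 765 X64869 X64860 X64866 0.06513627245570731
3Seq 202 787 X64869 X64860 X64866 5.982096e-10
"""

hits = parse_hits(sample)
# Example of downstream filtering on the parsed p-values.
significant = [h for h in hits if h.pvalue < 0.05]
```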
I'm not crazy about the current command-line invocation:
openrdp as an executable script in /usr/local/bin or /home/USER/.local/bin
-cfg flag.
-cfg proceeds to run the analysis until it throws an exception:
-cfg is listed as an optional argument in the help text, so it should behave as such!
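A minimal sketch of how an optional config flag could fall back to built-in defaults instead of failing partway through the analysis. It uses only the standard library (argparse, configparser). The option spelling, the section and key names, and the default values are placeholders, not OpenRDP's actual ones.

```python
import argparse
import configparser

# Placeholder defaults; OpenRDP's real sections and values may differ.
DEFAULTS = {"Bootscan": {"win_size": "200"}, "Siscan": {"win_size": "200"}}


def load_settings(cfg_path=None):
    """Return the defaults, overridden by the INI file at cfg_path if given."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)
    if cfg_path is not None:
        # open() raises immediately if the file is missing, rather than
        # letting the analysis start and fail later.
        with open(cfg_path) as handle:
            parser.read_file(handle)
    return parser


def parse_args(argv=None):
    """Two required positionals and a genuinely optional config flag."""
    ap = argparse.ArgumentParser(description="Sketch of an OpenRDP-like CLI")
    ap.add_argument("infile", help="input FASTA alignment")
    ap.add_argument("outfile", help="path to write results")
    ap.add_argument("-c", "--cfg", default=None,
                    help="optional INI file; defaults are used if omitted")
    return ap.parse_args(argv)


# Invocation without a config file, as in the failing run above.
args = parse_args(["tests/test_neisseria.fasta", "test"])
settings = load_settings(args.cfg)
```

Here args.cfg is None, so settings holds only the built-in defaults and nothing needs to be read from disk.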