PoonLab / OpenRDP

An open-source re-implementation of the RDP4 recombination detection program
GNU General Public License v3.0

User interface improvements #38

Closed by ArtPoon 1 year ago

ArtPoon commented 1 year ago

I'm not crazy about the current command-line invocation:

python3 -m openrdp ./openrdp/tests/test_neisseria.fasta ./test.csv -cfg ./openrdp/tests/test_cfg.ini -all

ArtPoon commented 1 year ago

The current Python interface is also not ideal:

>>> from openrdp.main import openrdp
>>> res = openrdp("openrdp/tests/test_neisseria.fasta", "test.csv", "openrdp/tests/test_cfg.ini")
Starting 3Seq Analysis
...
Siscan               2                    45                   X64866               X64869               X64873               0.7654518134939761  
>>> res
>>> 

As shown above, openrdp() prints its results but returns None. It should return an object that makes the outputs available to the user for downstream processing.
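
One possible shape for such a return value, as a minimal sketch only. The class names ScanHit and ScanResults, their fields, and the helper methods are hypothetical and are not part of OpenRDP; the two example rows are copied from output printed in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScanHit:
    """One reported recombination event (hypothetical record type)."""
    method: str
    start: int
    end: int
    recombinant: str
    parent1: Optional[str]
    parent2: Optional[str]
    pvalue: float


@dataclass
class ScanResults:
    """Hypothetical container that a call could return instead of None."""
    hits: List[ScanHit] = field(default_factory=list)

    def by_method(self, method: str) -> List[ScanHit]:
        # Case-insensitive filter on the detection method name.
        return [h for h in self.hits if h.method.lower() == method.lower()]

    def significant(self, alpha: float = 0.05) -> List[ScanHit]:
        # Keep only hits whose p-value is below the chosen threshold.
        return [h for h in self.hits if h.pvalue < alpha]


# Two rows taken from the printed table; "-" parents are stored as None.
results = ScanResults([
    ScanHit("Geneconv", 1, 204, "X64866", "X64869", None, 0.00002),
    ScanHit("Siscan", 2, 45, "X64866", "X64869", "X64873", 0.7624489264586414),
])
```

With something like this, downstream code could filter by method or p-value without re-parsing the printed table or the output file.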

ArtPoon commented 1 year ago

The Python interface should not depend on an external file such as test_cfg.ini.
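
A rough sketch of one way to do this: keep built-in defaults in the package and treat the INI file as an optional override. The section names, option names, and values in DEFAULTS below are placeholders, not OpenRDP's actual settings, and load_settings is a hypothetical helper.

```python
import configparser
from typing import Optional

# Placeholder defaults; the real sections and options in OpenRDP may differ.
DEFAULTS = {
    "Geneconv": {"max_pvalue": "0.05"},
    "Bootscan": {"win_size": "200", "step_size": "20"},
}


def load_settings(cfg_path: Optional[str] = None) -> configparser.ConfigParser:
    """Return settings from built-in defaults, optionally overridden by a file."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)          # always start from the defaults
    if cfg_path is not None:
        with open(cfg_path) as handle:  # fail loudly if a named file is missing
            parser.read_file(handle)
    return parser


# No file needed for the common case.
settings = load_settings()
```

Calling load_settings() with no argument then works out of the box, while passing a path only changes the options that the file actually sets.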

ArtPoon commented 1 year ago

Started ripping things out in a new branch, interface.

ArtPoon commented 1 year ago

Moved the command-line interface code to an executable script under a new /bin folder.
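
For illustration, an argument parser for such a script might look roughly like the sketch below. The traceback and the Linux run later in this thread show the script using args.infile, args.outfile, args.cfg and a -c flag; the long option name --cfg, the help strings, and the build_parser function are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with two positional paths and an optional config file."""
    parser = argparse.ArgumentParser(
        prog="openrdp",
        description="Run recombination detection methods on an alignment.",
    )
    parser.add_argument("infile", help="path to the input FASTA alignment")
    parser.add_argument("outfile", help="path to write the results to")
    # "--cfg" is an assumed long form; only "-c" appears in this thread.
    parser.add_argument("-c", "--cfg", default=None,
                        help="optional INI file that overrides default settings")
    return parser


# Parse the same arguments as the Linux run shown in this thread.
args = build_parser().parse_args(
    ["tests/test_neisseria.fasta", "test.out", "-c", "tests/test_cfg.ini"]
)
```

Compared with the earlier `python3 -m openrdp ... -cfg ... -all` invocation, this keeps the required paths positional and makes the config file optional.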

ArtPoon commented 1 year ago

Moved core code from main.py and run_scans.py into openrdp/__init__.py

ArtPoon commented 1 year ago

Of course refactoring broke stuff:

art@Wernstrom OpenRDP % openrdp tests/test_neisseria.fasta test
Starting 3Seq Analysis
[Errno 2] No such file or directory: '/Users/art/git/OpenRDP/test_neisseria.fasta.3s.rec'
Finished 3Seq Analysis
Starting GENECONV Analysis
Finished GENECONV Analysis
Setting up geneconv analysis...
Setting up bootscan analysis...
Starting Scanning Phase of Bootscan/Recscan
Traceback (most recent call last):
  File "/usr/local/bin/openrdp", line 4, in <module>
    __import__('pkg_resources').run_script('OpenRDP==0.0.1', 'openrdp')
  File "/usr/local/lib/python3.10/site-packages/pkg_resources/__init__.py", line 672, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1472, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/EGG-INFO/scripts/openrdp", line 30, in <module>
    results = openrdp.openrdp(args.infile, args.outfile, cfg=args.cfg,
  File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/__init__.py", line 293, in openrdp
    results = scanner.run_scans(aln)
  File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/__init__.py", line 174, in run_scans
    tmethods.append(a['method'](alignment, settings=settings, quiet=self.quiet))
  File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/bootscan.py", line 33, in __init__
    self.dists = self.do_scanning_phase(alignment)
  File "/usr/local/lib/python3.10/site-packages/OpenRDP-0.0.1-py3.10.egg/openrdp/bootscan.py", line 128, in do_scanning_phase
    with multiprocessing.Pool() as p:
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 183, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'

ArtPoon commented 1 year ago

The first error:

Starting 3Seq Analysis
[Errno 2] No such file or directory: '/Users/art/git/OpenRDP/test_neisseria.fasta.3s.rec'

is probably associated with these lines: https://github.com/PoonLab/OpenRDP/blob/1863aca41432151c6121ddf336ebd02408ed0c74/openrdp/threeseq.py#L60-L63

ArtPoon commented 1 year ago

I confirmed that the master branch runs fine in this setup. Very strange. Ah, but the "file not found" error is still there:

Starting 3Seq Analysis
[Errno 2] No such file or directory: '/Users/art/git/OpenRDP/test_neisseria.fasta.3s.rec'
Finished 3Seq Analysis

ArtPoon commented 1 year ago

I'm going to spin these two errors off into separate issues.

ArtPoon commented 1 year ago

Successful run from interface branch on Linux (Ubuntu 20):

(venv) art@orolo:~/git/OpenRDP$ openrdp tests/test_neisseria.fasta test.out -c tests/test_cfg.ini 
Starting 3Seq Analysis
Finished 3Seq Analysis
Starting GENECONV Analysis
Finished GENECONV Analysis
Setting up bootscan analysis...
Starting Scanning Phase of Bootscan/Recscan
Finished Scanning Phase of Bootscan/Recscan
Setting up maxchi analysis...
Setting up siscan analysis...
Setting up chimaera analysis...
Setting up rdp analysis...
Scanning triplet 1 / 4
Scanning triplet 2 / 4
Scanning triplet 3 / 4
Scanning triplet 4 / 4

Method               StartLocation        EndLocation          Recombinant          Parent1              Parent2              Pvalue              
Geneconv             1                    204                  X64866               X64869               -                    0.00002             
Geneconv             151                  195                  X64860               X64869               -                    0.00210             
Geneconv             203                  507                  X64860               X64866               -                    0.00829             
Geneconv             539                  759                  X64860               X64866               -                    0.15378             
Geneconv             151                  193                  X64873               -                    -                    0.02202             
Geneconv             56                   170                  X64860               -                    -                    0.02728             
Bootscan             760                  765                  X64869               X64860               X64866               0.06513627245570731 
MaxChi               475                  518                  X64860               X64866               X64869               0.04042768199451279 
MaxChi               439                  482                  X64860               X64866               X64873               0.04042768199451279 
MaxChi               475                  518                  X64866               X64869               X64873               0.04042768199451279 
Siscan               2                    45                   X64860               X64866               X64869               0.7663004734577327  
Siscan               2                    45                   X64860               X64866               X64873               0.7593011319150192  
Siscan               2                    45                   X64860               X64869               X64873               0.7629651782860931  
Siscan               2                    45                   X64866               X64869               X64873               0.7624489264586414  
Chimaera             179                  265                  X64860               X64869               X64873               0.004701217146256585
Chimaera             170                  213                  X64869               X64866               X64873               0.0018132288986577026
Chimaera             177                  220                  X64873               X64860               X64866               0.02047438504938101 
3Seq                 202                  787                  X64869               X64860               X64866               5.982096e-10        
3Seq                 181                  787                  X64866               X64869               X64873               5.294757e-06        
RDP                  6                    496                  X64860               X64866               X64869               6.450462737744835e-06
RDP                  6                    504                  X64860               X64866               X64873               0.002454937279797601
RDP                  36                   481                  X64860               X64869               X64873               0.00044032555125241474