PoonLab / OpenRDP

An open-source re-implementation of the RDP4 recombination detection program
GNU General Public License v3.0

Some input files cause FileNotFoundError exception for 3Seq #76

Closed. wdenggithub closed this issue 3 months ago

wdenggithub commented 3 months ago

Hi Art,

I have downloaded your program, but when I tried to run it I got the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/mk/ndp7_qps50z1141rj706f3jc0000gn/T/tmp6vl6npoi.3s.rec.csv'

I checked the directory. The other related files are there (like .log and .pvalHist), but the .rec.csv file is not. Could you please help me?

Thanks,

Wenjie

ArtPoon commented 3 months ago

Hm that's strange. Are you able to copy-paste any console messages to this issue? Also please provide your operating system, version of Python and version of OpenRDP.

wdenggithub commented 3 months ago

Thanks for the prompt reply. I think it's the latest version of OpenRDP (commit 1d1c29969c5382f3c249ad221c8bee9f9e0bcab8). My computer is a Mac, running Python 3.9. The following is the full message I got:

(base) Wenjies-MacBook-Pro:V704_0128 wdeng$ openrdp V704_0128_220-221_GP_cluster3.fasta -o openrdp_out.txt
Loading configuration from /Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/OpenRDP-0.1.0-py3.9.egg/openrdp/default.ini
Starting 3Seq Analysis
Traceback (most recent call last):
  File "/Users/wdeng/opt/anaconda3/bin/openrdp", line 4, in <module>
    __import__('pkg_resources').run_script('OpenRDP==0.1.0', 'openrdp')
  File "/Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 662, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1459, in run_script
    exec(code, namespace, namespace)
  File "/Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/OpenRDP-0.1.0-py3.9.egg/EGG-INFO/scripts/openrdp", line 44, in <module>
    results = scanner.run_scans(args.infile, args.ref)
  File "/Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/OpenRDP-0.1.0-py3.9.egg/openrdp/__init__.py", line 219, in run_scans
    results.dict['threeseq'] = three_seq.execute()
  File "/Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/OpenRDP-0.1.0-py3.9.egg/openrdp/threeseq.py", line 51, in execute
    ts_results = self.parse_output(out_path)
  File "/Users/wdeng/opt/anaconda3/lib/python3.9/site-packages/OpenRDP-0.1.0-py3.9.egg/openrdp/threeseq.py", line 65, in parse_output
    with open(out_path) as out_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/mk/ndp7_qps50z1141rj706f3jc0000gn/T/tmp6vl6npoi.3s.rec.csv'

Thanks,

Wenjie

ArtPoon commented 3 months ago

I have not been able to reproduce this error so far.

wdenggithub commented 3 months ago

Thanks Art. I have run the test data in your tests directory. Both CRF_07_test.fasta and long.fasta gave me results, but when I tried CRF_07_ref.fasta, long_ref.fasta and short.fasta, I got the same error message. Does this mean there are no recombination events among the input sequences? My other question is how to interpret the output. Does it include the results from the different methods? Are the lines starting with "RDP" the results of your OpenRDP program? Thanks.

ArtPoon commented 3 months ago

Ok thanks I can reproduce your error with these input files. I think this has something to do with the level of sequence divergence in the inputs.
Here is long.fasta, which runs as expected:

[Screenshot, 2024-03-15 at 5:48:37 PM]

and here is long_ref.fasta:

[Screenshot, 2024-03-15 at 5:49:41 PM]

In general, the input files that fail contain sequences with no discernible homology. Note that several of these test files are set up this way on purpose, for example to trigger a filter in the code. So to start, I would suggest examining your input FASTA to make sure that the sequences are homologous and correctly aligned.
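One rough way to run that check is sketched below in Python. This is not part of OpenRDP: `read_fasta`, `p_distance` and `check_alignment` are hypothetical helper names, and treating `-`, `N` and `?` as missing data is an assumption. The sketch reports whether all sequences have the same length (a minimum requirement for an alignment) and the proportion of differing sites for each pair.

```python
from itertools import combinations


def read_fasta(path):
    """Parse a FASTA file into a dict of {header: sequence}."""
    records, header = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def p_distance(a, b, missing="-N?"):
    """Proportion of differing sites, skipping columns with missing data."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in missing and y not in missing]
    if not pairs:
        return None  # no comparable columns
    return sum(x != y for x, y in pairs) / len(pairs)


def check_alignment(records):
    """Return (all_same_length, {(name1, name2): p-distance})."""
    same_length = len({len(seq) for seq in records.values()}) == 1
    dists = {
        (h1, h2): p_distance(records[h1], records[h2])
        for h1, h2 in combinations(records, 2)
    }
    return same_length, dists
```

Calling `check_alignment(read_fasta("my_input.fasta"))` (the file name is a placeholder) gives a quick overview. For nucleotide data, random sequences with roughly equal base frequencies differ at about 75% of sites, so pairwise values near 0.75 would suggest little or no homology.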

ArtPoon commented 3 months ago

To answer your other question, the results list predicted breakpoints and their statistical significance for the different recombination detection methods. Note that "RDP" is both the name of the collection of these methods and the name of one of the methods. Also note that OpenRDP is a re-implementation of the RDP collection in Python as an open source project. It is probably better to use the original RDP for research purposes, since OpenRDP is an unpublished (and thus not yet peer reviewed) project. (Of course, that hasn't stopped a couple of studies from exploiting our open source code to use as a convenient punching bag.)

ArtPoon commented 3 months ago

Okay, it looks like the problem is that 3Seq is not finding any significant recombinant triplets and consequently is not writing any lines to the file with extension .3s.rec.csv. In fact, the file itself is apparently not even generated. We can deal with this by handling the FileNotFoundError accordingly.
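A minimal sketch of that kind of handling is shown below. It is an illustration only, not the actual patch: `parse_rec_csv` is a hypothetical stand-in for the parsing step, and it assumes that a missing output file can safely be read as "no significant triplets".

```python
import csv


def parse_rec_csv(out_path):
    """Return all rows of a .3s.rec.csv file (including any header row),
    or an empty list if the file was never written."""
    try:
        with open(out_path, newline="") as out_handle:
            return [row for row in csv.reader(out_handle) if row]
    except FileNotFoundError:
        # Assumption: no file means no significant recombinant
        # triplets were reported, so there is nothing to parse.
        return []
```

Returning an empty list lets the caller continue with the remaining detection methods instead of stopping with a traceback.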

ArtPoon commented 3 months ago

Ok, please try pulling from the dev branch and see if it resolves the problem, e.g.,

git clone https://github.com/PoonLab/OpenRDP
cd OpenRDP
git fetch
git checkout dev
python3 setup.py install
openrdp <input FASTA>

ArtPoon commented 3 months ago

Sorry I messed up that patch, please use commit a79701bd1193779e5e9b1d73a0e570a0eaccb373

wdenggithub commented 3 months ago

Thank you so much Art! I'll check it out.