Closed Jigyasa3 closed 7 months ago
Hi @Jigyasa3,
There actually isn't a constructor for reading FASTA files at the moment, only for GBK files. The sequence parameter in the from_string constructor should already be the DNA string, not a path to a file.
I'll have a look at including a FASTA constructor in the future! But for now there isn't one, sorry.
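In the meantime, a FASTA file can be read into a plain string before it is handed to from_string. The snippet below is a minimal sketch using only the standard library; read_fasta_sequence is a hypothetical helper, not part of ORCA, and it assumes the file holds a single record (with or without a ">" header line).

```python
from pathlib import Path


def read_fasta_sequence(path):
    """Return the first record of a FASTA file as one uppercase string.

    Hypothetical helper, not part of ORCA. Header lines (starting with
    ">") are skipped and the sequence lines are joined, so both
    multi-line and header-less files end up as one contiguous string.
    """
    parts = []
    seen_header = False
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seen_header or parts:
                break  # a second record starts here; keep only the first
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts).upper()


# Usage (file name taken from the question in this thread):
# sequence = read_fasta_sequence("NC_000919_Treponema_test.fasta")
# orca = ORCA.from_string(sequence)  # pass the DNA string, not the path;
#                                    # check the ORCA docs for other arguments
```

If Biopython is already installed, its SeqIO module can do the same job; the point is only that the argument must be the sequence itself.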
Hi @ZoyavanMeel ,
Thank you so much for replying to all my queries!
I will use ORCA.from_gbk() for my samples!
BTW, I was comparing the ORCA output for NC_000919 (Treponema) with that of DoriC, but I am getting slightly different results. Any suggestions?
>>> orca = ORCA.from_accession("NC_000919",email=email)
>>> orca.find_oriCs(show_info=True)
Yeah, so DoriC (Ori-Finder) uses only intergenic locations in its final analysis. That is how it's able to give an actual range, rather than just a single point. The exact details of how it does this are unclear to me, largely because it is closed source and a web service. I made ORCA as an open-source alternative for oriC prediction. ORCA considers it possible for the oriC to be anywhere on the sequence, so its results will differ slightly from DoriC's. An application note on ORCA by my supervisor and me should be out on bioRxiv in a few days. I can send you the link then :)
Here is the application note: https://doi.org/10.1101/2024.03.28.587133 Let me know if you have any more questions
Dear @Mister-Teapot and @ZoyavanMeel ,
Thank you for a great tool to calculate z-curves! I am getting an error when I use the ORCA.from_string() function. Any suggestions on how to load the sequence FASTA file as a string so that ORCA recognizes it? The NC_000919_Treponema_test.fasta file that I am using is a multi-line string of nucleotide sequence (with no header ">" line). I also tried converting this file to a single-line string, but I get the same error.
Code used:
>>> orca = ORCA.from_string("/groups/rubin/projects/jigyasa/ML/results/intergenicregion_find/oriV_annotation/GCskew_plasmidfinder/test/NC_000919_Treponema_test.fasta")
>>> orca.find_oriCs(show_info=True)
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jigyasaa/downloads/ORCA/build/lib/orcapy/ORCA.py", line 573, in find_oriCs
peaks_of_interest = self.analyse_disparity_curves()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jigyasaa/downloads/ORCA/build/lib/orcapy/ORCA.py", line 370, in analyse_disparity_curves
peaks_x = CurveProcessing.process_curve(self.x, 'min', window_size=window_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jigyasaa/miniconda3/envs/orca_conda/lib/python3.11/site-packages/orcapy-1.0.1-py3.11.egg/orcapy/CurveProcessing.py", line 14, in process_curve
accepted_peaks = filter_peaks(curve, mode, init_peaks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jigyasaa/miniconda3/envs/orca_conda/lib/python3.11/site-packages/orcapy-1.0.1-py3.11.egg/orcapy/CurveProcessing.py", line 57, in filter_peaks
rejected_peaks.extend(_filter_within_windows(curve, mode, peaks))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jigyasaa/miniconda3/envs/orca_conda/lib/python3.11/site-packages/orcapy-1.0.1-py3.11.egg/orcapy/CurveProcessing.py", line 79, in _filter_within_windows
elif mode == 'min' and comparator_win.min() < curve[peak.middle]:
^^^^^^^^^^^^^^^^^^^^
File "/home/jigyasaa/miniconda3/envs/orca_conda/lib/python3.11/site-packages/numpy/core/_methods.py", line 45, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial, where)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zero-size array to reduction operation minimum which has no identity
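The ValueError is what NumPy raises when .min() is called on an empty array. Here that is presumably a downstream symptom of the file path itself being analysed as if it were the sequence, although the exact mechanism inside ORCA is an assumption on my part. A quick check before calling from_string can catch this early. The sketch below is a hypothetical helper, not part of ORCA, and it assumes the standard IUPAC nucleotide codes are the only valid characters.

```python
# IUPAC nucleotide codes (A, C, G, T/U plus the ambiguity codes).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def looks_like_dna(text, min_length=1):
    """Return True if text consists only of IUPAC nucleotide letters.

    Hypothetical helper, not part of ORCA. A file path fails this check
    because it contains characters such as "/", "_", "." or digits.
    """
    seq = text.strip().upper()
    return len(seq) >= min_length and set(seq) <= IUPAC_NUCLEOTIDES


# Usage: fail fast with a readable message instead of a NumPy error.
candidate = "/some/dir/NC_000919_Treponema_test.fasta"  # illustrative path
if not looks_like_dna(candidate):
    print("This looks like a path or non-DNA text; read the file first.")
```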