Closed shaniAmare closed 5 years ago
Could you please did a pip freeze and dump the results?
Is this python 2 or 3?
I have some dtw varieties, but it would mean I'd have to drop it into a pip wheel
I'm assuming whether the pierre-rouanet/dtw
work? Adding the following to the script seems to produce something, but I'm not exactly confident whether I'm doing the right thing. If you can comment, if you be great!
Import the modules first...
from dtw import dtw
from scipy.spatial.distance import euclidean
Then replacemlpy.dtw_subsequence
withdtw
as belowdist, cost, acc_cost, path = dtw(adapter[name], sig_search, dist=euclidean)
Could you please did a pip freeze and dump the results?
Is this python 2 or 3?
It is python 2
bash-4.1$ which python
/usr/local/bioinfsoftware/python/python-2.7/bin/python
Ahh yea, there your mlpy version is 0.1.0
The pip wheel is horrendously out of date. Please follow the links to instructions for installing mlpy on the docs page. The bottom of this page https://psy-fer.github.io/SquiggleKitDocs/install/
Specifically this package https://sourceforge.net/projects/mlpy/files/mlpy%203.5.0/ for whichever OS you are using.
The particular version is 3.5.0
Hope that helps.
Hi James,
Here is the new pip freeze after updating mlpy to 3.5.0: pip_freeze_new.txt
Still, I'm getting the
Traceback (most recent call last):
File "SourceFiles/SquiggleKit/MotifSeq.py", line 10, in <module>
from mlpy import dtw_subsequence
ImportError: cannot import name dtw_subsequence
I really appreciate it if you could help me figure out what I'm doing wrong.
Cheers, Shani.
Hello,
the dtw_subsequence method is definately still there with that version.
I downloaded the package again and double checked.
My guess would be there are some residual leftovers from the previous mlpy install from pip.
Try this maybe?
python -c "import mlpy; print(mlpy.__version__)"
James.
Hey, thanks James.
You are correct! The version showed as 0.1.0 when I do python -c "import mlpy; print(mlpy.__version__)"
.
mlpy
from the installation location and reinstalled mlpy using python setup.py --user
> was OK.python -c "import mlpy; print(mlpy.__version__)"
I get this.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "mlpy/__init__.py", line 18, in <module>
import gsl
ImportError: No module named gsl
pip install --user gsl
> was OKmlpy
version I get
ImportError: /home/amarasinghe.s/.local/lib/python2.7/site-packages/mlpy/gsl.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
Am I using the wrong gsl
version? According to this error, if I understand correctly, I'm using a different Python interpreter for running mlpy
than the one used to compile gsl
.
Did you come across this issue before, how did you fix it?
Cheers, Shani.
ooooh yea I vaguely remember running into this issue on a machine. trying to remember what I did....
I think I compiled it from source? got lunch and some meetings. I'll come back to this later.
Cool - thanks!
Hmm... however though, I'm wondering whether the same result for MotifSeq.py
can be achieved by the module I mentioned above (i.e. dtw
). If I assign the distance computing method as cityblock
, which is equal to Manhattan
that is used in dtw_subsequence
in mlpy
, will give the same result? Since you know your code better, I would really like to know what you think...
Thanks a lot, Shani.
void
subsequence(double *x, double *y, int n, int m, double *cost)
{
int i, j;
cost[0] = fabs(x[0]-y[0]);
for (i=1; i<n; i++)
cost[i*m] = fabs(x[i]-y[0]) + cost[(i-1)*m];
for (j=1; j<m; j++)
cost[j] = fabs(x[0]-y[j]); // subsequence variation: D(0,j) := c(x0, yj)
for (i=1; i<n; i++)
for (j=1; j<m; j++)
cost[i*m+j] = fabs(x[i]-y[j]) +
min3(cost[(i-1)*m+j], cost[(i-1)*m+(j-1)], cost[i*m+(j-1)]);
}
this is the subsequence algorithm (in C). If it's the same, then I don't see why not. Output may vary depending on how it's implemented, as will speed.
I'll have to have a look at the pierre-rouanet/dtw
method and get back to you.
That would be great! Thanks heaps!
There's a "faster" option as well as I saw - 'https://pypi.org/project/fastdtw/`
Cheers, Shani.
lol. yea, i'll tell you now, that stuff doesn't work for nanopore. We have internally tested this, and the ROC curves were basically random.
Abstraction CAN work, and there are people who are doing this for nanopore, so stay tuned, But it's a tricky beast to wrestle with.
When you say "that stuff doesn't work for nanopore", are you referring to this fastdtw
or both fastdtw
and pierre-rouanet/dtw
?
And may I know, from your experience, why that is?
Below is the pierre-rouanet/dtw.dtw
code btw:
def dtw(x, y, dist, warp=1, w=inf, s=1.0):
"""
Computes Dynamic Time Warping (DTW) of two sequences.
:param array x: N1*M array
:param array y: N2*M array
:param func dist: distance used as cost measure
:param int warp: how many shifts are computed.
:param int w: window size limiting the maximal distance between indices of matched entries |i,j|.
:param float s: weight applied on off-diagonal moves of the path. As s gets larger, the warping path is increasingly biased towards the diagonal
Returns the minimum distance, the cost matrix, the accumulated cost matrix, and the wrap path.
"""
assert len(x)
assert len(y)
assert isinf(w) or (w >= abs(len(x) - len(y)))
assert s > 0
r, c = len(x), len(y)
if not isinf(w):
D0 = full((r + 1, c + 1), inf)
for i in range(1, r + 1):
D0[i, max(1, i - w):min(c + 1, i + w + 1)] = 0
D0[0, 0] = 0
else:
D0 = zeros((r + 1, c + 1))
D0[0, 1:] = inf
D0[1:, 0] = inf
D1 = D0[1:, 1:] # view
for i in range(r):
for j in range(c):
if (isinf(w) or (max(0, i - w) <= j <= min(c, i + w))):
D1[i, j] = dist(x[i], y[j])
C = D1.copy()
jrange = range(c)
for i in range(r):
if not isinf(w):
jrange = range(max(0, i - w), min(c, i + w + 1))
for j in jrange:
min_list = [D0[i, j]]
for k in range(1, warp + 1):
i_k = min(i + k, r)
j_k = min(j + k, c)
min_list += [D0[i_k, j] * s, D0[i, j_k] * s]
D1[i, j] += min(min_list)
if len(x) == 1:
path = zeros(len(y)), range(len(y))
elif len(y) == 1:
path = range(len(x)), zeros(len(x))
else:
path = _traceback(D0)
return D1[-1, -1] / sum(D1.shape), C, D1, path
yea that pierre-rouanet/dtw is a direct subsequence method and should work. Seems a little convoluted, but has a good amount of tunability in it. Windowing as well. leaving it as inf should wield similar results to mlpy, but this would need to be tested.
fastdtw uses subsampling to do a kind of banded alignment, and can very often lead to entirely spurious hits. In every example of that kind of abstraction technique for nanopore data, it just has not worked. I was first alerted to this by Matt Loose when he was investigating dtw for nanopore, and he had similar observations. We test this in another method we have been developing, where we tested a number of dtw approaches, and mlpy, or a method derived from it, is far superior.
Hey.
So in mlpy there is an install doc. It has the following
Installing on \*nix from source
-------------------------------
On GNU/Linux, OSX and FreeBSD you need the following requirements:
* GCC
* Python >= 2.6 or 3.X
* NumPy >= 1.3.0 (with header files)
* SciPy >= 0.7.0
* GSL >= 1.11 (with header files)
From a terminal run:
.. code-block:: sh
$ python setup.py install
If you don't have root access, installing mlpy in a directory by
specifying the ``--prefix`` argument. Then you need to set ``PYTHONPATH``:
.. code-block:: sh
$ python setup.py install --prefix=/path/to/modules
$ export PYTHONPATH=$PYTHONPATH:/path/to/modules/lib/python{version}/site-packages
If the GSL header files or shared library are in non-standard
locations on your system, use the ``--include-dirs`` and
``--rpath`` options to ``build_ext``:
.. code-block:: sh
$ python setup.py build_ext --include-dirs=/path/to/header --rpath=/path/to/lib
$ python setup.py install
Not sure the gsl you installed is the GNU scientific library used in the C compilers.
the init.py file does a gsl import, but I'm pretty sure that's importing the local folder gsl in the mlpy directory. So that may have something to do with your path variables?
Yes... using the build_ext
option before installing worked! Knowing that was very helpful!
Wondering whether this section also can be added to the instructions to install mlpy
as well to help people like me! But for the moment, I'll have a step-by-step guide here for anyone in need (I hope not! :) ). I'm working on a Linux platform btw.
module load python
)mlpy
3.5.0 from "https://sourceforge.net/projects/mlpy/files/mlpy%203.5.0/"tar -xvf mlpy-3.5.0.tar
cd mlpy-3.5.0
python setup.py build_ext --include-dirs=/path/to/header --rpath=/path/to/lib
(for me the GSL headers were in /usr/include/gsl
, and respective libraries were in /usr/lib
.python setup.py install
(I installed it for myself, so python setup.py install --user
)PYTHONPATH
(if using --prefix
option export PYTHONPATH='${PYTHONPATH}:location/to/site-packages
)MotifSeq.py
without an issue :) which worked for me.I hope I covered the summary from this thread here...
Also, you might already know this but, using mlpy/dtw
is way faster than the other option I have mentioned above.
Thanks for helping throughout James. Shani.
Yay! I'm glad that worked out.
Thanks for the breakdown. I'll add it to the docs.
Also yes, mlpy/dtw is bullet fast (though can be faster, but that's another story).
Thanks for the back and forth. This is helpful for understanding the different environments of others, as well as what is, and isn't explained properly. That can be really hard sometimes. So thankyou!
Also, Heads up, new MotifSeq dropping soon. I brought scrappie into motifseq so it can convert the motif's from a fasta input, and changed up some of the visuals to help people inspect their hits better. Bit of benchmarking coming too in the docs.
Cheers, James
Hey @shaniAmare
You might like the latest update on the README about mlpy and python3
As well as some of the changes in the output and interface.
Regards, James
Hi James,
running
MotifSeq.py
gives me this errorfrom mlpy import dtw_subsequence
ImportError: cannot import name dtw_subsequence
Were you planning to use another dynamic time warping module for this? Looks like
dtw_subsequence
is not a function ofmlpy
anymore. Please help!