Closed YoelShoshan closed 7 years ago
Thank you for the information. I'll have a closer look later this week.
Can you check which of these instructions is taking so long?
On my machine, transferring data from Matlab to Python is about twice as slow in Transplant as in the Matlab Engine for Python.
Can you send me a simple benchmark output (assuming you run this in IPython):
import matlab.engine
e = matlab.engine.start_matlab()
%timeit x = e.randn(1e7,1)
from transplant_master import Matlab
m = Matlab()
%timeit x = m.randn(1e7,1)
It seems that base64 encoding is very slow. I think I can solve this.
ok, here are the results:
import matlab.engine
e = matlab.engine.start_matlab()
%timeit x = e.randn(1e7,1)
1 loop, best of 3: 727 ms per loop

from transplant_master import Matlab
m = Matlab()
%timeit x = m.randn(1e7,1)
1 loop, best of 3: 3.97 s per loop

As you can see, that is more than a 4x difference.
Cool - you narrowed it down to base64 encoding? Looking forward to seeing the progress :) Feel free to tell me if you want me to test anything else.
btw - could using 0MQ's TCP transport instead of IPC on Windows also be part of the issue?
The latest version should solve the performance problem. In msgpack-mode (the default), Transplant will now use native Msgpack binary data instead of base64, which brought a massive performance improvement.
And yes, the current version should automatically use TCP on Windows.
Can you check again on your end if the latest changes still run?
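To illustrate why moving from base64 to raw binary helps, here is a minimal sketch using only the Python standard library. It is not Transplant's actual code: the 1000-element array and the `struct` packing are stand-ins chosen for illustration. It shows that base64 inflates the payload by about a third and adds an encode and a decode pass, both of which a binary-capable format such as Msgpack avoids.

```python
import base64
import struct

# Stand-in for a Matlab double array: 1000 float64 values as raw bytes.
# (Illustrative data only, not how Transplant serializes arrays.)
values = [float(i) for i in range(1000)]
raw = struct.pack("<1000d", *values)

# Text-safe transport: the bytes must be base64-encoded before sending
# and decoded again on the receiving side.
encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)

raw_size = len(raw)          # 8 bytes per double
encoded_size = len(encoded)  # 4 output bytes per 3 input bytes, padded

# The round trip is lossless, but costs two extra passes over the data.
restored = list(struct.unpack("<1000d", decoded))

print(raw_size, encoded_size, encoded_size / raw_size)
```

For large arrays such as the 1e7-element example above, both the extra bytes and the extra passes grow with the array size.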
OK, I merged your changes to the header file.
Can you check if
On my end, it looks fantastic! Thank you so much for all your contributions. This has made Transplant a lot faster, and much nicer to work with!
No prob, my pleasure :)
First of all, the latest version works on Windows - I didn't need to make any extra changes.
Performance-wise there's a big improvement! :) However, matlab.engine is still faster, as you can see.
import matlab.engine
e = matlab.engine.start_matlab()
%timeit x = e.randn(1e7,1)
1 loop, best of 3: 732 ms per loop

from transplant import Matlab
m = Matlab()
%timeit x = m.randn(1e7,1)
1 loop, best of 3: 1.45 s per loop

So basically you went down from 3.97 s to 1.45 s, which is great!
Do you have a Windows system to run it on? If yes, do you see different results (proportion-wise)?
I don't have a Windows system available at the moment, but I guess I'll just have to set one up.
I checked this again on my end, and found another way of speeding things up. On my computer, Transplant is now about 30% faster than the MEfP with the above example. Note that the MEfP does not return usable NumPy arrays; it merely returns a matlab.double object, which hasn't actually transferred the data from Matlab to Python yet. Try calling numpy.array(e.randn(1e5,1)) for a fair comparison.
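For anyone running this comparison outside IPython, here is a small timing helper that reports a best-of-N wall time, similar in spirit to `%timeit`. The helper name `best_of` is made up for this sketch, and the commented usage lines assume a running Matlab session with the `e` and `m` objects from the earlier snippets.

```python
import timeit


def best_of(func, repeat=3, number=1):
    """Return the best per-call wall time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


# Usage sketch (requires Matlab, so left as comments):
#   import numpy
#   best_of(lambda: numpy.array(e.randn(1e5, 1)))  # MEfP, including conversion
#   best_of(lambda: m.randn(1e5, 1))               # Transplant

# Runnable stand-in so the helper can be tried without Matlab.
elapsed = best_of(lambda: sum(range(100000)))
print(f"{elapsed * 1e3:.3f} ms per call")
```

The point of wrapping the `numpy.array(...)` call inside the timed function is that the conversion cost is counted for both libraries, not only for Transplant.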
Sure - will do today.
You are absolutely correct! Adding the numpy array construction gave quite amazing results:

arr_size = 1e5
%timeit x = np.array(e.randn(arr_size,1)) #matlab.engine (WITH numpy conversion)
1 loop, best of 3: 1.98 s per loop
%timeit x = m.randn(arr_size,1) #transplant
100 loops, best of 3: 13 ms per loop

arr_size = 1e6
%timeit x = e.randn(arr_size,1) #matlab.engine (no numpy conversion)
10 loops, best of 3: 71.9 ms per loop
%timeit x = np.array(e.randn(arr_size,1)) #matlab.engine (WITH numpy conversion)
1 loop, best of 3: 18.6 s per loop
%timeit x = m.randn(arr_size,1) #transplant
10 loops, best of 3: 104 ms per loop

As you can see, I didn't have time to wait for the 1e7 version ;)
Pretty amazing results - well done! These results combined with transplant's lack of leaking ( ;) ) is very impressive.
Cheers.
btw - if someone can reproduce the results I get on Windows, that would be great =) (note: I used Matlab 2014b; results on newer Matlab versions are probably better)
I will try to reproduce your results on Windows later this week. I hope I'll find the time.
I'll start by saying that, as you've discovered, matlab.engine is practically unusable for me due to its massive memory leaks.
Note: I'm using Matlab 2014b
However, unlike what you found on Linux, on Windows it seems that transplant is much slower than matlab.engine, at least in certain operations.
I can't provide a fully reproducible scenario, but here are the main relevant functions:
And here's the code using matlab.engine (it's a bit different, I actually do MORE there...)
Each .mat file may contain a few volumes.
Here are the run times for comparison:

filename      matlab_engine (seconds)   transplant (seconds)
mat_file_1    5.4                       18.3
mat_file_2    30.8                      75.3
mat_file_3    36.3                      174.2
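To put the run times above in proportion, this short sketch computes how many times slower transplant was than matlab.engine for each file. The numbers are copied from the comparison; the variable names are only for illustration.

```python
# (matlab_engine seconds, transplant seconds) per file, from the comparison.
timings = {
    "mat_file_1": (5.4, 18.3),
    "mat_file_2": (30.8, 75.3),
    "mat_file_3": (36.3, 174.2),
}

# Slowdown factor: transplant time divided by matlab.engine time.
slowdown = {
    name: round(transplant / engine, 1)
    for name, (engine, transplant) in timings.items()
}

for name, factor in slowdown.items():
    print(f"{name}: transplant is {factor}x slower")
```

So in these runs transplant was roughly 2.4x to 4.8x slower than matlab.engine.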
If possible, please test your library on Windows while passing around big volumes, and let me know if you see the same performance issue.