Closed PrimozGodec closed 5 years ago
Before the implementation is finished I think we need to discuss. The main issue here is that Numpy implementation of embedding is much slower than one by Tensorflow (Tensorflow is highly optimized on some types of processor instructions). Tensorflow, for example, needs ~0.025 s per image while Numpy implementation needs ~0.17 s per images what is the difference of almost order 10. What should we do here? Do we go back to some deep learning library (like Tensorflow) or we stay with slower Numpy based embedding? @markotoplak @lanzagar
Merging #124 into master will increase coverage by
2.76%
. The diff coverage is84.93%
.
@@ Coverage Diff @@
## master #124 +/- ##
==========================================
+ Coverage 76.04% 78.81% +2.76%
==========================================
Files 5 7 +2
Lines 526 623 +97
Branches 84 95 +11
==========================================
+ Hits 400 491 +91
- Misses 98 101 +3
- Partials 28 31 +3
Merging #124 into master will increase coverage by
3.85%
. The diff coverage is87.28%
.
@@ Coverage Diff @@
## master #124 +/- ##
=========================================
+ Coverage 76.04% 79.9% +3.85%
=========================================
Files 5 7 +2
Lines 526 622 +96
Branches 84 95 +11
=========================================
+ Hits 400 497 +97
+ Misses 98 97 -1
Partials 28 28
I think a local embedder is something we should have. If this is as fast as we can do it for now, I think it is ok. We still have the option to use the server for faster computations (if it was the same speed, remote embedders would kind of lose sense)... If tensorflow is unavoidably so much faster than numpy, we can consider (in the future, not in this PR) to have two implementations and use tensorflow when it is available (if user has it installed) otherwise fall back to this numpy implementation.
I agree with @lanzagar.
@lanzagar I agree too. I will try to speed up local Numpy embedders a bit but it will not be significant. I will also try to test whether we can use PyTorch instead of Tensorflow - it is smaller in size.
Form my side this PR is ready to merge after it will be checked.
@PrimozGodec, in the first commit of this PR you added a big file (a model), which you later removed. Could you rebase this file out as it would only increase repository?
I am OK with keeping tensorflow code in history, but I'd be careful about big files.
@markotoplak I removed it. I will also consider rebasing some other commits since some of them does not make sense.
Currently, I am solving the issue with Bus error: 10
which occurs while multiplying in convolution. It occurs with Numpy 1.15.* at my machine, and with Numpy 1.15.4 and 1.16.2 at @BlazZupan's MacOs.
https://github.com/numpy/numpy/issues/13155
Yesterday I did not get any errors on a Mac with conda's numpy 1.15.4.
@markotoplak It looks so weird. It has different behavior on different machines. It seems that it is an issue caused by different combinations of imports. I am trying to figure out which combination causes the issue.
Description of changes
Local embedder which is Numpy based - Squeezenet
Discusion
Includes