biolab / orange3-imageanalytics

🍊 :rice_scene: Orange3 add-on for dealing with image related tasks

GNU General Public License v3.0

Local numpy embedders #124

Closed PrimozGodec closed 5 years ago

PrimozGodec commented 5 years ago

Description of changes

Local, Numpy-based embedder (Squeezenet)

Discussion

Speed: Tensorflow: ~0.025 s per image, Numpy: ~0.17 s per image. The speed difference between Numpy and Tensorflow is big (roughly 7x, almost an order of magnitude).
Should the Numpy deep learning library be a separate package, or should it be included in image analytics? Currently it is separate, since it is quite big and complex.

Includes

[X] Code changes
[X] Tests
[X] Documentation

PrimozGodec commented 5 years ago

Before the implementation is finished, I think we need to discuss this. The main issue is that the Numpy implementation of the embedder is much slower than the Tensorflow one (Tensorflow is highly optimized for certain processor instructions). Tensorflow, for example, needs ~0.025 s per image, while the Numpy implementation needs ~0.17 s per image, a difference of roughly 7x (almost an order of magnitude). What should we do here? Do we go back to a deep learning library (like Tensorflow), or do we stay with the slower Numpy-based embedding? @markotoplak @lanzagar
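For reference, a minimal sketch of how per-image timings like the ones above could be measured and compared. The helper name `seconds_per_image` and its parameters are hypothetical and are not part of this PR; `embed` stands for any callable that embeds a single image.

```python
import time


def seconds_per_image(embed, images, repeats=3):
    """Return the best-of-`repeats` average wall-clock seconds per image.

    `embed` is any callable taking one image (hypothetical placeholder
    for a Tensorflow- or Numpy-based embedder).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for image in images:
            embed(image)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(images))
    return best


# Figures quoted in this thread, in seconds per image.
tensorflow_s = 0.025
numpy_s = 0.17

# Ratio of the two timings: about 6.8, i.e. a bit under one order of magnitude.
slowdown = numpy_s / tensorflow_s
```

Taking the best of several runs reduces the influence of one-off effects such as model loading or cache warm-up on the reported number.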

codecov-io commented 5 years ago

Codecov Report

Merging #124 into master will increase coverage by 2.76%. The diff coverage is 84.93%.

@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
+ Coverage   76.04%   78.81%   +2.76%     
==========================================
  Files           5        7       +2     
  Lines         526      623      +97     
  Branches       84       95      +11     
==========================================
+ Hits          400      491      +91     
- Misses         98      101       +3     
- Partials       28       31       +3

codecov-io commented 5 years ago

Codecov Report

Merging #124 into master will increase coverage by 3.85%. The diff coverage is 87.28%.

@@            Coverage Diff            @@
##           master    #124      +/-   ##
=========================================
+ Coverage   76.04%   79.9%   +3.85%     
=========================================
  Files           5       7       +2     
  Lines         526     622      +96     
  Branches       84      95      +11     
=========================================
+ Hits          400     497      +97     
+ Misses         98      97       -1     
  Partials       28      28

lanzagar commented 5 years ago

I think a local embedder is something we should have. If this is as fast as we can make it for now, I think it is OK. We still have the option to use the server for faster computations (if the local embedder were just as fast, remote embedders would lose much of their purpose)... If tensorflow is unavoidably so much faster than numpy, we can consider (in the future, not in this PR) having two implementations: use tensorflow when it is available (if the user has it installed) and otherwise fall back to this numpy implementation.
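A minimal sketch of the fallback idea described above, assuming the choice is made by checking whether a package is importable. The function name `pick_backend` and its parameters are hypothetical and do not exist in the add-on.

```python
import importlib.util


def pick_backend(preferred=("tensorflow",), fallback="numpy"):
    """Return the first importable package from `preferred`, else `fallback`.

    Hypothetical helper: it only reports which backend could be used,
    it does not import or initialise it.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return fallback


backend = pick_backend()
```

Checking with `find_spec` avoids paying the import cost of a large library just to decide which implementation to use; the actual import can then happen lazily when the embedder is first called.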

markotoplak commented 5 years ago

I agree with @lanzagar.

PrimozGodec commented 5 years ago

@lanzagar I agree too. I will try to speed up local Numpy embedders a bit but it will not be significant. I will also try to test whether we can use PyTorch instead of Tensorflow - it is smaller in size.

From my side, this PR is ready to merge once it has been reviewed.

markotoplak commented 5 years ago

@PrimozGodec, in the first commit of this PR you added a big file (a model), which you later removed. Could you rebase this file out of the history, since it would only increase the size of the repository?

I am OK with keeping tensorflow code in history, but I'd be careful about big files.

PrimozGodec commented 5 years ago

@markotoplak I removed it. I will also consider rebasing some other commits, since some of them do not make sense.

Currently, I am working on the Bus error: 10 issue, which occurs during the multiplication in the convolution. It occurs with Numpy 1.15.* on my machine, and with Numpy 1.15.4 and 1.16.2 on @BlazZupan's macOS. https://github.com/numpy/numpy/issues/13155

markotoplak commented 5 years ago

Yesterday I did not get any errors on a Mac with conda's numpy 1.15.4.

PrimozGodec commented 5 years ago

@markotoplak It looks really weird: the behavior differs between machines. It seems that the issue is caused by particular combinations of imports. I am trying to figure out which combination causes it.
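One way such a search could be organised, as a sketch only: run every ordering of the suspect imports in a fresh interpreter, so that a crash in one combination does not take down the search itself. The function name `try_import_orders` and the `snippet` parameter are hypothetical; the module list and the code that triggers the error would have to be filled in.

```python
import itertools
import subprocess
import sys


def try_import_orders(modules, snippet="pass"):
    """Run each import order in its own Python process.

    Returns a dict mapping the import order (a tuple of module names)
    to the process return code. On POSIX, a negative return code means
    the child was terminated by that signal number.
    """
    results = {}
    for order in itertools.permutations(modules):
        code = "\n".join(f"import {name}" for name in order) + "\n" + snippet
        proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
        results[order] = proc.returncode
    return results


# Harmless standard-library modules used only to show the call shape.
demo = try_import_orders(["json", "os"])
```

In practice `snippet` would hold the smallest piece of code that reproduces the crash, and any ordering with a non-zero return code would be a candidate for the upstream bug report.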