JuliaImages / Images.jl

An image library for Julia
http://juliaimages.org/

GSoC 2017 #595

Closed timholy closed 3 years ago

timholy commented 7 years ago

In my personal opinion, the core framework of JuliaImages is now sufficiently solid that it's time to shift focus to expanding the suite of capabilities. Progress will be faster if we attract additional contributors. I've posted a couple of project ideas over at https://github.com/JuliaLang/julialang.github.com/pull/511, but anyone else interested in mentoring or who has an interesting project proposal, please step forward! I've already flagged a couple of you as possible mentors based on what I know (or think I know) about your interests and expertise.

kvmanohar22 commented 7 years ago

@timholy ImageDraw.jl was started a long time ago and hasn't been updated since. As of now, only circle2d, ellipse2d, and line2d have been implemented. How about extending ImageDraw.jl to include the additional functionality offered by scikit-image's draw module? Any suggestions?

timholy commented 7 years ago

Yes, I think it was intended to essentially be a match for scikit-image's draw. Would be great to have some additional work put into it. In principle it might not be bad to have some text capabilities, too, although that would likely involve libraries like Cairo and therefore might mix visualization and algorithmic dependencies in ways that are not ideal.

mronian commented 7 years ago

Just a suggestion to the people who are working on their GSoC Proposals, it will really help you if you post a link to your proposal here and get feedback from all the contributors :smile:

Tagging those who I think were applying to GSoC @tejus-gupta, @kvmanohar22, @annimesh2809

timholy commented 7 years ago

I'd like to extend a very warm welcome to @tejus-gupta and @annimesh2809, our two accepted GSoC students who will be working on image code for Julia this summer. You both wrote outstanding proposals, and we're honored to have you working with us!

Thanks also to Google for providing amazing support (18 slots!) for Julia this summer. Between the language & ecosystem we are ready for huge growth, and it's fantastic to be awarded both the financial resources and the recognition that comes from being supported at such a high level.

I look forward to a fantastic summer!

annimesh2809 commented 7 years ago

Yaay! Got selected for GSoC 2017. Thanks @timholy @Evizero @mronian and others for all your help and support. Very excited to be working on this over the summer!

tejus-gupta commented 7 years ago

Thanks @timholy @mronian and everyone else for the help. Looking forward to a great summer!

timholy commented 7 years ago

@annimesh2809 and @tejus-gupta, one thing we should discuss: you both proposed to work on image segmentation, and while your proposals had significant differences, there were also areas of overlap. We should probably coordinate a bit to make sure we collectively tackle the proposals' goals without doing redundant work. My guess is that this will also free up some time, and we may be able to tackle issues that go beyond the written proposals (that kind of tends to happen anyway...).

In my opinion there is so much interesting and important stuff that needs doing, one thing I'm not worried about is running out of tasks :smile:.

annimesh2809 commented 7 years ago

Yes, it would be best to distribute the work properly to avoid redundancy and allow for maximum productivity during the summer. Here is the list of algorithms I intended to implement:

  1. Threshold based segmentation (Global + Variable + Multiple thresholding). Also to find appropriate threshold: Otsu's method, Balanced histogram and Iterative selection method.
  2. Seeded region growing
  3. Unseeded region growing
  4. Region splitting and merging using quadtrees.
  5. Fast scanning algorithm
  6. k-means clustering + seed generation using k-means++
  7. Fuzzy c-means clustering
  8. Watershed transformation using Meyer's flooding algorithm

@tejus-gupta Kindly comment your proposed work so that we can see the overlap clearly and distribute the work. Once the work in the written proposals is finished, we can go about expanding Julia Images further by adding many other interesting features to it.

tejus-gupta commented 7 years ago

I intended to implement these algorithms:

  1. Thresholding - Otsu's method and adaptive thresholding
  2. K-means clustering (using Clustering.jl)
  3. Image Segmentation using Mean shift
  4. Watershed segmentation
  5. Felzenszwalb's regions splitting algorithm
  6. Normalized graph cut based segmentation

I will also add HOG features to ImageFeatures. Link to my proposal.

annimesh2809 commented 7 years ago

Nice! The intersection came out as:

  1. Thresholding algorithms (1, 1)
  2. k-means clustering (6, 2)
  3. Watershed transformation (8, 4)

I like Vincent and Soille's algorithm that you suggested for the watershed transformation.

@timholy Would you like to assign these algorithms to us or should we do this on our own?

Also, I was thinking of discussing the API for these algorithms. They have many practical applications and will be used with object detection and compression algorithms, hence I was thinking of having a consistent API for all these algos. This way users can easily test which one is most suitable for their needs without any hassle of referring to the documentation every time.

SimonDanisch commented 7 years ago

There are already various implementations of k-means! Please ping me if you want to integrate them into a package. I have an implementation lying around that runs on the GPU or multithreaded.

timholy commented 7 years ago

> @timholy Would you like to assign these algorithms to us or should we do this on our own?

Any way is fine with me. If I picked I'd just flip a coin :stuck_out_tongue:, so if the two of you want to work it out together that would be great. Or, if you want to work on that code together (or at least review the others' PR), that would be fine too.

And definitely, I agree that some kind of interchangeable-parts API would be really great.
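One way to picture the "interchangeable-parts" idea is a single entry point that dispatches on an algorithm object, so that swapping algorithms means changing one argument. The following is only a minimal sketch: `AbstractSegmentationAlgorithm`, `GlobalThreshold`, `BandLabeling`, and `segment` are hypothetical names invented for illustration and are not part of any JuliaImages package.

```julia
# Hypothetical names throughout: none of these types or functions exist in
# JuliaImages; they only illustrate dispatching on an algorithm object.
abstract type AbstractSegmentationAlgorithm end

struct GlobalThreshold <: AbstractSegmentationAlgorithm
    level::Float64
end

struct BandLabeling <: AbstractSegmentationAlgorithm
    nbands::Int
end

# Single entry point: every algorithm returns one integer label per pixel.
segment(img::AbstractArray{<:Real}, alg::GlobalThreshold) =
    [x > alg.level ? 2 : 1 for x in img]

function segment(img::AbstractArray{<:Real}, alg::BandLabeling)
    lo, hi = extrema(img)
    width = (hi - lo) / alg.nbands
    # A constant image has a single band.
    width == 0 && return ones(Int, size(img))
    return [clamp(floor(Int, (x - lo) / width) + 1, 1, alg.nbands) for x in img]
end

img = [0.1 0.4; 0.6 0.9]
# Trying a different algorithm only changes the second argument.
for alg in (GlobalThreshold(0.5), BandLabeling(4))
    println(typeof(alg), " => ", segment(img, alg))
end
```

Because every method returns the same kind of result (a label array shaped like the input), downstream code would not need to know which algorithm produced it.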

> There are already various implementations of k-means!

They'd already figured that out when they submitted their proposals :smile:. The main thing that needs to be done is a nice wrapper so that the array of colors, in a suitable colorspace, is passed to a standard kmeans.
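As a rough illustration of such a wrapper, the sketch below flattens an image into a feature matrix with one column per pixel and reshapes the resulting labels back into image shape. It makes several assumptions: pixels are plain `(r, g, b)` tuples rather than Colors.jl types, and `pixel_features` and `simple_kmeans` are hypothetical helpers. `simple_kmeans` is a small stand-in for Lloyd's algorithm so the example runs with no packages installed; in practice a standard k-means such as the one in Clustering.jl would take its place.

```julia
# Flatten an H×W image of (r, g, b) tuples into a 3×N feature matrix,
# one column per pixel, in column-major pixel order.
function pixel_features(img::AbstractMatrix{<:NTuple{3,Real}})
    X = Matrix{Float64}(undef, 3, length(img))
    for (j, px) in enumerate(img)
        X[:, j] .= px
    end
    return X
end

# Stand-in for a standard k-means (Lloyd's algorithm) with explicit
# initial centers. Hypothetical helper, not the Clustering.jl API.
function simple_kmeans(X::Matrix{Float64}, centers::Matrix{Float64}; maxiter::Int = 100)
    centers = copy(centers)
    k, n = size(centers, 2), size(X, 2)
    labels = zeros(Int, n)
    for _ in 1:maxiter
        changed = false
        # Assignment step: nearest center by squared Euclidean distance.
        for j in 1:n
            dists = [sum(abs2, X[:, j] .- centers[:, c]) for c in 1:k]
            best = argmin(dists)
            changed |= best != labels[j]
            labels[j] = best
        end
        changed || break
        # Update step: each center becomes the mean of its members.
        for c in 1:k
            members = findall(==(c), labels)
            isempty(members) && continue
            centers[:, c] .= vec(sum(X[:, members], dims = 2)) ./ length(members)
        end
    end
    return labels, centers
end

# Two reddish pixels in the first column, two bluish in the second.
img = reshape([(0.9, 0.1, 0.1), (0.8, 0.2, 0.1),
               (0.1, 0.1, 0.9), (0.2, 0.1, 0.8)], 2, 2)
X = pixel_features(img)
labels, centers = simple_kmeans(X, X[:, [1, 4]])
segments = reshape(labels, size(img))   # labels back in image shape
```

The choice of colorspace only affects `pixel_features`; the clustering step sees nothing but a numeric matrix.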

annimesh2809 commented 7 years ago

After discussing with @tejus-gupta, we both agreed upon the following:

Algorithms to be implemented by @tejus-gupta:

  1. Thresholding - Otsu's method and adaptive thresholding
  2. Image Segmentation using Mean shift
  3. Watershed segmentation
  4. Felzenszwalb's regions splitting algorithm
  5. Normalised graph cut based segmentation

Algorithms to be implemented by me:

  1. Seeded region growing
  2. Unseeded region growing
  3. Region splitting and merging using quadtrees
  4. Fast scanning algorithm
  5. k-means clustering + seed generation using k-means++ (using Clustering.jl)
  6. Fuzzy c-means clustering

Before implementing any of these algorithms, we would be sharing ideas and discussing it extensively amongst us. Some of the difficult algorithms might require working on the code together.

annimesh2809 commented 7 years ago

@timholy @tejus-gupta Should we place these segmentation algorithms in ImageFeatures.jl or in a separate package? Also, other than GitHub, is there any place for discussing ideas and issues regarding JuliaImages (like a Gitter, Slack, or IRC channel)?

timholy commented 7 years ago

I don't have strong feelings, but I wonder if an ImageSegmentation package would be appropriate?

As for discussing ideas, there are at least three ways:

juliohm commented 7 years ago

I tried to create a room for Images.jl on gitter, but it didn't work, perhaps I don't have the necessary permissions. Could you please try to create a room called Images.jl besides the Lobby in the organization? We can then add a badge to the README in Images.jl as other packages do, see: https://github.com/JuliaGraphs/LightGraphs.jl

timholy commented 7 years ago

I gave it a try, but I'm not sure I did it correctly. Can you check?

tejus-gupta commented 7 years ago

I think that threshold() should go into Images.jl and the other segmentation algorithms should go into a separate ImageSegmentation package.

tejus-gupta commented 7 years ago

For threshold(), we need to discuss

  1. Whether the function returns a boolean array or a binary image(0/255) in same datatype as input.
  2. For otsu/local threshold, do we just compute the threshold or return the thresholded image?

tejus-gupta commented 7 years ago

I would suggest that threshold() output a binary image in the same datatype as the input image. If the user wants a boolean array, they can use the img.>thres syntax. We should write functions for Otsu/adaptive thresholding that return the threshold, or a mask of thresholds for every pixel in the case of adaptive thresholds. The threshold() function would be able to take a global threshold as well as an array of thresholds to threshold pixelwise. The sole purpose of the threshold() function is to easily get a binary image in the datatype in which the user was processing the image. The threshold function would also have options for inverted and truncated thresholding. (See here)
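The split proposed here (one function that computes a threshold, another that applies it) could look roughly like the sketch below. The names `otsu_threshold` and `threshold`, their signatures, and the `invert` keyword are assumptions made for illustration, not the API that was eventually adopted; the sketch also assumes real-valued pixels in [0, 1] and uses 0/1 of the input element type as the two output values rather than 0/255. Truncated thresholding is left out.

```julia
# Otsu's method for values in [0, 1]: choose the cut between histogram bins
# that maximizes the between-class variance. Returns the threshold only.
function otsu_threshold(img::AbstractArray{<:Real}; nbins::Int = 256)
    counts = zeros(Int, nbins)
    for x in img
        counts[clamp(floor(Int, x * nbins) + 1, 1, nbins)] += 1
    end
    total = length(img)
    sum_all = sum(b * counts[b] for b in 1:nbins)
    best_bin, best_var = 1, -Inf
    w0, sum0 = 0, 0
    for b in 1:nbins-1
        w0 += counts[b]          # pixels at or below bin b
        sum0 += b * counts[b]
        w1 = total - w0          # pixels above bin b
        (w0 == 0 || w1 == 0) && continue
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1)^2
        if between > best_var
            best_bin, best_var = b, between
        end
    end
    return best_bin / nbins      # upper edge of the last "background" bin
end

# Apply a global threshold (scalar t) or a per-pixel threshold (array t).
# The output keeps the element type of the input image.
function threshold(img::AbstractArray{T}, t; invert::Bool = false) where {T<:Real}
    mask = invert ? (img .<= t) : (img .> t)
    return ifelse.(mask, oneunit(T), zero(T))
end

img = [0.1 0.2; 0.8 0.9]
t = otsu_threshold(img)
binary = threshold(img, t)                           # global threshold
local_binary = threshold(img, [0.05 0.5; 0.85 0.5])  # per-pixel thresholds
```

Since broadcasting treats a scalar and a same-sized array uniformly, one `threshold` method covers both the global and the adaptive case.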

annimesh2809 commented 7 years ago

The coding period has officially begun! I also agree upon creating a separate ImageSegmentation.jl package for these algorithms. I have started working on the Seeded Region Growing Algorithm and will keep everyone updated through the gitter channel. Formal updates will be posted directly on this issues page.

timholy commented 7 years ago

Exciting to be underway! I created a "blank" ImageSegmentation.jl repo at JuliaImages and sent you both an invitation for direct push access. One of you can locally generate the package (PkgDev.generate("ImageSegmentation", "MIT")) and then push it. (Would probably be best to do that sooner than later so that you are both working from a common source.)

annimesh2809 commented 7 years ago

@SimonDanisch There is a k-means implementation in Clustering.jl but it does not utilize CPU multithreading or the GPU. The benchmark results for a 500*500 3-channel image were ~7 secs for k = 30 and ~1.6 secs for k = 10. Could you kindly provide your implementation of k-means that works on the GPU (maybe also the CPU multithreaded one)? @timholy Can we decide which version of k-means to use at runtime depending on the hardware? For example, if the user has a performant GPU we could use the GPU version; otherwise we could switch to the multithreaded one.

SimonDanisch commented 7 years ago

Here you go: https://gist.github.com/SimonDanisch/334da62b437983005ab1567f3e69243c#file-kmeans-jl

Tested image size: (1600, 2560), RGB{Float32}
1 thread: 1.23
4 threads: 0.31

In principle, this should also work on the GPU, but I need to release a new version of GPUArrays to make it work!

annimesh2809 commented 7 years ago

Took some time to set up the requirements (CUDA toolkit and all, although I realized later that it was not necessary). Carried out some benchmarks and comparison tests (with Clustering.jl's implementation) and the difference is huge. For 512x512 GrayScale image (testimage("house")), for 20 clusters and maximum iterations bounded by 10000, the benchmarks for Clustering.jl implementation:

julia> @benchmark kmeans($v, 20, maxiter=10000)
BenchmarkTools.Trial: 
  memory estimate:  514.02 MiB
  allocs estimate:  7864591
  --------------
  minimum time:     549.971 ms (8.34% GC)
  median time:      762.742 ms (17.54% GC)
  mean time:        755.456 ms (23.33% GC)
  maximum time:     873.357 ms (36.37% GC)
  --------------
  samples:          7
  evals/sample:     1

For similar settings and using JLBackend as the backend for GPUArrays, the benchmarks are:

julia> @benchmark kmeans($gimg, $gclusters, 10000)
BenchmarkTools.Trial: 
  memory estimate:  536.05 KiB
  allocs estimate:  353
  --------------
  minimum time:     157.874 ms (0.00% GC)
  median time:      158.570 ms (0.00% GC)
  mean time:        159.208 ms (0.00% GC)
  maximum time:     164.408 ms (0.00% GC)
  --------------
  samples:          32
  evals/sample:     1

Difference can be seen in both memory allocation as well as time spent. @SimonDanisch Awesome implementation!!! :smile:

However, GPUArrays currently does not have full support for CUBackend as well as CLBackend.

@timholy Should we have this multithreaded version of k-means or the Clustering.jl version as our core algorithm? Or rather should we ask Clustering.jl to change their core implementation to this one?

timholy commented 7 years ago

I don't think we can rely on people having CUDA installed, and indeed many potential users may not even have a GPU installed. For that reason, I think we have to have some kind of fallback with relatively minimal requirements.

In principle we can use ComputationalResources to allow people to opt in to implementations that may leverage GPUs. Other than how it's used in ImageFiltering, I've not really gone through the full exercise yet, but I think @Evizero has played with it and reported success for real-world dispatch to GPU-enabled variants.

Another good idea would be to check the implementation in Clustering.jl and see if it can be improved.

Evizero commented 7 years ago

I have indeed played around with aspects of ComputationalResources and as far as I can judge, given the current limitations of conditional package requirements, it seems like the simplest and user friendliest approach.

I am using it even for scenarios without GPU considerations to allow for the ability to choose between single and multithreaded implementation (see https://github.com/Evizero/Augmentor.jl/blob/master/src/augmentbatch.jl#L30-L68).

Concerning CUDA, I am afraid I don't have any code to point to because I haven't yet decided if I'll open source it. I am working on an acoustic wave simulator and there I use ComputationalResources to choose between single threaded, tiled multithreaded (see https://github.com/JuliaArrays/TiledIteration.jl), or CUDAnative kernels.

The only concern I have so far (which may or may not be resolved in future versions of Julia) is the additional requirements for being able to compile CUDAnative kernels in contrast to just running them (see https://github.com/timholy/ComputationalResources.jl/issues/7).

SimonDanisch commented 7 years ago

GPUArrays has that kind of fallback already (threaded Julia backend, which is what @annimesh2809 used in the benchmark) and when I get the GPU stuff to work, it should work on all GPUs! @annimesh2809 did you verify correctness? Did you start Julia with JULIA_NUM_THREADS=8 julia -O3 ?

timholy commented 7 years ago

The biggest problem I see is that GPUArrays has a lot of strict requirements, and I know from personal experience that Pkg.build("CUDArt") throws an error if you don't have CUDA installed. Consequently, we have to abstract the "resources" part of this and not rely on GPUArrays' fallback implementations for the CPU implementation.

SimonDanisch commented 7 years ago

Or we could improve the build script from GPUArrays. Right now, you only get an (ignorable) error, because I need to put things into the REQUIRE. @tkelman until we have better support for this in the package manager, what is your view on making it possible to automatically install all supported backends? I'm thinking of a build script that checks out all the hardware requirements and then just installs the packages that work on the hardware. It's not nice, but at least offers a nice user experience. We can have a comment about this in the README and in the REQUIRE and inform the user during the build process about what's happening.

tkelman commented 7 years ago

No, library code and build scripts should never execute any Pkg commands.

SimonDanisch commented 7 years ago

But this is a real problem that needs some solution! Is there a plan to have something like this work nicely soon? What would you recommend? I feel like a valid solution should only be rejected when there is a reasonable alternative! Should I supply an install script and actually remove the packages with hardware dependencies from the REQUIRE? I might also be able to put backends in separate packages, but that would increase my work significantly, due to the interwoven nature of the glue code that I have in GPUArrays.

timholy commented 7 years ago

There is an easy & totally viable solution: make GPUArrays depend on ComputationalResources, and use it to dispatch on different implementations. Think of it as "FileIO for processing power."

SimonDanisch commented 7 years ago

But that doesn't solve the installation problem, does it? And a prerequisite for that would be putting the backends into separate packages, which I really can't commit to right now. The kind of installation magic we have in FileIO is pretty much what @tkelman argues against, I think!

Dispatching works totally fine with GPUArrays, so I don't see right away what that would solve.

timholy commented 7 years ago

Yes, you'd put the backends in their relevant packages. So in Clustering.jl you'd have

using ComputationalResources, Requires

@require GPUArrays begin
    kmeans(::CUDANativeResource, args...) = 
end

(CUDANativeResource isn't in ComputationalResources yet, but it could be.) Since Requires is much more sane on 0.6, personally I think this is pretty reasonable.

It solves the installation problem by bypassing it: you only have to worry about the installation of GPUArrays if you want to use it. Clustering depends on two tiny packages, ComputationalResources and Requires, rather than a big package like GPUArrays. Likewise, you don't end up needing ClusteringWithGPUArrays.jl or anything crazy like that.
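The dispatch pattern being described can be sketched without any GPU dependency. In the example below, `AbstractResource`, `SingleThreaded`, `MultiThreaded`, `sqdists`, and `default_resource` are local, hypothetical stand-ins that are only loosely modeled on the resource types in ComputationalResources.jl; they are defined here so the block runs on its own and should not be read as that package's API. A GPU-backed method would be one more `sqdists` method, loaded only when the GPU package is present.

```julia
# Local stand-ins for resource types, defined here so the sketch runs
# without ComputationalResources.jl. All names are hypothetical.
abstract type AbstractResource end
struct SingleThreaded <: AbstractResource end
struct MultiThreaded <: AbstractResource end

# Hypothetical kernel: squared distance from each column of X to a center.
function sqdists(::SingleThreaded, X::Matrix{Float64}, center::Vector{Float64})
    return [sum(abs2, view(X, :, j) .- center) for j in 1:size(X, 2)]
end

function sqdists(::MultiThreaded, X::Matrix{Float64}, center::Vector{Float64})
    out = Vector{Float64}(undef, size(X, 2))
    # Each iteration writes to its own slot, so no locking is needed.
    Threads.@threads for j in 1:size(X, 2)
        out[j] = sum(abs2, view(X, :, j) .- center)
    end
    return out
end

# Fallback with minimal requirements when no resource is given.
sqdists(X, center) = sqdists(SingleThreaded(), X, center)

# Pick a resource at run time from what the session actually has.
default_resource() = Threads.nthreads() > 1 ? MultiThreaded() : SingleThreaded()

X = [0.0 3.0; 0.0 4.0]
center = [0.0, 0.0]
println(sqdists(default_resource(), X, center))
```

Callers who never pass a resource get the single-threaded method, which matches the requirement that the default path work on any machine.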

SimonDanisch commented 7 years ago

I see, so you meant solving the problem here ;) Fair enough! I wanted to solve the problem by making GPUArrays so easy to install that it actually wouldn't be a problem to just include it in Clustering, which I think is desirable anyway. Making GPU packages easy to install would be a huge win :)

tkelman commented 7 years ago

Requires still isn't precompile-compatible with the optional code, but https://github.com/JuliaLang/julia/pull/21743 would put in place something that would be.

annimesh2809 commented 7 years ago

> @annimesh2809 did you verify correctness?

Yes, I verified the correctness of the implementation with tests similar to those in tests/kmeans.jl (only the non-weighted version).

> Did you start Julia with JULIA_NUM_THREADS=8 julia -O3 ?

Yep, I exported the environment variable (although I set it to 4), but ran without the extra optimization flag (-O3).