[Request] Remove CUDA dependency

mia-0 commented 6 years ago

I realize this is a lot to just ask without the time and ability to implement and maintain the necessary changes myself. Hence, I’ll spare you the long-winded political speech. The gist of it is: You can’t call something free and open source software when it depends on and endorses proprietary components whose only purpose is vendor lock-in.

AliceVision should be able to work without CUDA, no matter how glacially slowly. I prefer inefficient CPU-only computation that spills registers and caches all over the place over a requirement for GPUs with inferior Linux support and very unstable drivers (my GTX 960 keeps freezing my computer whenever NVIDIA’s driver decides it doesn’t want to do memory management anymore, and it is IMPOSSIBLE to report this problem to NVIDIA unless you’re a big corp and there’s money involved). I simply don’t have the patience to deal with this garbage, and I desperately want to move to a different GPU vendor so I get proper support for my platform.

Ideally, the CUDA parts should be ported to an open platform such as Vulkan or the older OpenCL.

fabiencastan commented 6 years ago

Currently, we have neither the interest nor the resources to reimplement the CUDA code in another GPU framework. If someone is willing to make this contribution, we will support it and help with the integration.

cody-code-wy commented 6 years ago

I was looking into this to see if there were any tools to make such a transition easier, and I found a project called swan that is meant to make it very simple to effectively 'translate' CUDA kernels and code into OpenCL equivalents. It has not been updated in some time, though, so it may not help very much.

I feel like it's worth pointing out also that OpenCL works on most embedded GPUs and integrated GPUs, and many FPGAs include drop-in modules to allow OpenCL functionality. All of this means that if a change like this were made to AliceVision, there would be many new potential uses, such as microcomputer clusters or use directly on mobile devices.

fabiencastan commented 6 years ago

It's difficult to find a good solution in this technology war, with Apple's deprecation of OpenGL and OpenCL: https://developer.apple.com/macos/whats-new#deprecationofopenglandopencl

Another interesting project on this topic is HIP: https://gpuopen.com/compute-product/hip-convert-cuda-to-portable-c-code

zvrba commented 6 years ago

That one is easy though: ditch OSX support. OSX is IME by far the worst and most buggy implementation of POSIX APIs that I've had to work with.

cody-code-wy commented 6 years ago

I agree that there's no particularly good solution currently.

I agree that Apple's deprecation of OpenCL could be somewhat problematic, but I feel like it's worth pointing out that relatively few of Apple's systems have any support for Nvidia cards, so CUDA is not much better for supporting macOS.

Also, HIP looks like a pretty nice option. There seem to have been a few interesting similar projects in the past, like gpuocelot, which is sadly now defunct.

Apparently Vulkan can be used for GPGPU, and that's supported on Windows and Linux on both AMD and Nvidia, and with MoltenVK on anything supporting Apple's Metal APIs. But Vulkan is still pretty new, so there's not much info out there about using it for GPGPU...

fabiencastan commented 6 years ago

I would be interested in trying Halide, as it lets you write high-level algorithms while still allowing fine-tuning of the scheduling. It then generates code for each target.

HPG2017_FastImageProcessing halide-inria-march2017

zvrba commented 6 years ago

ISPC (https://ispc.github.io/) could be another option. It also has an (experimental) PTX backend.

cody-code-wy commented 6 years ago

Halide looks like a pretty good option. While there is no Metal backend yet, it looks like (from issues on their GitHub) a few people may be working on one, but obviously OS X still has OpenCL for now.

And with support for ARMv7/NEON it could be used on Raspberry Pis (2 and later) and the like, and even Android devices. That could seriously open up what AliceVision could be used for in the future.

zvrba commented 6 years ago

I'm skeptical about using something not backed by a major industry vendor. Halide is an academic project, they may get tired of developing it (when they've exhausted publishable stuff), they probably don't care about breaking changes (from the homepage: "These academic publications describe the ideas behind Halide and its scheduling model. Halide syntax changes over time, so don't rely on them for correct syntax."), etc.

Tools from major industry vendors (nvidia, intel) aren't open-source. So what?

If there's a viable alternative to CUDA, it's SYCL (a Khronos standard; OpenCL using modern C++, i.e., something resembling CUDA), but the downside is that there are no free (as in beer) quality compilers that I'm aware of.

OpenCL seems to be the most future-oriented as it can support FPGAs as well. Intel has acquired Altera and another FPGA manufacturer, and OpenCL tooling will probably follow.

AndreaMonzini commented 6 years ago

We are trying to compile Meshroom and AliceVision on Linux, but it's sad to discover that it will only work with a proprietary solution that I do not have (I use an AMD GPU with the Mesa driver).

mia-0 commented 6 years ago

To me the issue is: I do have the hardware, but it is just unstable as hell, requiring a lot of power cycles (since even the reset buttons stop working). Have been able to reproduce this with multiple kernel versions, driver versions, motherboards, GPUs, PSUs… It’s safe to say that it’s not a hardware issue, other than potential firmware bugs.

Anyway, my suggestion is to take a step back from all the frameworks and try to get just a basic C implementation done, with no drastic optimization whatsoever. My belief is that this will make future native ports (Vulkan, etc.) and SIMD optimization much easier, especially for outside contributors, because C is much more accessible. Also, before deciding on frameworks in an attempt to cover all potential use cases, it’s probably best to understand the challenges and requirements by doing a clean implementation with minimal external dependencies first.
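To make the suggestion concrete, here is a minimal sketch (C++ written in a plain, C-like style) of how a per-pixel GPU kernel maps onto an unoptimized CPU loop. The operation and the names `absDiffPixel` and `absDiffImage` are invented for illustration and are not taken from the AliceVision sources; the only point is that the CUDA thread grid turns into two nested loops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical per-pixel operation: absolute difference of two 8-bit images.
// In CUDA this body would sit inside a __global__ kernel, with x and y
// derived from blockIdx, blockDim and threadIdx.
inline unsigned char absDiffPixel(const unsigned char* a, const unsigned char* b,
                                  int width, int x, int y)
{
    const std::size_t i = static_cast<std::size_t>(y) * width + x;
    return static_cast<unsigned char>(std::abs(int(a[i]) - int(b[i])));
}

// CPU replacement for the kernel launch: the grid of threads becomes two loops.
std::vector<unsigned char> absDiffImage(const std::vector<unsigned char>& a,
                                        const std::vector<unsigned char>& b,
                                        int width, int height)
{
    assert(width >= 0 && height >= 0);
    const std::size_t n = static_cast<std::size_t>(width) * height;
    assert(a.size() == n && b.size() == n);

    std::vector<unsigned char> out(n);
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            out[static_cast<std::size_t>(y) * width + x] =
                absDiffPixel(a.data(), b.data(), width, x, y);
        }
    }
    return out;
}
```

Once a version like this produces correct output, the loop body is the part that a later SIMD or Vulkan port would take over, and the plain version remains as a reference to compare against.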

AndreaMonzini commented 6 years ago

Hi @fabiencastan, is there a way to support a solution like HIP or Halide? Maybe an open-source bounty?

I think that supporting only one GPU vendor with a proprietary GPGPU solution sounds limiting for a very promising free and open source project.

I could find and buy a proprietary software alternative for photogrammetry, but I prefer to support free and open source software, and I use an AMD GPU for its free and open source drivers.

https://github.com/ROCm-Developer-Tools/HIP

Anyway thank you for sharing your work :)

AndreaMonzini commented 6 years ago

example of HIP porting:

https://gpuopen.com/ported-caffe-hip-heres-happened/

Ashtreighlia commented 6 years ago

Hi everyone,

I read through the comments and it seems like the ditching of OpenCL/GL in the new OSX versions gives the developers a tiny headache over which computing language to use for this program. I am a Mac user, and from following the "development" of new Macs (with Metal 1 & 2), it seems to me like they are ditching every other computing environment. Given that the last Nvidia GPUs used in any models date from around 2013, and with the existing and still growing empire of Metal, this probably won't change soon. Just want to give my view on the OSX "issue" ^^

Have a good one

AndreaMonzini commented 6 years ago

Hello, from what I understand, HIP uses C++, so it should work without OpenCL.

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_faq.md#how-does-hip-compare-with-opencl

kwahoo2 commented 6 years ago

Here is the output after running the HIP (ROCm) converter:

adi@adi-ryzen7:~/kompilacje/AliceVision$ /opt/rocm/hip/bin/hipconvertinplace-perl.sh src
...
info: TOTAL-converted 713 CUDA->HIP refs( dev:153 mem:74 kern:150 coord_func:0 math_func:0 special_func:3 stream:0 event:0 err:7 def:3 tex:323 extern_shared:0 other:0 ) warn:39 LOC:665119
  warning: unconverted cudaReadModeNormalizedFloat : 9
  warning: unconverted cudaArraySurfaceLoadStore : 6
  warning: unconverted cudaExtent : 5
  warning: unconverted cudaMemcpy3DParms : 4
  warning: unconverted cudaMemcpy3D : 4
  warning: unconverted cudaMalloc3DArray : 3
  warning: unconverted cudaMalloc3D : 2
  warning: unconverted cudaMemcpy2DFromArray : 2
  warning: unconverted cudaMemcpyFromArray : 2
  warning: unconverted cudaPitchedPtr : 2
  kernels (2 total) :   nearestKernel(1)  pushPull_Pull_kernel(1)

MrMinimal commented 6 years ago

@Storagraph The deprecation of OpenGL is not too much of a problem, since there are multiple translation libraries which can convert to multiple graphics backends. Khronos has succeeded in getting Vulkan to run everywhere regardless of graphics API thanks to the portability initiative. MoltenVK enables vendors to target macOS as well when using Vulkan's compute shaders. So Vulkan is the most portable option out there.

If anyone is intimidated by the Vulkan API, there is a project which reduces its complexity: V-EZ

So the CUDA dependency could be removed if Vulkan compute shaders were used.

Ashtreighlia commented 6 years ago

@MrMinimal I just mentioned OpenGL for completeness. Vulkan/OpenGL/DirectX/D3D (graphics APIs) are used for rasterization of 3D objects and are generally not used for computing tasks; OpenCL (Open Computing Language) is for computing. There is a workaround using SPIR-V to access OpenCL via the front end in Vulkan, but doesn't this also need support for OpenCL on OSX in the first place? Just to mention it, Apple announced in a press release that it will ditch both OpenCL & GL.

Sorry for the confusion ^^

AndreaMonzini commented 6 years ago

@kwahoo2 thank you for the conversion with HIP, i think it could be the right solution with additional work.

PolarNick239 commented 6 years ago

Currently HIP doesn't support Windows and doesn't support the amdgpu-pro driver under Linux (in fact, only the ROCm platform under Linux is supported).

AndreaMonzini commented 6 years ago

As a supporter of free and open source software under Linux, I prefer the AMDGPU Mesa FOSS driver. I would like to support AliceVision also because it's a FOSS project, and an open solution like OpenCL, Vulkan, HIP or alternatives would be the best solution from the FOSS perspective.

AndreaMonzini commented 5 years ago

Hello, just to inform about a new interesting project based on Vulkan that could be useful:

https://github.com/jgbit/vuda

beta-tester commented 5 years ago

any chance to run AliceVision/Meshroom - CPU only - without any specialized hardware, without nVidia, ... ? most of the discussion i see here is about nVidia, CUDA, AMD, Vulkan, macOS, Metal, ... (voodoo :P)

i have only an older intel CPU (i7-3xxx) with a "built-in" intel GPU (HD-4000) - i don't need more GPU power than the GPU on CPU. to me, time doesn't matter...

EmteZogaf commented 5 years ago

@MrMinimal I just mentioned OpenGL for completeness. Vulkan/OpenGL/DirectX/D3D (graphics APIs) are used for rasterization of 3D objects and are generally not used for computing tasks; OpenCL (Open Computing Language) is for computing. There is a workaround using SPIR-V to access OpenCL via the front end in Vulkan, but doesn't this also need support for OpenCL on OSX in the first place? Just to mention it, Apple announced in a press release that it will ditch both OpenCL & GL.

Sorry for the confusion ^^

Looking at the diagram "Bringing OpenCL Compute to Vulkan", it looks like there will be no need for an OpenCL environment in the future, just a compiler to Vulkan code.

mia-0 commented 5 years ago

One library which uses Vulkan compute shaders is libplacebo, which will be used by upcoming VLC releases. It should be obvious, but performance can vary wildly depending on implementation. Here’s an interesting tidbit: https://github.com/haasn/libplacebo/blob/master/demos/video-filtering.c

ShalokShalom commented 5 years ago

Hi there :)

I see a lot of options here shown by others and I honestly trust this project. ^.^

I am starting an open source video game which makes great use of photogrammetry.

To do that, the help of the community in capturing and importing pictures is important. Easy accessibility in every possible way is obviously important.

I am curious why you implemented CUDA as the only solution in the first place. Did you sit there and think "NVIDIA only will work great on an open source project"?

I heavily doubt that. I guess you had in mind back then that you could still add other solutions. You have probably forgotten about it as the years passed by, and you are now being reminded of it again.

My reason to support open source is the same one as for 90% of the others.

Choosing software that is significantly behind in terms of features and performance, compared to software for which I have to pay 30€ per month, is only logical when it provides another huge benefit, like being completely open source.

For me, the only reason to support this project is the commitment that this is going to happen.

GPU acceleration is fine, CUDA is fine, NVIDIA only is questionable to me. I hope you see this issue :)

mia-0 commented 5 years ago

I think the only reason AliceVision ended up with CUDA is that CUDA, despite its lackluster documentation and stability issues, is extremely popular in academia.

ShalokShalom commented 5 years ago

So AliceVision is only for academics? And academia is already sold to NVIDIA? And this project supports this direction?

zvrba commented 5 years ago

@lachs0r CUDA having "lackluster documentation"?! It has more documentation than I've ever seen for OpenCL, it integrates nicely with C++ language, with Visual Studio debugger, it comes with decent performance/profiling tools... PLEASE, point me to an OpenCL implementation with as advanced tooling as CUDA. Probably Intel's implementation is a candidate, but that one is not as easy to get free (of charge) as CUDA. And then you're still stuck with low-level APIs that don't integrate nicely with C++ (I know of no free (of charge) quality SYCL implementation).

So if anything, it boils down to developer-friendliness.

griwodz commented 5 years ago

We have CPU versions for the two feature extractors, only DepthMap is CUDA-only. The stage can be bypassed, but it is important for quality. If anyone has the time and skill to port DepthMap to CPU or OpenCL, it would be a welcome candidate for inclusion in a future release.

We have discussed a HIP conversion (to add AMD cards) in the past, but the oldest parts of the CUDA code use texture references, while HIP knows nothing about textures.

A pure CPU port would be easier, and useful in the long run.
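For anyone wondering what replacing the textures means in a CPU port, the sketch below emulates a 2D texture fetch in plain C++: 8-bit texels are returned as floats in [0, 1] (which is what cudaReadModeNormalizedFloat does for 8-bit data), addressing is clamped to the edge, and filtering is bilinear. `CpuTexture2D` and its members are hypothetical names, and which addressing and filter modes the DepthMap kernels actually configure is an assumption here. CUDA hardware also interpolates with reduced-precision weights, so results would match only approximately.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for a 2D 8-bit texture (name and layout are assumptions).
struct CpuTexture2D
{
    int width;
    int height;
    std::vector<std::uint8_t> texels; // row-major, width * height entries

    // Clamp-to-edge addressing, then map 0..255 to 0.0..1.0.
    float texel(int x, int y) const
    {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x] / 255.0f;
    }

    // Bilinear fetch at unnormalized coordinates; texel centers sit at i + 0.5.
    float sample(float x, float y) const
    {
        assert(width > 0 && height > 0);
        const float xb = x - 0.5f;
        const float yb = y - 0.5f;
        const float fx = std::floor(xb);
        const float fy = std::floor(yb);
        const float ax = xb - fx; // horizontal interpolation weight
        const float ay = yb - fy; // vertical interpolation weight
        const int ix = static_cast<int>(fx);
        const int iy = static_cast<int>(fy);

        const float top = (1.0f - ax) * texel(ix, iy) + ax * texel(ix + 1, iy);
        const float bottom = (1.0f - ax) * texel(ix, iy + 1) + ax * texel(ix + 1, iy + 1);
        return (1.0f - ay) * top + ay * bottom;
    }
};
```

A kernel that used to read the texture at (x, y) would call `sample(x, y)` on such an object instead, which keeps the rest of the kernel body largely unchanged.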

We who worked on the original release have no time to do it, although we all see the benefit of being more open. I’d be happy to discuss with anybody who would like to try.

On 10 Dec 2018, at 18:05, Zeljko Vrba wrote:

@lachs0r CUDA having "lackluster documentation"? It has more documentation than I've ever seen for OpenCL, it integrates nicely with C++ language, with Visual Studio debugger, it comes with decent performance/profiling tools... PLEASE, point me to an OpenCL implementation with as advanced tooling as CUDA. Probably Intel's implementation is a candidate, but that one is not as easy to get free (of charge) as CUDA. And then you're still stuck with low-level APIs that don't integrate nicely with C++ (I know of no free (of charge) SYCL implementation).

So if anything, it boils down to developer-friendliness.


peci1 commented 5 years ago

only DepthMap is CUDA-only. The stage can be bypassed, but it is important for quality

How can I do that? What's the result then, when there's no depthmap?

fabiencastan commented 5 years ago

@peci1 The option to bypass the depth map is available in the "develop" branch and will be available in the coming release. You just have to remove the DepthMap and DepthMapFilter nodes and connect the StructureFromMotion node directly to the Meshing node. The density of the mesh will be much lower, so the geometry will be much less detailed.

CloverAnastasia commented 5 years ago

@fabiencastan I have tried this as well, but it still doesn't work for me.

Please help. I am not sure what I did that's wrong. My pictures were fine, the lighting was good. I just don't have the GPU for it. Mine is GPU CUDA.

natowi commented 5 years ago

@CloverAnastasia I think you are using the 2018.1 build. This only works in a dev. build / in the next release.

CloverAnastasia commented 5 years ago

@CloverAnastasia I think you are using the 2018.1 build. This only works in a dev. build / in the next release.

@natowi If that's the case, is there any other solution I can try? Or should I just switch laptops for this?

natowi commented 5 years ago

@CloverAnastasia wait for the next release / compile the dev. version or use a computer with CUDA support.

ShalokShalom commented 5 years ago

Can we summarize for a second what impact each of the options shown so far has on development and usability?

VUDA lets us reuse code and is built on a supported platform. It is available on all platforms and different GPUs.

Any issues?

BrokenMatrix commented 5 years ago

I am going to go ahead and try porting the depth estimation stage to a CPU implementation, when I'm done I will upload the code in a .zip and maybe it will help some people here. I will not be doing any optimizations so it will likely be pretty slow, but waiting should be better than not having the depth maps at all.

TashaSkyUp commented 5 years ago

Can we summarize for a second what impact each of the options shown so far has on development and usability?

VUDA lets us reuse code and is built on a supported platform. It is available on all platforms and different GPUs.

Any issues?

@griwodz Will this work?

beta-tester commented 5 years ago

a CPU-only version is not in the queue anymore? or will vuda work on a GPU on intel CPU systems (without an extra graphic card)? i never saw a vulkan or cuda (vuda) driver for that GPU on intel CPU.

BrokenMatrix commented 5 years ago

So I tried to do a port; however, I get the red status after computing with no explanation (I've found the logging in this to be very hard to deal with altogether) and the images in the cache are blank. As I cannot run the original, I have no real way to debug this (I would need output to compare against in order to find the source). I did get an exception about dividing by zero yesterday, but the debugger refused to work so I couldn't find the source, and I cannot reproduce the error today using the same data set. Not sure why this would be, because I don't recall seeing any random numbers being chosen, so in theory it should use the same data every time, but I guess not. I have had problems with other unmodified nodes producing different output too, so it's not just the DepthMap stage.

EDIT: The reason I will not be doing in-depth debugging at every step to find the problem is that it takes 10+ minutes just to load the images each time, which would make it take far too long, as I would be running it many, many times and I don't have weeks to spend on this.

TL;DR I tried to do a port, but unfortunately it doesn't seem to be working and I have no way to debug it as I can't run the original so I'm out.

natowi commented 5 years ago

@BrokenMatrix Thank you for the effort! Can you share your ported code here on Github? Maybe someone else will pick up the project or can fix the problem.

EmteZogaf commented 5 years ago

@beta-tester Some of the newer Intel IGP chipsets support Vulkan https://www.intel.com/content/www/us/en/support/articles/000005524/graphics-drivers.html

beta-tester commented 5 years ago

@EmteZogaf thank you, but i have an old one, "Intel® HD Graphics 4000", and there is no Vulkan available for it. it makes no sense to buy a new graphic card or a new CPU with a new computer main board only to be able to use specific software that is used rarely... even when the specific software is cool and offers big benefits.

mia-0 commented 5 years ago

These older Intel GPUs do have Vulkan support on Linux FWIW. Intel’s Windows drivers are pretty bad in general.

EmteZogaf commented 5 years ago

That seems correct looking at the Wiki page: https://en.wikipedia.org/wiki/Intel_Graphics_Technology#Capabilities

BrokenMatrix commented 5 years ago

@natowi Alright, I suspect the problem is that I missed an array copy and it starts processing empty data at some point. The fact that I was able to trigger an integer division-by-zero error but not reproduce it proves that it is operating on real data at some point (or at least on something that is variable).

Some notes for anyone that wants to have a go at it:

There may be a few memory leaks; I'm not entirely sure I deleted all of the arrays I created, so if you notice that delete[] is not being called somewhere, chances are I forgot to put it there.
Everything is under the same function names, but I replaced the PlaneSweepingCuda class with PlaneSweepingCpu. All functions are meant to be the same, except for a few that returned GPU memory etc. (I modified them to return void and removed the code that used them).
The only file outside of the depthMap source folder that was modified was aliceVision_depthMapEstimation.cpp (I may have gotten the name wrong, but it's the one that compiles to the .exe for depth mapping), to remove the line of code that stopped the program if CUDA wasn't found.
I'm pretty sure the only other files that were changed are the cmake files.
I removed much of the useless array copying, but a lot still remains (I planned to remove it once I had verified that the port was working in the first place, but that didn't happen). What I mean by this is the places that create a new array and copy data but never change it (or do change it but don't need to access the old and new data separately), so they could just use the original and save time.
The old code is in another folder which is left out of the build, so the CUDA code does not run under any circumstances here (although it wouldn't be too hard to make it work; the easiest way would probably be to have two builds with a CMake option to choose between them, but I have near zero CMake experience so I didn't do this myself and don't plan to).
Finally, to run this you will need to use the Meshroom development code (current binaries call an older version of all of the code).
EDIT: There may be a few classes near the top that I copied but didn't end up using.
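
A sketch of how the leak and copy concerns in the notes above could be sidestepped, assuming the buffers are plain float arrays (the function names are made up for illustration and do not exist in the port): a std::vector frees itself, and passing it by reference avoids a copy when a step only reads or overwrites the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer factory: the vector owns its memory, so it is released
// on every return path and no matching delete[] can be forgotten.
std::vector<float> makeRamp(std::size_t count)
{
    std::vector<float> ramp(count);
    for (std::size_t i = 0; i < count; ++i)
        ramp[i] = static_cast<float>(i);
    return ramp; // moved or elided, not deep-copied
}

// Hypothetical step that overwrites its input: a non-const reference lets it
// work on the original buffer instead of on a fresh copy.
void scaleInPlace(std::vector<float>& values, float factor)
{
    for (float& v : values)
        v *= factor;
}

// Hypothetical read-only step: a const reference reads the data without copying it.
float averageOf(const std::vector<float>& values)
{
    assert(!values.empty());
    float total = 0.0f;
    for (float v : values)
        total += v;
    return total / static_cast<float>(values.size());
}
```

Usage would look like `auto buf = makeRamp(n); scaleInPlace(buf, 2.0f); float avg = averageOf(buf);`, with no explicit allocation or cleanup anywhere.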

Without further ado, here is the code, and good luck crawling through all 4K lines (an almost exact copy of my personal folder, but missing the build data, as that was 4 GB and can be re-obtained easily): https://www.dropbox.com/s/ugyo7ic79sv89wt/Alice%20Vision.zip?dl=0

mia-0 commented 5 years ago

Very nice of you to work on this but please learn how to use Git if you want to contribute to software projects.

BrokenMatrix commented 5 years ago

@lachs0r I know how to use git but chose to upload this via different means considering it is in a broken state, if it were meant to be an actual contribution I would have used git.

Also, if no one else takes it up, I might try to debug it in a few weeks, after I finish some other things, by saving images at every step and seeing where the data stops being legitimate.
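
One possible shape for that kind of step-by-step check, as a sketch: summarize each intermediate buffer (all zero? any NaN/inf? value range?) and dump it as a grayscale PGM for eyeballing. `BufferStats`, `inspect` and `writePgm` are hypothetical helpers, and treating every intermediate result as a flat float buffer is an assumption about the port.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Summary of one intermediate buffer (hypothetical helper type).
struct BufferStats
{
    float minValue;        // smallest finite value (0 if there is none)
    float maxValue;        // largest finite value (0 if there is none)
    std::size_t nonFinite; // number of NaN or infinite entries
    bool allZero;          // true if every entry is exactly 0 (or the buffer is empty)
};

BufferStats inspect(const std::vector<float>& data)
{
    BufferStats s{0.0f, 0.0f, 0, true};
    bool seenFinite = false;
    for (float v : data)
    {
        if (!std::isfinite(v)) { ++s.nonFinite; s.allZero = false; continue; }
        if (v != 0.0f) s.allZero = false;
        if (!seenFinite) { s.minValue = s.maxValue = v; seenFinite = true; }
        else { s.minValue = std::min(s.minValue, v); s.maxValue = std::max(s.maxValue, v); }
    }
    return s;
}

// Write the buffer as a binary PGM (P5), stretching the finite range to 0..255.
// Non-finite entries are written as 0.
void writePgm(std::ostream& out, const std::vector<float>& data, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(data.size() == static_cast<std::size_t>(width) * height);
    const BufferStats s = inspect(data);
    const float range = s.maxValue - s.minValue;

    out << "P5\n" << width << ' ' << height << "\n255\n";
    for (float v : data)
    {
        float t = 0.0f;
        if (std::isfinite(v) && range > 0.0f)
            t = (v - s.minValue) / range;
        out.put(static_cast<char>(static_cast<unsigned char>(t * 255.0f + 0.5f)));
    }
}
```

Calling `inspect` after every stage and logging the first stage where `allZero` flips to true (or `nonFinite` becomes nonzero) narrows the search without stepping through a debugger; opening the stream as a `std::ofstream` in binary mode gives a file that common image viewers can read.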

altaic commented 5 years ago

I'm going to have a look at adding Vulkan and Metal support. This should help with alicevision/meshroom#204 (macOS) and potentially even allow use on iOS which could be pretty interesting. Also, the number of systems that support Vulkan is pretty enormous due to Intel including support in loads of its processors.

As mentioned earlier in this thread, Vulkan support may be provided via https://github.com/jgbit/vuda. Then, Metal support may be added using https://github.com/KhronosGroup/MoltenVK.

alicevision / AliceVision

[Request] Remove CUDA dependency #439