JuliaNeuralGraphics / NerfGUI.jl

MIT License

Two small issues #4

Open dfarmer opened 1 year ago

dfarmer commented 1 year ago

Hello! I pulled the latest versions of NerfGUI.jl and Nerf.jl to my local machine, and in NerfGUI I entered package mode and dev'd Nerf.jl. But it crashed for me with errors about an undefined DEVICE. I prepared a PR to fix it, but as I was about to push it I noticed the issue is already fixed on the nerf-update branch. So I thought I would ask: is it time to merge that branch to main, or is there a potential issue?

My second question (which may be related) is that training performance after the fix is extremely slow: on the order of a minute or two per iteration. I'm using CUDA with an RTX 4090, so I was expecting a few frames per second, but looking around I couldn't find a reference iteration time, and even the JuliaCon '23 talk didn't mention training speed. Is something wrong on my end, or is < 1 fps during training expected?

pxl-th commented 1 year ago

Hi! The nerf-update branch was already merged in #2. By fixes do you mean the change to LocalPreferences.toml (backend=CUDA)? If so, that is a value users need to specify manually.
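For reference, a minimal LocalPreferences.toml sketch (assuming the preference key is exactly `backend`, as mentioned above; the file sits next to the Project.toml of your active environment):

```toml
[Nerf]
backend = "CUDA"
```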

Regarding performance: you should expect it to be slower than the respective instant-ngp C++ version, but more work on the performance side is coming. Also, training steps are slower at the beginning, mostly because of the occupancy acceleration structure. On the default dataset on an RTX 3060 I get ~25 seconds for 1k training steps (1024 batch size).

Additionally, for better performance you can try disabling rendering during training and, vice versa, disabling training while rendering.

pxl-th commented 1 year ago

Also, I don't think you need to dev Nerf.jl for NerfGUI.jl, unless you want to modify it.
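A quick package-mode sketch of the distinction (the path here is a placeholder):

```julia
# pkg> add Nerf                     # use the declared/registered version (usually enough)
# pkg> dev /path/to/local/Nerf.jl   # only if you want to modify Nerf.jl itself
# pkg> free Nerf                    # undo the dev and go back to the registered version
```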

And lastly, we have a basic benchmark in Nerf.jl which you can run with:

using Nerf
Nerf.benchmark()

And report the numbers here.

dfarmer commented 1 year ago

No, I was referring to https://github.com/JuliaNeuralGraphics/NerfGUI.jl/blob/main/src/NerfGUI.jl#L18: the tip of NerfGUI.jl still uses Nerf.DEVICE, when it needs to be Nerf.Backend to match up with the latest Nerf.jl. I spent some time figuring that out and making the changes locally, and then saw they were already in your branch. I wonder what happened with the merge you linked? I see it says it was merged, but the Device --> Backend rename isn't reflected. 🤔
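A minimal sketch of the rename in question (the exact call site in NerfGUI.jl may differ; the `Nerf.Dataset` usage is taken from the reproduction steps later in this thread):

```julia
# Before (crashes against latest Nerf.jl: DEVICE is no longer defined):
# dataset = Nerf.Dataset(Nerf.DEVICE; config_file)

# After (matches the renamed constant):
# dataset = Nerf.Dataset(Nerf.Backend; config_file)
```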

For the benchmark I see (Windows 11, Nvidia RTX 4090)

julia> using Nerf
[ Info: Precompiling Nerf [2c86e8b6-813a-40f3-97f9-c72f78886291]
[ Info: [Nerf.jl] Backend: CUDA
[ Info: [Nerf.jl] Device: CUDA.CUDAKernels.CUDABackend(false, false)

julia> Nerf.benchmark()
Trainer benchmark
1
2
3
4
5
6
7
8
9
10
 78.700332 seconds (31.23 M allocations: 1.953 GiB, 1.30% gc time, 24.32% compilation time: 2% of which was recompilation)

I also just wanted to say I'm sorry if my first post came off as critical. I think this project is extremely cool, I was just trying to figure out what my expectations should be (and based on your comment: 25 seconds for 1k steps and my benchmark showing 78 sec for 10 it does seem there's some kind of large discrepancy).
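For context, a back-of-the-envelope comparison of the two numbers quoted so far (note the @time output above says ~24% of the 78.7 s was compilation, so this overstates the steady-state gap):

```julia
ref_per_step = 25.0 / 1000   # RTX 3060 reference: ~25 s per 1k steps => 0.025 s/step
obs_per_step = 78.7 / 10     # RTX 4090 benchmark: 78.7 s per 10 steps => 7.87 s/step
ratio = obs_per_step / ref_per_step   # ≈ 315x
```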

julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e90 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 7 3700X 8-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 16 on 16 virtual cores
pxl-th commented 1 year ago

Nerf.benchmark() does the following:

    @time trainer_benchmark(trainer, 10)
    @time trainer_benchmark(trainer, 1000)

So it runs the benchmark twice: first 10 iterations, to make sure everything is compiled. The 78 seconds you are seeing includes kernel compilation; you should look at the timing that comes next. The same goes for the renderer benchmark.
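This compile-then-measure pattern is standard in Julia; a generic sketch:

```julia
f() = sum(rand(10^6))
@time f()   # first call: timing includes JIT compilation of f
@time f()   # second call: steady-state timing
```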

As for DEVICE, indeed there was an issue, thanks for pointing out!

dfarmer commented 1 year ago

For the 1,000 iterations it took

1388.605311 seconds (2.96 M allocations: 135.260 MiB, 0.01% gc time, 0.00% compilation time)
pxl-th commented 1 year ago

Wow, that is extremely slow! Can you share a profiling result using ProfileCanvas.jl?

julia> using Nerf, ProfileCanvas

julia> config_file = joinpath(pkgdir(Nerf), "data", "raccoon_sofa2", "transforms.json");

julia> dataset = Nerf.Dataset(Nerf.Backend; config_file);

julia> model = Nerf.BasicModel(Nerf.BasicField(Nerf.Backend));

julia> trainer = Nerf.Trainer(model, dataset; n_rays=1024);

julia> @profview Nerf.trainer_benchmark(trainer, 10); # Ignore it, since it includes compilation time

julia> @profview Nerf.trainer_benchmark(trainer, 10); # Report this one.
dfarmer commented 1 year ago

nerfjl_profile.zip

(Sorry for the zip; it turns out GitHub doesn't let you attach HTML files.) One thing I noticed when running the 1k iterations yesterday, and again in today's profile, is that there are "bursts": it will do 5-15 iterations in under a second, and then the next few iterations each take multiple seconds. I'm not sure if it's the garbage collector or maybe device transfers, but it's definitely not the case that all iterations are equally slow; rather, it's very irregular, sometimes quite fast and other times extremely slow.

pxl-th commented 1 year ago

The profiling results aren't very useful, unfortunately... My guess is this is either due to GC or some Windows-specific behavior. I have actually never run it on Windows.

But since you are seeing bursts of fast iterations, it may be the GC: it blocks everything while trying to free enough GPU memory. Although I ran it on a 6 GB Nvidia GPU without these issues...

dfarmer commented 1 year ago

Ok, well I guess we can close this anyway. Maybe one last question: you mentioned that you've run it on a 6 GB GPU. One other weird thing I noticed is that when I run Nerf.jl it instantly allocates all 24 GB that the 4090's got, and I had wondered about that too. Is that intentional, or is it something I could look into (probably at the CUDA.jl level)? Thanks for the help, and sorry for the chatty "issue."

pxl-th commented 1 year ago

when I run Nerf.jl it always instantly allocates all 24GB

It allocates that memory, but it does not use all of it. This is due to the GC not freeing unused arrays immediately, which means, from CUDA.jl's perspective, that those arrays (and the underlying memory) are still in use.

And when the memory pool grows to its maximum size (24 GB in your case), it forcibly triggers GC, freeing memory and potentially releasing it back to the OS, which is expensive. I guess this is why you are seeing such a dramatic slow-down.
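A hedged sketch of ways to keep CUDA.jl's memory pool in check (these calls exist in CUDA.jl; whether they remove the stalls seen here is an open question):

```julia
using CUDA

GC.gc()          # let Julia's GC collect unused GPU arrays
CUDA.reclaim()   # return freed pool memory to the driver
```

Recent CUDA.jl versions can also cap the pool via the JULIA_CUDA_SOFT_MEMORY_LIMIT environment variable (e.g. "20GiB"), set before Julia starts.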

Ok, well I guess we can close this anyway

I propose we leave it open, since this huge drop in performance is worrying.

pxl-th commented 1 year ago

If you can, try running another Nerf.jl benchmark; it tests the performance of a single kernel without allocations.

  1. Update to latest Nerf.jl.
  2. Go to Nerf.jl directory.
  3. Update project's packages with:
    • julia --threads=auto
    • ]up
  4. Run: julia --threads=auto --project=. benchmark/main.jl.

Currently it tests the performance of the HashGridEncoding kernel and spherical harmonics. Thanks!