DLR-RM / BlenderProc

A procedural Blender pipeline for photorealistic training image generation
GNU General Public License v3.0

blendtorch - domain randomization #52

Closed cheind closed 4 years ago

cheind commented 4 years ago

Hey,

I've just found your project and papers - very interesting work! We are researching in a similar direction: we use Blender to massively randomize a scene in real time and use this annotated data for training neural networks. To do so, we've created blendtorch

https://github.com/cheind/pytorch-blender

which allows us to stream data from parallel Blender instances directly into PyTorch data pipelines. We avoid generating intermediate files, and we avoid decoupling the data generation process from network training, because we install a feedback channel that allows us to adapt the simulation to current training needs.
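For illustration, the pattern looks roughly like the sketch below. The socket addresses, message layout and the feedback payload are hypothetical placeholders for this thread, not the actual blendtorch API:

```python
# Illustrative sketch of the streaming pattern (not the actual blendtorch API):
# parallel Blender instances push rendered, annotated frames over ZeroMQ, a
# PyTorch IterableDataset pulls them, and a second socket sends feedback
# (e.g. updated randomization parameters) back to the simulation.
import zmq
import torch
from torch.utils.data import IterableDataset, DataLoader


class BlenderStreamDataset(IterableDataset):
    def __init__(self, data_addr, feedback_addr):
        self.data_addr = data_addr
        self.feedback_addr = feedback_addr

    def __iter__(self):
        ctx = zmq.Context.instance()
        pull = ctx.socket(zmq.PULL)      # frames coming from Blender instances
        pull.connect(self.data_addr)
        push = ctx.socket(zmq.PUSH)      # feedback channel back to the simulation
        push.connect(self.feedback_addr)
        while True:
            msg = pull.recv_pyobj()      # assumed dict with 'image' and 'labels'
            yield torch.from_numpy(msg['image']), torch.as_tensor(msg['labels'])
            # hypothetical feedback, e.g. a difficulty value derived from training
            push.send_pyobj({'difficulty': 0.5})


loader = DataLoader(
    BlenderStreamDataset('tcp://localhost:11000', 'tcp://localhost:11001'),
    batch_size=8, num_workers=2)
```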

I thought I'd reach out to you in case you see a research fit with your procedural generation pipeline.

MartinSmeyer commented 4 years ago

Hi @cheind, thanks for reaching out, sounds interesting! I think for active domain randomization your idea definitely has cool use cases and we can learn from each other.

Concerning procedural scene generation, in our experience rendering is often not the only bottleneck, so I wonder if the online streaming still makes a lot of sense if you want to generate scenes involving complex physics / collision checking / sampling / loading etc. on the fly.

So far we are focusing on procedurally generating data using the photorealistic Cycles renderer (instead of the faster but less realistic Eevee renderer). Our focus is also to support a lot of labels, datasets and procedural options, together with proper documentation.

If we decide to support Eevee as well someday, we could possibly collaborate. Having the data generation independent of training has both pros and cons depending on the application, I think.

cheind commented 4 years ago

Hey!

Concerning procedural scene generation, in our experience rendering is often not the only bottleneck, so I wonder if the online streaming still makes a lot of sense if you want to generate scenes involving complex physics / collision checking / sampling / loading etc. on the fly.

I guess the performance benefits diminish in that case. However, what might be interesting is the following: allow model training to communicate with the simulation in order to generate more informative examples aligned with the current training progress (increase difficulty, scene constellation etc.). The interesting point here is how to adapt the parameters of the simulation without being able to backprop through the renderer. We have pursued two approaches: conjugate posterior inference and techniques used in reinforcement learning (score function gradients).
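As a rough illustration of the score-function idea (a minimal sketch, not our actual implementation): the simulation parameters are treated as parameters of a sampling distribution, and the gradient of an expected training signal is estimated from log-probability weighting, so the renderer itself never needs to be differentiable. `render_and_score` below is a hypothetical stand-in for running the non-differentiable simulation and measuring how useful the generated samples are:

```python
# Minimal sketch of a score-function (REINFORCE) update for simulation
# parameters such as an object scale distribution.
import torch

mu = torch.tensor(0.0, requires_grad=True)        # e.g. mean object scale
log_sigma = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)


def render_and_score(scale):
    # placeholder: in practice, render a scene with the sampled parameter and
    # return e.g. how informative the resulting batch is for the current model
    return -(scale - 1.5) ** 2


for step in range(200):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    samples = dist.sample((16,))                  # no gradient flows through here
    rewards = torch.tensor([render_and_score(s.item()) for s in samples])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # simple baseline
    loss = -(dist.log_prob(samples) * rewards).mean()  # score-function estimator
    opt.zero_grad()
    loss.backward()
    opt.step()
```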

So far we are focusing on procedurally generating data using the photorealistic Cycles renderer (instead of the faster but less realistic Eevee renderer). Our focus is also to support a lot of labels, datasets and procedural options, together with proper documentation.

I've seen your examples; very impressive! I wonder about the following (I'm sure you have considered this point as well): how much time should one invest into modelling? After all, at some point it might be cheaper to collect real-world samples and label them.

Best, Christoph

themasterlink commented 4 years ago

Hey @cheind!

Martin is right, your work sounds really interesting.

However, what might be interesting is the following: allow model training to communicate with the simulation in order to generate more informative examples aligned with the current training progress (increase difficulty, scene constellation etc.).

This is, of course, a great idea. Have you already published some work in this field which we could reference if we ever add these modules to BlenderProc?

how much time should one invest into modeling? After all, at some point, it might be cheaper to collect real-world samples and label them.

To be honest, not too much! But the best part is that there are already a lot of datasets out there which contain nice and beautiful 3D models of scenes and objects. So by using them we completely avoid the problem of modeling any scenes or objects ourselves. How do you address this problem, as there are of course domains with fewer public 3D datasets available?

One last thing about recording your own samples is that you then have to label them yourself and that you can't change them easily. If you have scanned a chair and created a 3D model with textures, you cannot just use it in a different scene, as the lighting on this chair is "baked" into the texture, which also means you cannot add new objects on top of the chair. So scanning objects/scenes really does not add a lot to the possible randomization you can do with Blender or, more concretely, BlenderProc.

Best, Max

cheind commented 4 years ago

Hey @themasterlink!

This is, of course, a great idea, have you already published some work in this field, which we could reference if we ever add these modules to BlenderProc?

Yes, we have a bit of it here: https://arxiv.org/abs/1907.01879 We used conjugate priors to facilitate posterior inference of the probabilistic models that govern random scene configuration aspects. We found conjugate priors to quite limit the expressive power of the probabilistic models, so we are researching towards a more general solution that would allow arbitrary probability distributions to be combined.
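For illustration only (a toy sketch, not the setup from the paper): with a conjugate pair such as Beta-Bernoulli, the distribution over a binary scene property can be updated in closed form from feedback about which rendered samples turned out to be useful. The feedback counts below are made up:

```python
# Toy sketch of a conjugate (Beta-Bernoulli) posterior update for a binary
# scene parameter, e.g. "occluder present".
import numpy as np

alpha, beta = 1.0, 1.0                  # uniform Beta prior over p(occluder)

# hypothetical feedback from training: 1 = sample was informative, 0 = not,
# recorded for scenes that were rendered with the occluder enabled
feedback = np.array([1, 1, 0, 1, 1, 0, 1])

alpha += feedback.sum()                 # count of informative samples
beta += len(feedback) - feedback.sum()  # count of uninformative samples

p_occluder = alpha / (alpha + beta)     # posterior mean used for the next scenes
print(f"posterior mean: {p_occluder:.2f}")
```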

Our journey currently leads us towards aspects of reinforcement learning, from which we borrow gradient estimators that do not require the rendering function to be differentiable. A bit of that theory is already implemented here: https://github.com/cheind/pytorch-blender/tree/feature/guided_dr/examples/guideddr I am in the midst of adding the mathematical background in a separate LaTeX document.

One last thing about recording your own samples is that you then have to label them yourself and that you can't change them easily. If you have scanned a chair and created a 3D model with textures, you cannot just use it in a different scene, as the lighting on this chair is "baked" into the texture, which also means you cannot add new objects on top of the chair. So scanning objects/scenes really does not add a lot to the possible randomization you can do with Blender or, more concretely, BlenderProc.

I couldn't agree more. Yet I think that in order to justify using Blender to generate training data for various domains, we need to consider/reduce the time it takes to generate domain-specific training data. In my opinion, the less we need to care about modelling the details of a domain, the faster we can get our supervised models to work.

Best, Christoph

themasterlink commented 4 years ago

Hey,

Yes we have a bit of it here https://arxiv.org/abs/1907.01879

This looks really interesting, I am looking forward to the reinforcement learning approach that avoids needing differentiable renderers.

we need to consider/reduce the time it takes to generate domain-specific training data

I totally agree; that is one of the reasons why we are already supporting so many different datasets, so that people can easily integrate the few objects they actually need into, for example, a Front-3D scene, which then helps during the training of a detector.

PS: I will close this issue then ;)

Best, Max