Closed czw0078 closed 6 years ago
I will try to run your code today or tomorrow and see if I can find a quick fix.
In the meantime, I have some questions for you that will help identify the issue:
1) Does this problem happen every time you run your code, or only sometimes?
2) Do you get NaN values after rendering or when loading the .obj file?
3) What GPU are you using?
We need to isolate where the problem might be. Since you are using Python 2, I assume that you are using the latest version of the code on GitHub.
@czw0078 I located the issue. In an obj file, the vertex texture coordinates (vt) should be between 0 and 1. However, in your .obj file some vertex texture coordinates are negative. This causes out-of-bounds memory accesses. You can take a look here. How did you generate your .obj files?
I will add some additional sanity checks when loading the obj file to prevent such issues.
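A minimal sketch of what such a check could look like, assuming the .obj content is available as a list of text lines. The function name `find_out_of_range_vt` is hypothetical and is not part of neural_renderer; it only reports `vt` entries whose u or v lies outside [0, 1].

```python
def find_out_of_range_vt(lines):
    """Return (line_number, u, v) for every `vt` entry outside [0, 1]."""
    offenders = []
    for line_number, line in enumerate(lines, start=1):
        parts = line.split()
        # A texture-coordinate line has the form: vt u [v [w]]
        if len(parts) < 2 or parts[0] != "vt":
            continue
        u = float(parts[1])
        # v is optional in the OBJ format and defaults to 0
        v = float(parts[2]) if len(parts) > 2 else 0.0
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            offenders.append((line_number, u, v))
    return offenders


# Hypothetical sample lines, not taken from the IKEA model
sample = [
    "v 0.0 0.0 0.0",
    "vt 0.25 0.75",
    "vt -0.0004 0.5",
    "vt 1.5 2.0",
]
for line_number, u, v in find_out_of_range_vt(sample):
    print("line %d: vt (%g, %g) is outside [0, 1]" % (line_number, u, v))
```

A check like this only flags the coordinates; whether to reject, warn, or wrap them is a separate decision.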
Dear Nikos Kolotouros:
Nice to have your reply! Thank you for confirming my doubts about the quality of the 3D model!
> I will add some additional sanity checks when loading the obj file to prevent such issues.
Yes, but I found that it is quite common for the UV coordinates (vt) to fall outside the 0 to 1 range. See this wrapping section. 3D renderers very often use the GL_REPEAT option by default.
Maybe there is no need for a sanity check; instead, apply "mod 1" to repeat the textures? In my experience, GL_REPEAT is the de facto texture-mapping behavior of most 3D renderers, and many models are built to take advantage of it. So "mod 1" would be very useful, I guess.
It is easy to implement GL_REPEAT and GL_CLAMP_TO_EDGE. So my proposal is to just print a warning when the u-v coordinates are outside the expected range. Right now the code only handles values > 1, which is why there is an issue with negative values. I will add an option for handling these cases that defaults to GL_REPEAT but can also be changed to GL_CLAMP_TO_EDGE. Do you agree with that?
edit: I can also implement the other 2, if you think that they will be useful
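A rough sketch of the wrap modes discussed above, written in plain Python for one coordinate at a time. This is not the code in the repository; the function names are hypothetical, and `texel_index` assumes a simple nearest-texel lookup into a row of `size` texels.

```python
import math


def wrap_repeat(u):
    """GL_REPEAT-style wrap: keep only the fractional part of u."""
    return u - math.floor(u)


def wrap_clamp_to_edge(u):
    """GL_CLAMP_TO_EDGE-style wrap: clamp u into [0, 1]."""
    return min(max(u, 0.0), 1.0)


def wrap_mirrored_repeat(u):
    """GL_MIRRORED_REPEAT-style wrap: reflect u on every other repetition."""
    t = u % 2.0  # Python's % gives a non-negative result for a positive modulus
    return t if t <= 1.0 else 2.0 - t


def texel_index(u, size, wrap=wrap_repeat):
    """Map a texture coordinate to an integer texel index in [0, size - 1]."""
    index = int(wrap(u) * size)
    # wrap(u) can be exactly 1.0 (clamping, or rounding of tiny negative
    # values), so clamp the index to avoid reading one texel past the end.
    return min(index, size - 1)


for u in (-0.0004, 0.3, 1.5):
    print(u, wrap_repeat(u), wrap_clamp_to_edge(u), wrap_mirrored_repeat(u))
```

Note that "mod 1" in the floor sense handles negative values as well; truncating toward zero instead would leave negative coordinates negative, which matches the kind of out-of-bounds access described in this thread.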
Wow, that is awesome! Adding a GL_REPEAT/GL_CLAMP_TO_EDGE option is definitely very helpful! If it is easy for you to implement all four options, that is even better! I totally agree with your proposal.
Another very useful feature I would like to mention is transparent texture mapping. In 3D modeling, we sometimes use a PNG picture as the texture for a window. A PNG picture can have a transparent alpha channel, and it is the easiest way to handle glass. Do you think it would be easy to make the renderer's texture mapping support the alpha channel as well? I guess the alpha channel could be implemented as a "mask" value, but it may also involve some knowledge of things like the Z-buffer.
I am not sure how challenging it is. Do you think we can open an issue on GitHub for that feature?
You can submit a separate issue for that and add the label enhancement.
Some ideas/pointers on how to do GL_REPEAT? Is that easy to implement?
I will take care of it over the weekend. I have started working on all the current issues.
I've pushed some changes that will probably resolve this issue in the branch texture_sampling. Can you test those changes to see if they produce the desired result?
Sure! I will test it right now. Thank you so much.
Did you test it?
I will finish testing it this weekend. Apologies for the delay. But it seems that the ambient lighting has been broken: the model looks dark on the sides. Let me compare the master branch and the texture_sampling branch to find out what causes this problem.
OK,
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh line=339 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_load_obj_repeat.py", line 10, in
    load_texture=True, texture_size=4)
  File "build/bdist.linux-x86_64/egg/neural_renderer/load_obj.py", line 157, in load_obj
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh:339
I am not sure why this time the model cannot even be loaded. Is it a code problem or actually an .obj problem?
As for the latter (the .obj problem), I also saw post #12 and realized that the 3D model file format itself is a big topic. (I moved my following comments to post #12.)
The illegal memory access issue is indeed a bug. I thought that I've fixed it with this update, but apparently I need to take a look again. I am pretty sure that it means that we have an out of bounds access in the texture image. Even if your texture coordinates are not in the standard range you shouldn't see this error, because I do the texture wrapping.
I just pushed a fix to the texture_sampling branch. Can you check if it works now?
Sure, let me pull and try it.
Unfortunately, it gives me the same 77 error.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh line=339 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_load_obj_repeat.py", line 10, in
    load_texture=True, texture_size=4)
  File "build/bdist.linux-x86_64/egg/neural_renderer/load_obj.py", line 157, in load_obj
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh:339
But I have some good news: one of my previous models, which has two "vt" lines with the value -0.0004, can be loaded successfully after your 1bbba08 commit. So yes, you definitely made GL_REPEAT work to some extent, though we still have the illegal access problem.
* a5d870c (HEAD -> master, nkolot_repo/texture_sampling) fixed compilation issues
* b067de5 fixed out of bounds accessing bug
* 1bbba08 added texture wrapping
Ok, so one thing I need from you is to send me the script you are using right now to load the IKEA model, and I will try to find where exactly the errors are and make sure that everything runs properly.
http://auburn.edu/~czw0078/issue_07_26/test_load_texture1.py
just this script
I fixed the out-of-bounds access issues, but I again see inconsistencies between different implementations. I don't know why the output is not the same. Do you have any clue?
Awesome job!
I think the inconsistency between Mac Preview and MeshLab is fine. Preview often miscalculates face normals. The top face of the bed should face up, but Preview somehow computed it as facing down, which leaves the empty black hole. The neural renderer, however, does not need to read normal vectors from the file. If there are no normals to read, it just treats each face as two-sided and calculates the normals by itself (if my understanding of the source code is correct), so I guess it is fine and we can ignore that for now.
The inconsistency between the neural renderer and the others (Mac Preview/MeshLab) is due to the data structure Kato used (perhaps hard to fix), which is not flexible enough to work for every kind of 3D model. Take this obj file as an example: it uses only two triangles to represent the large surface of the bed, and the default texture_size = 4 may not be big enough. In that case the surface will be blurry, so it is hard to tell whether GL_REPEAT works or not.
So there are three options:
(a) Increase the texture_size from the default of 4 to 8 or some other value. This takes a lot of GPU resources.
(b) Redo the artwork: open 3D editing software (for example, Blender) and subdivide the large face into several smaller faces while maintaining the UV map of the textures. Then even the default setting of 4 can generate sharp textures.
(c) Make the software smart enough to handle (b) automatically by itself.
Let us put (c) aside (at least for now). (c) involves low-level coding and is hard to change. Many mature 3D software packages can do (c) automatically, but since the data structure of the neural renderer is not the standard 3D graphics data structure, it may be very hard to apply textbook/mature 3D techniques to improve this project. (Actually, I still do not understand the textures tensors and how the CUDA C renderer/loading code works. I really hope someone can explain that a little bit to me :)
So I will test (a) and (b) to first try to get a sharp texture, and then we can check whether it has the "repeat" feature as expected. If the texture is repeated, then we are done here, or can at least close the GL_REPEAT issue and leave the blurriness problem to solve later.
In short, let me pull and try (a) and (b) first. I will tell you the results after I try them.
By the way, you did a really, really awesome job! (I wish I could write CUDA code professionally like you.)
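For reference, option (b) can be sketched in code as a midpoint subdivision that interpolates the UV coordinates along with the vertex positions. This is only an illustration under my own assumptions (the helper names are hypothetical, and Blender's subdivide tool may work differently); the key point is that UVs are interpolated as they are, without wrapping, so coordinates outside [0, 1] keep their meaning.

```python
def midpoint(a, b):
    """Component-wise midpoint of two tuples of equal length."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def subdivide_triangle(positions, uvs):
    """Split one triangle into four, interpolating UVs linearly.

    positions: three (x, y, z) tuples; uvs: three (u, v) tuples.
    Returns a list of four (positions, uvs) pairs with the same winding.
    """
    p0, p1, p2 = positions
    t0, t1, t2 = uvs
    p01, p12, p20 = midpoint(p0, p1), midpoint(p1, p2), midpoint(p2, p0)
    t01, t12, t20 = midpoint(t0, t1), midpoint(t1, t2), midpoint(t2, t0)
    return [
        ((p0, p01, p20), (t0, t01, t20)),     # corner at vertex 0
        ((p01, p1, p12), (t01, t1, t12)),     # corner at vertex 1
        ((p20, p12, p2), (t20, t12, t2)),     # corner at vertex 2
        ((p01, p12, p20), (t01, t12, t20)),   # center triangle
    ]


# Hypothetical large face whose UVs run up to 3.0 (texture repeated 3 times)
positions = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
uvs = ((0.0, 0.0), (3.0, 0.0), (0.0, 3.0))
pieces = subdivide_triangle(positions, uvs)
for piece_positions, piece_uvs in pieces:
    print(piece_positions, piece_uvs)
```

Each smaller face then covers a smaller part of the UV map, so a fixed per-face texture resolution is spread over less of the image.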
It works! We can move on now.
(a) I increased the texture size from 4 to 8, see the repetitive pattern on the side of the bed:
(b) I also subdivided the top surface and one side surface. Before the subdivision, we have this one large face mapped into this one texture figure. Note that from the UV map you can clearly see that the vertex coordinates are outside the range 0 to 1.0. Now we subdivide the face into smaller pieces. Those selected faces are mapped to the texture figure as below. Note that we subdivided the surface while maintaining the relative position of the whole surface.
The top of the bed and one side surface show the correct repetitive pattern! (The other faces are not subdivided, so they look blurry.)
So (1) the illegal memory access problem and (2) the handling of coordinates outside the range 0.0 to 1.0 are both resolved. We can move on.
I think we can close this issue here now, and move on to try to solve #11 first.
I am very grateful that you ported the neural mesh renderer by Hiroharu Kato from Chainer to PyTorch. Thanks a lot for your great work!
I tried this tests_load_obj.py and it works well, so I took it as a reference. I wrote texture_load_texture1.py in order to load textured models from the Pix3D dataset for my research project. However, when I load some models and run it several times (for example, the IKEA model IKEA_LEIRVIK), it often raises an error:
ValueError: Images of type float must be between -1 and 1.
By debugging, I found that the output NumPy array contains NaN values, so I tried to work around this issue by adding code in test_load_textured2.py, which produces this result:
model_pkr_out2.png and enlarger.png
You can see from the enlarged image that the values of some pixels on the edge of the object are off. I even tried this on the original Chainer version, and I got this error message:
cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Do you have any idea about this issue? What do you think causes this problem (an obj model quality problem or bugs in the source code)? Is it possible that we could work together to fix it?
Best regards,
Chenfei Wang