Load Textures With UV Coordinates Out Of Range 0.0~1.0

czw0078 commented 6 years ago

I am very grateful that you ported the neural mesh renderer by Hiroharu Kato from Chainer to PyTorch. Thanks a lot for your great job!

I tried tests_load_obj.py and it works well, so I used it as a reference and wrote texture_load_texture1.py to load textured models from the Pix3D dataset for my research project. However, when I load some models and run the script several times (for example, the IKEA model IKEA_LEIRVIK), it often raises an error:

ValueError: Images of type float must be between -1 and 1.

While debugging, I found that the output NumPy array contains NaN values, so I tried to work around the issue by adding code in test_load_textured2.py. It produces this result:

model_pkr_out2.png and enlarger.png

You can see from the enlarged image that the values of some pixels on the edge of the object are off. I even tried this on the original Chainer version, and I got this error message:

cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
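For readers who hit the same `ValueError`, a NaN workaround of the kind mentioned above might look like the sketch below. It is an assumption about what such code could do, not the contents of test_load_textured2.py, and the function name `sanitize_image` is hypothetical. It only hides the symptom: the affected pixels are still wrong, as the enlarged image shows.

```python
import numpy as np

def sanitize_image(image):
    """Replace non-finite pixels with 0 and clip to [0, 1] so the array can be saved as a float image.

    Hypothetical helper, not part of neural_renderer.
    """
    image = np.asarray(image, dtype=np.float32)
    finite = np.isfinite(image)
    n_bad = int(np.count_nonzero(~finite))       # how many pixels were NaN or inf
    cleaned = np.where(finite, image, 0.0)       # overwrite NaN/inf with black
    return np.clip(cleaned, 0.0, 1.0), n_bad

# Toy "rendered" image with one NaN pixel and one value above 1.
rendered = np.array([[0.2, np.nan], [1.5, 0.7]], dtype=np.float32)
cleaned, n_bad = sanitize_image(rendered)
print(n_bad, cleaned)
```

Counting the bad pixels before overwriting them makes it easier to tell whether the problem appears on every run or only sometimes.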

Do you have any idea about this issue? What do you think causes the problem (a quality problem in the obj model, or a bug in the source code)? Could we work together to fix it?

Best regards,

Chenfei Wang

nkolot commented 6 years ago

I will try to run your code today or tomorrow and see if I can find a quick fix.

In the meantime I have some questions for you that will help identify the issue:

1) Does this problem happen every time you run your code or only sometimes?
2) Do you get NaN values after rendering or when loading the .obj file?
3) What GPU are you using?

We need to isolate where the problem might be. Since you are using Python 2, I assume that you are using the latest version of the code on GitHub.

nkolot commented 6 years ago

@czw0078 I located where the issue is. In an obj file, the vertex texture coordinates (vt) should be between 0 and 1. However, in your .obj file some vertex texture coordinates are negative. This is responsible for the out-of-bounds memory accesses. You can take a look here. How did you generate your .obj files?

I will add some additional sanity checks when loading the obj file to prevent such issues.
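A sanity check of this kind could be sketched as follows. This is a minimal illustration that assumes plain `vt u v` lines; the function name `find_out_of_range_vt` is hypothetical and it is not the check that was added to the repository.

```python
def find_out_of_range_vt(obj_lines):
    """Return (line_number, u, v) for every vt entry with u or v outside [0, 1]."""
    bad = []
    for line_number, line in enumerate(obj_lines, start=1):
        parts = line.split()
        # A texture coordinate line looks like: vt <u> <v> [<w>]
        if len(parts) >= 3 and parts[0] == "vt":
            u, v = float(parts[1]), float(parts[2])
            if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
                bad.append((line_number, u, v))
    return bad

# Small in-memory sample instead of a real .obj file.
sample = """v 0 0 0
vt 0.25 0.75
vt -0.0004 0.5
vt 1.8 2.1
""".splitlines()
bad_entries = find_out_of_range_vt(sample)
print(bad_entries)
```

Because the function only iterates over lines, an open file object can be passed in place of the sample list.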

czw0078 commented 6 years ago

Dear Nikos Kolotouros:

Nice to have your reply! Thank you for confirming my suspicion about the quality of the 3D model!

> I will add some additional sanity checks when loading the obj file to prevent such issues.

Yes, but I found that it is quite common for UV coordinates (vt) to fall outside the 0 to 1 range. See this wrapping section. 3D renderers very often use the GL_REPEAT option by default.

Maybe there is no need for a sanity check; instead, apply "mod 1" to repeat the textures? In my experience, GL_REPEAT is the de facto behavior for texture mapping in most 3D renderers, and many models are built to take advantage of it. So "mod 1" would be very useful, I guess.
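As a small illustration of "mod 1", the NumPy sketch below contrasts a floored modulo with a C-style `fmod`. The sample values are made up. The point is a general pitfall, not a statement about this repository's code: `fmod` keeps the sign of its input, so negative coordinates stay negative unless they are handled separately.

```python
import numpy as np

uv = np.array([-0.25, 0.5, 1.75, -1.0004])

# Floored modulo: the result is always in [0, 1), which is the GL_REPEAT behavior.
repeated = np.mod(uv, 1.0)

# C-style fmod: the result keeps the sign of the input, so -0.25 stays -0.25.
truncated = np.fmod(uv, 1.0)

print(repeated)
print(truncated)
```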

nkolot commented 6 years ago

It is easy to implement GL_REPEAT and GL_CLAMP_TO_EDGE. So my proposal is to just print a warning when the u-v coordinates are outside the expected range. Right now the code only handles values > 1, which is why there is an issue with negative values. I will add an option for handling these cases that defaults to GL_REPEAT but can also be changed to GL_CLAMP_TO_EDGE. Do you agree with that?

edit: I can also implement the other 2 wrapping modes, if you think that they will be useful
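The proposal above (warn on out-of-range coordinates, default to repeat, allow clamping) could be sketched like this. The function name `wrap_uv` and the mode strings are hypothetical, and this is not the code from the repository. OpenGL's fourth mode, GL_CLAMP_TO_BORDER, returns a border color rather than a remapped coordinate, so it is left out of this coordinate-only sketch.

```python
import warnings
import numpy as np

def wrap_uv(uv, mode="repeat"):
    """Map UV coordinates into [0, 1] using an OpenGL-style wrapping mode."""
    uv = np.asarray(uv, dtype=np.float64)
    if np.any((uv < 0.0) | (uv > 1.0)):
        warnings.warn("UV coordinates outside [0, 1]; applying mode '%s'" % mode)
    if mode == "repeat":            # GL_REPEAT: keep only the fractional part
        return uv - np.floor(uv)
    if mode == "mirrored_repeat":   # GL_MIRRORED_REPEAT: flip every other tile
        return 1.0 - np.abs(np.mod(uv, 2.0) - 1.0)
    if mode == "clamp_to_edge":     # GL_CLAMP_TO_EDGE: pin to the nearest edge
        return np.clip(uv, 0.0, 1.0)
    raise ValueError("unknown wrapping mode: %r" % (mode,))

coords = [-0.25, 1.5]
for mode in ("repeat", "mirrored_repeat", "clamp_to_edge"):
    print(mode, wrap_uv(coords, mode))
```

Subtracting the floor, rather than truncating, is what makes the repeat mode behave the same way for negative and positive coordinates.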

czw0078 commented 6 years ago

> edit: I can also implement the other 2 if you think that they will be useful

Wow, that is awesome! Adding a GL_REPEAT/GL_CLAMP_TO_EDGE option would definitely be very helpful! If it is easy for you to implement all four options, that would be even better! I totally agree with your proposal.

Another very useful feature I would like to mention is transparent texture mapping. In 3D modeling, we sometimes use a PNG picture as the texture for a window. A PNG picture has a transparent alpha channel, and it is the easiest way to handle glass. Do you think it would be easy to make the renderer's texture mapping support the alpha channel as well? I guess the alpha channel could be implemented as a "mask" value, but it may also involve some knowledge of things like the Z-buffer.

I am not sure how challenging it is. Do you think we can open an issue on GitHub for that feature?
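To make the alpha-channel idea concrete, here is a generic sketch of back-to-front "over" compositing for a single pixel, which is where depth ordering comes in. It is a textbook illustration under simple assumptions (one color and one alpha per fragment, larger depth means farther away), not a description of how neural_renderer works; all names are hypothetical.

```python
import numpy as np

def composite_back_to_front(fragments, background):
    """Blend (depth, rgb, alpha) fragments covering one pixel, farthest first."""
    color = np.asarray(background, dtype=np.float64)
    # Sort by depth so that nearer fragments are blended over farther ones.
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = alpha * np.asarray(rgb, dtype=np.float64) + (1.0 - alpha) * color
    return color

wall = (5.0, [1.0, 0.0, 0.0], 1.0)     # opaque red wall, far away
glass = (2.0, [0.0, 0.0, 1.0], 0.25)   # mostly transparent blue glass, nearer
pixel = composite_back_to_front([glass, wall], background=[0.0, 0.0, 0.0])
print(pixel)
```

With alpha fixed at 1 for every fragment this reduces to keeping the nearest surface, which is the usual Z-buffer result.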

nkolot commented 6 years ago

You can submit a separate issue for that and add the label enhancement.

czw0078 commented 6 years ago

Do you have any ideas or pointers on how to do GL_REPEAT? Is it easy to implement?

nkolot commented 6 years ago

I will take care of it over the weekend. I have started working on all the current issues.

nkolot commented 6 years ago

I've pushed some changes that will probably resolve this issue in the branch texture_sampling. Can you test those changes to see if they produce the desired result?

czw0078 commented 6 years ago

Sure! I will test it right now. Thank you so much.

nkolot commented 6 years ago

Did you test it?

czw0078 commented 6 years ago

I will finish testing it this weekend. Apologies for the delay. But it seems that the ambient lighting has been broken: the model looks dark on the sides. Let me compare the master branch and the texture_sampling branch to find out what causes this problem.

czw0078 commented 6 years ago

OK,

First, I tested the code on the Mini Cooper model from Kato's repo, and it looks great.
Then, I tested it on the IKEA_LEIRVIK model. Here is the error message:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh line=339 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_load_obj_repeat.py", line 10, in <module>
    load_texture=True, texture_size=4)
  File "build/bdist.linux-x86_64/egg/neural_renderer/load_obj.py", line 157, in load_obj
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh:339

I am not sure why this time the model cannot even be loaded. Is it a code problem or actually a problem with the .obj file?

For the latter (the .obj problem), I also saw post #12 and realized that the 3D model file format is a big topic in itself. (I moved my further comments to post #12.)

nkolot commented 6 years ago

The illegal memory access issue is indeed a bug. I thought that I had fixed it with this update, but apparently I need to take another look. I am pretty sure it means that we have an out-of-bounds access in the texture image. Even if your texture coordinates are not in the standard range, you shouldn't see this error, because I do the texture wrapping.
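One general way an out-of-bounds texel index can survive wrapping is floating-point rounding: a tiny negative coordinate wraps to exactly 1.0, which then maps to an index one past the last texel. The NumPy sketch below illustrates that effect and a defensive clamp. It is not the repository's CUDA code, the name `uv_to_texel` is hypothetical, and it ignores any vertical flip between UV space and image rows.

```python
import numpy as np

def uv_to_texel(uv, width, height):
    """Convert (..., 2) UV coordinates to integer texel indices with GL_REPEAT wrapping."""
    uv = np.asarray(uv, dtype=np.float64)
    wrapped = uv - np.floor(uv)                      # nominally in [0, 1)
    x = np.floor(wrapped[..., 0] * width).astype(np.int64)
    y = np.floor(wrapped[..., 1] * height).astype(np.int64)
    # Rounding can make `wrapped` exactly 1.0, so clamp the indices defensively.
    x = np.clip(x, 0, width - 1)
    y = np.clip(y, 0, height - 1)
    return x, y

# -1e-20 wraps to 1.0 in float64; without the clamp its x index would be 4.
uv = np.array([[-1e-20, 0.5], [1.75, -0.25]])
x, y = uv_to_texel(uv, width=4, height=4)
print(x, y)
```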

I just pushed a fix to the texture_sampling branch. Can you check if it works now?

czw0078 commented 6 years ago

Sure, let me pull and try it.

czw0078 commented 6 years ago

Unfortunately, it gives me the same 77 error.

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh line=339 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "test_load_obj_repeat.py", line 10, in <module>
    load_texture=True, texture_size=4)
  File "build/bdist.linux-x86_64/egg/neural_renderer/load_obj.py", line 157, in load_obj
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generated/../THCReduceAll.cuh:339

But I have some good news: one of my previous models, which has two "vt" lines with the value -0.0004, can be loaded successfully after your 1bbba08 commit. So yes, you definitely made GL_REPEAT work to some extent, though we still have the illegal access problem.

* a5d870c (HEAD -> master, nkolot_repo/texture_sampling) fixed compilation issues
* b067de5 fixed out of bounds accessing bug
* 1bbba08 added texture wrapping

nkolot commented 6 years ago

Ok, so one thing I need from you is to send me the script you are using right now to load the IKEA model, and I will try to find where exactly the errors are and make sure that everything runs properly.

czw0078 commented 6 years ago

http://auburn.edu/~czw0078/issue_07_26/test_load_texture1.py

just this script

nkolot commented 6 years ago

I fixed the out-of-bounds access issues, but I again see inconsistencies between different implementations. I don't know why the output is not the same. Do you have any clue?

[Image: meshlab]

czw0078 commented 6 years ago

Awesome job!

I think the inconsistency between Mac Preview and MeshLab is fine. Preview often miscalculates the normal vectors of faces. The top face of the bed should face up, but Preview somehow computes it as facing down and therefore leaves an empty black hole. The neural renderer, however, does not need to read normal vectors from the file. If there are no normals to read, it just treats each face as two-sided and calculates the normal itself (if my understanding of the source code is correct), so I guess we can ignore this for now.
The inconsistency between the neural renderer and the others (Mac Preview/MeshLab) is due to the data structure Kato used (perhaps hard to fix), which is not flexible enough to handle every kind of 3D model. Take this obj file as an example: it uses only two triangles to represent the large surface of the bed, and the default texture_size = 4 may not be big enough. In that case the surface will be blurry, so it is hard to tell whether GL_REPEAT works or not.

So there are three options:

(a) Increase the texture_size from the default of 4 to 8 or some other value. This takes a lot of GPU resources.
(b) Redo the artwork: open 3D editing software (for example, Blender) and subdivide the large face into several smaller faces while maintaining the UV map of the textures. Then even the default setting of 4 can generate sharp textures.
(c) Make the software smart enough to handle (b) automatically.
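A rough estimate shows why option (a) is expensive. The sketch below assumes that each face stores a texture_size x texture_size x texture_size block of RGB float32 values; that layout, the function name, and the face count of 10,000 are assumptions for illustration and should be checked against the actual tensor shapes in the code.

```python
def texture_tensor_bytes(num_faces, texture_size, channels=3, bytes_per_value=4):
    """Estimated memory for per-face textures, assuming texture_size**3 texels per face."""
    return num_faces * texture_size ** 3 * channels * bytes_per_value

# Hypothetical mesh with 10,000 faces.
for ts in (4, 8, 16):
    mib = texture_tensor_bytes(10000, ts) / 2.0 ** 20
    print("texture_size=%d: %.1f MiB" % (ts, mib))
```

Under this assumption, doubling texture_size multiplies the texture memory by eight, and that cost is paid for every face, including small ones that do not need the extra detail.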

Let us put (c) aside (at least for now). (c) involves low-level coding and is hard to change. Many mature 3D software packages can do (c) automatically, but since the data structure of the neural renderer is not the standard 3D graphics data structure, it may be very hard to apply textbook or mature 3D techniques to improve this project. (Actually, I still do not understand the texture tensors or how the CUDA C rendering/loading code works. I really hope someone can explain that a little bit to me :)

So I will test (a) and (b) to first try to get a sharp texture, and then we can check whether it repeats as expected. If the texture repeats, then we can at least close the GL_REPEAT issue and leave the blurriness problem to solve later.

In short, let me pull and try (a) and (b) first. I will tell you the results after I try them.

By the way, you did a really, really awesome job! (I wish I could code CUDA professionally like you.)

czw0078 commented 6 years ago

It works! We can move on now.

(a) I increased the texture size from 4 to 8. See the repetitive pattern on the side of the bed:

(b) I also subdivided the top surface and one side surface. Before the subdivision, we have this one large face mapped into this one texture figure:

[Image: figure_4]

Note that from the UV map you can clearly see that the coordinates of the vertices are outside the range 0~1.0. Now we subdivide the face into smaller pieces. The selected faces are mapped to the texture figure as below:

[Image: figure_3]

Note that we subdivided the surface while maintaining the relative position of the whole surface.

[Images: iterm2, subdivided, out3]

The top of the bed and one side surface show the correct repetitive pattern! (The other faces are not subdivided, so they look blurry.)
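For reference, subdividing a face while keeping its UV mapping can be sketched as a midpoint split: each triangle becomes four, and the UVs at the new vertices are interpolated exactly like the positions. This is a generic illustration with a hypothetical function name, not what Blender or the renderer does internally.

```python
import numpy as np

def subdivide_triangle(positions, uvs):
    """Split one triangle into four at the edge midpoints.

    positions: (3, 3) vertex coordinates, uvs: (3, 2) texture coordinates.
    Returns arrays of shape (4, 3, 3) and (4, 3, 2).
    """
    p = np.asarray(positions, dtype=np.float64)
    t = np.asarray(uvs, dtype=np.float64)
    a = np.hstack([p, t])                  # interpolate position and UV together
    m01 = (a[0] + a[1]) / 2.0
    m12 = (a[1] + a[2]) / 2.0
    m20 = (a[2] + a[0]) / 2.0
    # Same winding order as the input triangle for all four pieces.
    tris = np.array([
        [a[0], m01, m20],
        [m01, a[1], m12],
        [m20, m12, a[2]],
        [m01, m12, m20],
    ])
    return tris[..., :3], tris[..., 3:]

# One large face whose UVs extend beyond 1.0, i.e. the texture repeats across it.
new_pos, new_uv = subdivide_triangle(
    [[0, 0, 0], [2, 0, 0], [0, 2, 0]],
    [[0, 0], [3, 0], [0, 3]],
)
print(new_uv[0])
```

The UVs stay outside 0~1.0 after the split, so texture wrapping is still needed; the gain is that each smaller face covers less of the texture and therefore looks sharper at the same texture_size.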

So both (1) the illegal memory access problem and (2) the out-of-range 0.0~1.0 problem are resolved. We can move on.

I think we can close this issue now and move on to solving #11 first.

daniilidis-group / neural_renderer, issue #6