Hi, thanks for reaching out!
If you have any more questions or need me to clarify anything else, let me know!
Can you check the shared folder again? I uploaded a sample result at iteration 1.
I know that my object is at the origin, and the cameras are all looking at the origin as well. pose.txt is set to identity.
From the look of it, it seems the scale is a little off? Can I assume that my camera conventions are correct and that only the scale needs to be changed?
Yes it appears that the camera configuration is roughly correct. It's good that you can see the initial volume in all viewpoints. You might try increasing the world scale to get the volume to occupy the entire object.
If I increase the world scale, it seems to just grow from the corner until it occupies the upper-left quadrant. I think something is still not quite right. If everything were working, I would expect the initial volume to be exactly in the center, since I know exactly where the cameras are supposed to be looking (0,0,0) and the pose.txt applies no transformation. Do you know what the issue might be?
Check lines 65 and 66 of data/dryice1.py: they divide the focal length and principal point by 4 because the training data is downsampled from the original resolution. You probably don't want that.
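In other words, the intrinsics should only be scaled if your images really are downsampled relative to the calibration resolution. As a rough sketch (hypothetical helper function, not the code from dryice1.py), assuming a standard 3x3 intrinsic matrix:

```python
import numpy as np

def scale_intrinsics(K, downsample=1.0):
    """Scale a 3x3 intrinsic matrix to match images downsampled by `downsample`.

    dryice1.py divides the focal length and principal point by 4 because its
    training images are 4x smaller than the calibration resolution. If your
    images are at the resolution the intrinsics were calibrated for, use
    downsample=1 (i.e. leave K unchanged).
    """
    K = np.array(K, dtype=np.float64).copy()
    K[0, 0] /= downsample  # fx
    K[1, 1] /= downsample  # fy
    K[0, 2] /= downsample  # cx
    K[1, 2] /= downsample  # cy
    return K
```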
Yes, that seems to do the trick. Can you check the shared folder again? Does it look like I need to adjust the world scale, or is it fine as is?
Looks pretty good to me, you'll probably have a better idea once it starts training.
Thanks for helping!
Hi, can you please check the shared folder again? I uploaded ground truth and rendered results.
Does the fixedcammean parameter matter? I think that was just doing zero-meaning based on the 255 range, right?
Hi,
It's working very well on synthetic data. However, I'm having trouble getting it to work on real data. I am using Microsoft's FVV paper's data and some data we captured ourselves. Basically, after a while, the training just outputs a background image. I have manually adjusted pose.txt through trial and error so that the volume is visible in all cameras, and set the world scale to 1/2 so that I don't have to spend too much time tweaking. At world scale 1, the volume is cut off in some cameras, but at world scale 1/2 it looks fine. Can you take a look at the progress images under real_data?
It shouldn't be very important that the object is exactly centered or not. We only used 34 cameras in the experiments in the paper.
If I had to guess based on the progress images, I would say that it looks like the camera parameters may not be set up correctly. If you look at the first progress image for the lincoln example prog_000003.jpg, the last row shows 4 views located behind the person but the rendered volume looks drastically different for each of them. I would expect it to be more similar if the camera parameters are correct.
If you're sure the camera parameters are correct and in the right format, one thing you can try is training a model without the warp field, as it can cause stability problems in some cases.
Hmm, the same parameters were used for training a scene representation network as well as for my implementation of visual hull, so I feel the cameras are probably alright. And the conversion code from my format to the NV format is applied the same way as for the earlier synthetic data.
But I am not 100% certain about pose transformations. How exactly did you obtain those numbers for your dataset?
For disabling the warp field, is it enough to set self.warp to None in the Decoder class?
Sorry, the pose transformation is a little cryptic, so I'll try to explain it better here. The way the code works is that it assumes the volume always lives in the cube that spans -1 to 1 on each axis. This is what I'll call 'normalized space', since it's centered and has a standard size. When you provide the camera parameters of your rig, the camera extrinsics are in some arbitrary space that I'll refer to as 'camera space'. Because camera space has an arbitrary origin and scale, the object that you want to model won't necessarily fall in the [-1, 1]^3 volume. The pose transformation and world scale are how the code accounts for the difference between these two coordinate systems.
The transformation found in pose.txt transforms points from the normalized space to the camera space. The matrix is stored as a 3x4 matrix where the last column is the translation, which means that the translation column corresponds to the desired center of the volume (which should be the center of your object) in camera space. You can also adjust the rotation portion of the matrix to change the axes but getting the translation right is the most important bit so that the volume is placed correctly in space. Please let me know if that's helpful.
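As a rough sketch (the numbers and file handling here are placeholders, not taken from the repo), writing a pose.txt that simply recenters the volume on your object could look like this:

```python
import numpy as np

# Placeholder object center; replace with the center of your object in camera
# space (the same space your KRT extrinsics live in).
object_center = np.array([0.0, 0.0, 0.0])

# 3x4 matrix: identity rotation, translation column = desired volume center.
pose = np.concatenate([np.eye(3), object_center.reshape(3, 1)], axis=1)
np.savetxt("pose.txt", pose)
```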
To disable the warp field you can add a parameter warptype=None to the Decoder constructor on line 33 of config.py.
Ok, got it. I will double-check my camera parameters and try without the warp field over the next few days.
Worst comes to worst, would you be able to take a look at the data and check on your side? I can share the original data and the scripts to convert it into the neural volumes format, including dataloaders and experiment config files for NV.
Sure I can take a look
I managed to get it to start doing something on the lincoln sequence. It turns out the camera parameters were correct but the pose transformations were wrong. I was only looking at 16 cameras when doing the adjustment, so the volume wasn't actually overlapping the object in all cameras.
I uploaded new progress images under the same folder and also a zipped folder named lincoln.tar. Can you take a look and see whether it looks like it's going well and I just need to wait?
Edit: After waiting one night, it seems alright, although it trains slower than on synthetic data, iteration-wise. Again, thanks for all the clarifications!
I'm guessing you'll have some artifacts in the result given how it's trying to reconstruct so much of the background. I'm a little surprised since it should be easy for it to figure out that that area should be transparent, although sometimes it can get stuck in bad situations early on and it can have trouble recovering. I would recommend rendering a video of the current result with the render.py script to check it's not doing something too crazy.
Hmm... something crazy is indeed happening :( I zipped up the entire folder with data and experiments and sent you a link via email. I have also uploaded a reconstruction from another method under the given test trajectories so that you know what "ground truth" is supposed to look like.
I am also unable to get it working on the other dataset. Whenever it looks like it's about to do something, alphapr suddenly goes to zero, kldiv starts to increase a lot, and then I just get background; then it repeats the process in a loop. I am checking whether I can share this data. Do you think sharing just one frame would be sufficient to debug?
Since it looks like the 3D volume is rotating fine, I guess the camera parameters are OK? But based on the test video (and a comparison with our result video), maybe the volume is clipping the object, since the rendered result looks like it's shifted down by about half?
Sharing one frame to debug should work. I will take a look at the lincoln data and see if I can figure out what's happening.
I got the lincoln example working; attached are the dataset class, config file, and modified pose.txt (although I didn't change pose.txt much). Let me know if this works for you: experiment1.zip
I got it working as well. Thanks a lot! Looks like I forgot to rescale the intrinsics. I believe it should work for the other dataset as well.
Edit: Yup it's working for both datasets.
I have one more question. In the figure where you showed latent code interpolation, did you use all the frames in the training data? Say you have frames 1-5 in the training data; during testing, did you use the encoder to get frame 1's and frame 5's latent codes and interpolate them to get frames 2, 3, and 4?
I'm a little confused by your question, in your example if we interpolate the encodings of frame 1 and frame 5 we won't exactly reproduce the frames between them. This is particularly true if we interpolate distant frames in the sequence, which is the case for Fig. 8 in the neural volumes paper.
Sorry for being unclear. I was trying to do a "slowmo" effect: by subsampling the training frames and then rendering frames that fall in between (but are not in the training set), I wanted to see if the neural volume encoding produces anything reasonable in terms of time.
But I did a bit more testing and found that I can't really get the slowmo effect from the neural volume encodings. I am not sure if what I am doing is correct; can you double-check the result? I have the code that does the encoding interpolation, plus results on full frames (training uses all frames) and slowed frames (rendering more frames than are in the training data). This is just to confirm that I am doing the right thing. I think this result is sort of expected, as there's no constraint on the latent space.
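For reference, the interpolation I'm doing is essentially just a linear blend between two encodings, along the lines of this sketch (hypothetical code sizes and variable names, not my actual script):

```python
import numpy as np

def lerp_codes(z_a, z_b, num_steps):
    """Linearly interpolate between two latent codes.

    z_a and z_b are the encoder outputs for two training frames; each returned
    code can be passed to the decoder to render an in-between frame.
    """
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * z_a + t * z_b for t in ts]

# Example: 5 codes blending from frame 1's encoding to frame 5's encoding.
z1 = np.random.randn(256)  # placeholder for the encoding of frame 1
z5 = np.random.randn(256)  # placeholder for the encoding of frame 5
codes = lerp_codes(z1, z5, 5)
```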
I took a look at the result and I think what you're seeing is expected. It's partly a limitation of this model which uses an inverse warp to model motion rather than a forward warp, which makes some motion interpolation difficult. It is also somewhat dependent on the data. I've noticed that if I train a very long sequence it does a much better job interpolating the latent space than a short sequence.
How long is long? I can try to capture longer sequences and check.
We've captured ~7500 frames of facial data and found it works pretty well with that. The data is very redundant, though, which I think helps. I think this model has a harder time with bodies since they have more complex motion.
Hi, can you please also share the KRT file for the lincoln data? I am still confused about how to set it up correctly for my own data. Any hint or reference for the KRT format is appreciated. Thanks.
KRT.txt
The KRT file is a series of camera specifications; each camera is specified in the following way:
[camera name]
K00 K01 K02
K10 K11 K12
K20 K21 K22
D0 D1 D2 D3 D4
R00 R01 R02 T0
R10 R11 R12 T1
R20 R21 R22 T2
[blank line]
where K is the intrinsic matrix, D are the distortion coefficients, R is the rotation matrix, T is the translation. However, you don't need to write a KRT file at all, you can simply write a new dataset class by making a copy of dryice1.py and loading the camera data however you like.
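If you do want to parse a KRT file in this format yourself, a minimal loader along these lines should work (a sketch based on the layout above, not the repo's actual loading code):

```python
import numpy as np

def load_krt(path):
    """Parse a KRT file laid out as described above (a sketch, not the repo's loader)."""
    cameras = {}
    with open(path, "r") as f:
        while True:
            name = f.readline().strip()
            if not name:
                break  # end of file (or trailing blank line)
            K = np.array([f.readline().split() for _ in range(3)], dtype=np.float64)
            dist = np.array(f.readline().split(), dtype=np.float64)
            Rt = np.array([f.readline().split() for _ in range(3)], dtype=np.float64)
            f.readline()  # consume the blank line between cameras
            cameras[name] = {"K": K, "dist": dist, "R": Rt[:, :3], "T": Rt[:, 3]}
    return cameras
```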
@stephenlombardi Thanks for sharing this wonderful work. After reading the above discussion, I still have some questions about how to train on my own datasets. So the first step is to get KRT.txt and pose.txt for my own datasets; KRT.txt contains the intrinsic and extrinsic matrices, which I can get with tools like COLMAP, but how do I get pose.txt?
@visonpon Have you figured it out?
This comment explains pose.txt: https://github.com/facebookresearch/neuralvolumes/issues/1#issuecomment-591602762
Hi,
I have a few questions on how the data should be formatted and on the data format of the provided dryice1 example. Why is world_scale=1/256 in the config.py file? How should I set up the pose.txt file? Also, in the config.py file it seems you are skipping some frames; is this OK to do for my own sequence as well?