Can you give me a contact method, WeChat or email? Thank you very much.
Hi! Glad to hear from you.
Regarding the `focal` problem: we provide a script that converts a COLMAP database to JSON-format poses and supports extracting the common intrinsic parameters of different camera models. This means that in the JSON file, `focal` is split into two parameters, `fl_x` and `fl_y`. The `load_json_drone_data()` method is suitable for simple pinhole camera models that use a single focal parameter for the input data.
The easiest way to solve this problem is to change `focal = meta["focal"]` in `load_json_drone_data()` to `focal = meta["fl_x"]`, because `fl_x == fl_y` holds for most camera models.
Do not forget to adjust the factor `10 / image_scale` according to the downsampling ratio of your current images.
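A minimal sketch of the change (the surrounding context is inferred from the traceback at the end of this thread, not copied verbatim from the repo):

```python
# app/tools/dataloader/ray_utils.py, inside load_json_drone_data() -- sketch
# Before (raises KeyError for JSON files produced by the COLMAP script):
#   focal = meta["focal"] * (10 / image_scale)
# After (per-axis focal; fl_x == fl_y for most camera models):
focal = meta["fl_x"] * (10 / image_scale)
# The 10 assumes the images (and their focal) are pre-downsampled by 10;
# adjust it to the actual downsampling ratio of your images.
```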
Regarding the image rendering problem: such image mutilation is usually due to a bounding box that is set too small. The bounding box used during training can be controlled by adjusting the `--ub` and `--lb` parameters.
To help us resolve the issue together, you can attach the `.txt` configuration file you are using.
Thanks, here is my `.txt`:

```
dataroot = /opt/data/private/data/nerfdata/rgb
datadir = images
dataset_name = city
expname = west_x5_tiny
subfolder = [west]
ndim = 1
lb = [-2.4,-2.4,-0.05]
ub = [2.4,2.4,0.55]
add_nerf = 500
basedir = ./log
train_near_far = [1e-1, 4]
render_near_far = [2e-1, 4]
downsample_train = 1
n_iters = 50000
batch_size = 8192
render_batch_size = 16384
N_voxel_init = 2097156 # 128**3
N_voxel_final = 1073741824 # 1024**3
upsamp_list = [2000,3000,4000,5500,7000]
update_AlphaMask_list = [2000,4000]
N_vis = 5 # vis all testing images
vis_every = 500
n_lamb_sigma = [16,16,16]
n_lamb_sh = [48,48,48]
fea2denseAct = softplus
view_pe = 2
fea_pe = 2
L1_weight_inital = 8e-5
rm_weight_mask_thre = 1e-4
TV_weight_density = 0.1
TV_weight_app = 0.01
compute_extra_metrics = 1
run_nerf = 0
bias_enable = 1
white_bkgd = 1
```
Hi!
The default `--ub` and `--lb` values provided here are usually too small; you can consider increasing them by 5~10 times in further experiments.
The specific values can be adjusted according to the range of the camera poses calculated after loading the data.
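As a rough illustration, a sketch (not code from the repo) of deriving `--lb`/`--ub` from loaded poses, assuming the dataloader yields an `(N, 4, 4)` array of camera-to-world matrices:

```python
import numpy as np

def pose_bounds(poses: np.ndarray, scale: float = 5.0):
    """Suggest --lb/--ub from camera poses (sketch, assumed pose layout).

    poses: (N, 4, 4) camera-to-world matrices; scale enlarges the raw
    pose range 5~10x, per the advice above.
    """
    cam_centers = poses[:, :3, 3]                  # camera positions in world space
    lo, hi = cam_centers.min(0), cam_centers.max(0)
    center, half = (lo + hi) / 2, (hi - lo) / 2
    return (center - half * scale).tolist(), (center + half * scale).tolist()
```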
Thanks, should all values in `lb` and `ub` be increased by 5-10 times, e.g. from `lb = [-2.4,-2.4,-0.05] ub = [2.4,2.4,0.55]` to `lb = [-24,-24,-0.5] ub = [24,24,5.5]`?
Yes, that's what I mean.
The specific values can be adjusted according to the pose bounds. Here is an example.
My pose information:

```
train poses bds: tensor([-2.9262, -3.6487, -2.8328]) tensor([3.1265, 2.8131, 2.8410])
test poses bds:  tensor([-0.4393, -3.3211, -1.2280]) tensor([-0.2740, -2.8605, -1.2055])
```

So, the `--lb` and `--ub` should be set as `lb = [-2.9262, -3.6487, -2.8328]` and `ub = [3.1265, 2.8131, 2.8410]`. Is that correct?
In fact, the recommended size of the bounding box needs to be several times larger than the pose bounds. The exact same size does not work well in experiments. 🤔
I found that no .mp4 is generated. How do I generate it?
Hi guys, I still haven't figured out how to set `--lb` and `--ub` specifically according to the pose bounds. If the pose bounds are as follows, is it reasonable for me to set `lb = [22.02,7.50,-5140.63]` and `ub = [571.05,209.79,-203.64]`?
Hi! Please open a new issue thread for new problems.
I have a question! I think the 10 in `focal = meta["focal"] * (10 / image_scale)` should be 1.0. Shouldn't the focal length and the image size change in the same proportion?
That's correct! The focal length should always be re-scaled with the image size. Here we assume that the dataset (including the focal) is already downsampled by 10 by default; the `downsample_train` in our default configs is also `10` to ensure consistency. We will release a better dataloader soon to avoid the ambiguity.
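A small worked example of that convention (hypothetical numbers, just to make the scaling explicit):

```python
focal_full = 8000.0                      # full-resolution focal from COLMAP (hypothetical)
focal_meta = focal_full / 10             # the JSON stores the 10x-downsampled focal: 800.0

image_scale = 10                         # downsample_train in the default configs
focal = focal_meta * (10 / image_scale)  # factor is 1.0: no further rescaling needed
assert focal == focal_full / image_scale # the focal always tracks the image size
# With image_scale = 20 the factor would be 0.5, halving the focal
# to match images downsampled twice as much.
```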
Awesome, thank you for your response. Regarding image rendering quality, I've been working with my own aerial dataset, and I've noticed that even after adjusting parameters such as "lb" and "ub," as well as "near" and "far," I'm able to achieve a PSNR of 20. However, the image quality hasn't improved with the increase in PSNR and still remains quite blurry. Do you have any suggestions for improvement? I'm looking forward to your response.
Did this thread help with your question?
https://github.com/InternLandMark/LandMark/issues/6
Also, I recently realized that a lot of aerial data is likely to include multiple cameras, especially ones with very different focal lengths. This is not currently supported by the `city` dataloader.
I've already gone through all the responses in #6, and while the PSNR has improved, the quality of the rendered images is still quite blurry. I believe my dataset should be highly consistent with yours, since I used a low-altitude drone for aerial photography with a single camera throughout. After multiple rounds of parameter adjustments, I've only seen an increase in PSNR. What do you think could be the reasons behind this?
According to the depth map, the grid branch did not learn the correct geometry. Can you share the config file? 🤔
Yeah, of course! transforms_train.json config.txt
Based on this config file, I have several suggestions that might work (a sketch of the changes follows below):
1. Using `relu` rather than `softplus` in `fea2denseAct` is more suitable for this kind of dataset.
2. Make the scene box enclose only the buildings on the ground as much as possible, especially along the z-axis. Its extent can be calculated from the camera poses and the flight altitude.
3. Increase the value of `N_voxel` and add `resMode=[1,2,4]` to use a multi-resolution grid.
The `add_nerf` setting can be commented out until you have verified that the grid branch learns the geometric features correctly. Other settings can also refer to the config for the MatrixCity dataset. Hopefully these methods will help, as different cases always have different optimal settings.
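A sketch of the corresponding config changes (illustrative values, not tuned for this scene; derive the z-range from your own poses and flight altitude):

```
fea2denseAct = relu            # 1. relu instead of softplus
lb = [-2.4,-2.4,-125]          # 2. tighten the box around the buildings,
ub = [2.4,2.4,-70]             #    especially the z-axis (placeholder values)
N_voxel_init = 16777216        # 3. larger grid: 256**3 ...
N_voxel_final = 1073741824     #    ... up to 1024**3
resMode = [1,2,4]              #    multi-resolution grid
# add_nerf = 500               # keep commented until the grid geometry looks right
```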
Okay, I will give it a try. Regarding the second suggestion, I would like to ask how the extent is calculated. If my drone is flying at a height of 120 m above the ground, based on the pose file above, should I set the range of the z-axis to [-120, 1]?
In fact, it depends on the coordinate system and the scaling of the poses. 🤔 For example, if the z-axis range of the camera poses is distributed around 0 and the scale of the space is consistent with reality, that is, the ground is distributed around -120, then [-125, 50] is appropriate, including a downward reservation of 5 meters and a building height of less than 50 meters.
The SH2000 coordinate system is usually used in the cases we showed, in which case the ground is always distributed around 0. Hope this helps you.
Thank you very much for your suggestion. Since my drone overlooks all the buildings in the school, I think [-125, 5] might be enough. Is this correct? I also have a question: if there is a scaling factor, does it have any specific impact on my range selection? If the scaling factor is 5, should I set it to [-25, 1]? Looking forward to your reply.
If there is a scaling factor for the camera poses, then the scene will be scaled equally. This means that the scene box also needs to be scaled proportionally, so [-25, 1] is suitable.
This is the result after I changed it to [-125,15], and there has been a significant improvement compared to before. Then I observed the depth map and felt that it might be inaccurate. What do you think about the accuracy of this depth map?
The image quality looks great! For the depth map, I think it is possible that the depth contrast under this view is relatively low. Views that are not perpendicular to the ground will generally have better depth maps, and you can refer to those images. 🤔
Also, I realized I had introduced a bug earlier, lol. The upper boundary of the scene box should be comparable to the ground plus the building height, so the suitable z-axis range would be [-125, -70] to avoid overfitting into the air.
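For concreteness, a small sketch of that z-axis arithmetic (the 120 m altitude, 5 m reservation, and 50 m building height come from the discussion above; the scale factor is hypothetical):

```python
flight_altitude = 120.0   # camera poses sit around z = 0, ground ~120 m below
reservation = 5.0         # small margin below the ground
building_height = 50.0    # tallest buildings in the scene
scale = 1.0               # pose scaling factor, if any (5 would shrink everything 5x)

ground_z = -flight_altitude
z_min = (ground_z - reservation) / scale       # -125.0
z_max = (ground_z + building_height) / scale   # -70.0: ground + building height,
                                               # so the box stops short of the air
```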
lol. [-125,75] should be more reasonable, thank you very much for your answer. But currently, if I uncomment `add_nerf`, the result is very blurry. What is causing this problem?
I find that the generated JSON has no 'focal' key:

```
File "app/trainer.py", line 43, in train
    train_dataset, test_dataset = prep_dataset(enable_lpips, args)
File "/opt/data/private/code/LandMark/app/tools/train_utils.py", line 153, in prep_dataset
    train_dataset = dataset(
File "/opt/data/private/code/LandMark/app/tools/dataloader/city_dataset.py", line 30, in __init__
    self.read_meta()
File "/opt/data/private/code/LandMark/app/tools/dataloader/city_dataset.py", line 36, in read_meta
    meta = load_json_drone_data(
File "/opt/data/private/code/LandMark/app/tools/dataloader/ray_utils.py", line 114, in load_json_drone_data
    focal = meta["focal"] * (10 / image_scale)
KeyError: 'focal'
```