Closed: WangYu0611 closed this issue 2 months ago
Yes, you need to finish pretraining and data partitioning first, i.e.
# train coarse global gaussian model
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python train_large.py --config config/$COARSE_CONFIG.yaml
# train CityGaussian
# obtain data partitioning
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python data_partition.py --config config/$CONFIG.yaml
These commands are included in scripts/run_citygs.sh. This process takes around 1 hour.
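As a quick sanity check (the exact output folder is derived from the coarse config name, so adjust the path if yours differs), you can confirm that the coarse stage actually produced its point cloud before starting the partition step:
# data_partition.py will later try to load exactly this file
[ -f output/$COARSE_CONFIG/point_cloud/iteration_30000/point_cloud.ply ] \
  && echo "coarse point cloud found" \
  || echo "coarse point cloud missing, rerun the coarse training step first"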
I checked the script and it does contain these two pieces of code, but I don't know why this step seems to be skipped. @DekuLiuTesla
Perhaps no available GPU is detected? You can try the latest script, or just comment out the following code and make sure these two pieces run successfully.
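If you want to rule out the GPU check, you can run the same query that get_available_gpu uses by hand (assuming nvidia-smi is installed; the helper only returns a GPU whose used memory is below its threshold, 500 MiB by default):
# print index and used memory (MiB) of each GPU; a GPU already using more
# than mem_threshold will never be reported as available by the script
nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits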
I commented out the rest and ran only this part. The process still gets killed with the same error. I have 40 GB of available RAM and 20 GB free on an RTX 4090 GPU. The error is:
GPU 0 is available.
Optimizing
Output folder: ./output/building_coarse [13/08 14:51:19]
Reading camera 1920/1920 [13/08 14:51:22]
Train cameras: 1920, Test cameras: 0 [13/08 14:51:22]
Number of points at initialisation : 1603125 [13/08 14:51:22]
#2527628 dataloader seed to 42 [13/08 14:51:22]
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
scripts/run_citygs.sh: line 20: **2527628 process killed** CUDA_VISIBLE_DEVICES=$gpu_id python train_large.py --config config/$COARSE_CONFIG.yaml
GPU 0 is available.
Output folder: ./output/building_c20_r4 [13/08 14:51:45]
Reading camera 1920/1920 [13/08 14:51:48]
Train cameras: 1920, Test cameras: 0 [13/08 14:51:48]
Traceback (most recent call last):
File "/home/pc_5053/CityGaussian/data_partition.py", line 151, in <module>
scene = LargeScene(lp, gaussians, shuffle=False)
File "/home/pc_5053/CityGaussian/scene/__init__.py", line 168, in __init__
self.gaussians.load_ply(os.path.join(self.pretrain_path, "point_cloud.ply"))
File "/home/pc_5053/CityGaussian/scene/gaussian_model.py", line 229, in load_ply
plydata = PlyData.read(path)
File "/home/pc_5053/anaconda3/envs/citygs/lib/python3.9/site-packages/plyfile.py", line 401, in read
(must_close, stream) = _open_stream(stream, 'read')
File "/home/pc_5053/anaconda3/envs/citygs/lib/python3.9/site-packages/plyfile.py", line 481, in _open_stream
return (True, open(stream, read_or_write[0] + 'b'))
FileNotFoundError: [Errno 2] No such file or directory: 'output/building_coarse/point_cloud/iteration_30000/point_cloud.ply'
Hi, if the coarse global gaussian model training part runs successfully, the output should look like:
GPU 5 is available.
Optimizing
Output folder: ./output/rubble_coarse [14/08 13:53:28]
Reading camera 1657/1657 [14/08 13:53:58]
Train cameras: 1657, Test cameras: 0 [14/08 13:53:58]
Number of points at initialisation : 1694315 [14/08 13:54:00]
#68863 dataloader seed to 42 [14/08 13:54:03]
#68863 caching images (1st: 401): 100%|█████████████████████████████████████████████████████████| 1024/1024 [01:00<00:00, 16.91it/s]
#68863 caching images (1st: 496): 100%|███████████████████████████████████████████████████████████| 633/633 [00:49<00:00, 12.81it/s]
Training progress: 5%|██▍ | 1360/30000 [02:56<21:20, 22.37it/s, Loss=0.1686244]
It seems that your CacheDataloader didn't run successfully. You can check whether there is a bug there.
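One possible cause (an assumption, the log itself doesn't show it): a bare "process killed" right after the dataloader starts is often the Linux OOM killer reclaiming system RAM while images are being cached. You can usually confirm this from the kernel log:
# look for OOM-killer messages around the time train_large.py died
sudo dmesg -T | grep -i -E "out of memory|killed process"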
I seemed to be almost done with the run, but it failed at the end. I used the sky data in small_city for the run. I modified scripts/run_citygs.sh, mainly changing some paths, as follows:
get_available_gpu() {
    local mem_threshold=500
    nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | awk -v threshold="$mem_threshold" -F', ' '
    $2 < threshold { print $1; exit }
    '
}
TEST_PATH="data/matrix_city/aerial/test"
COARSE_CONFIG="mc_aerial_coarse"
CONFIG="mc_aerial_c36"
out_name="val_4" # 4 denotes resolution
max_block_id=8
port=4041
# train coarse global gaussian model
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python train_large.py --config config/$COARSE_CONFIG.yaml
# train CityGaussian
# obtain data partitioning
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python data_partition.py --config config/$CONFIG.yaml
# optimize each block, please adjust block number according to config
for num in $(seq 0 $max_block_id); do
    while true; do
        gpu_id=$(get_available_gpu)
        if [[ -n $gpu_id ]]; then
            echo "GPU $gpu_id is available. Starting training block '$num'"
            CUDA_VISIBLE_DEVICES=$gpu_id WANDB_MODE=offline python train_large.py --config config/$CONFIG.yaml --block_id $num --port $port &
            # Increment the port number for the next run
            ((port++))
            # Allow some time for the process to initialize and potentially use GPU memory
            sleep 120
            break
        else
            echo "No GPU available at the moment. Retrying in 2 minute."
            sleep 120
        fi
    done
done
wait
# merge the blocks
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python merge.py --config config/$CONFIG.yaml
# rendering and evaluation, add --load_vq in rendering if you want to load compressed model
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python render_large.py --config config/$CONFIG.yaml --custom_test $TEST_PATH
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python metrics_large.py -m output/$CONFIG -t $out_name
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/server9/CityGaussian/merge.py", line 106, in <module>
2. I am not sure about the out_name parameter; I did not modify it. Then at the end of the run, the terminal reported an error:
GPU 0 is available.
Traceback (most recent call last):
File "/home/server9/CityGaussian/render_large.py", line 139, in
Scene: output/mc_aerial_c36
Traceback (most recent call last):
File "/home/server9/CityGaussian/metrics_large.py", line 118, in
@WangYu0611 Thanks for your feedback. The error at the end of the run comes from the `out_name` setting. I think it should be `test` in your case.
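For example, here is a sketch of the evaluation part of run_citygs.sh with that change (assuming, as in the quoted script, that TEST_PATH ends in a folder named test):
out_name="test"   # should match the name of the custom test split
CUDA_VISIBLE_DEVICES=$gpu_id python render_large.py --config config/$CONFIG.yaml --custom_test $TEST_PATH
CUDA_VISIBLE_DEVICES=$gpu_id python metrics_large.py -m output/$CONFIG -t $out_name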
Yes, for question 1, I checked the configuration file and it shows block_dim: [6, 6, 1], but I don't quite understand what this means. What should I set for the SMALL_CITY aerial dataset?
I see it in the configuration file now, 6x6x1, thank you.
> Yes, for question 1, I checked the configuration file and it shows block_dim: [6, 6, 1], but I don't quite understand what this means. What should I set for the SMALL_CITY aerial dataset?
You should set max_block_id to 35, i.e. 6 * 6 * 1 - 1.
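In other words, a small sketch of the relationship (assuming the blocks are indexed from 0):
# block_dim: [6, 6, 1] gives 6 * 6 * 1 = 36 blocks, indexed 0..35
max_block_id=$((6 * 6 * 1 - 1))   # = 35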
Yes, I did it, thank you.
> Yes, for question 1, I checked the configuration file and it shows block_dim: [6, 6, 1], but I don't quite understand what this means. What should I set for the SMALL_CITY aerial dataset?
We have uploaded the custom dataset instructions; you can refer to them for more details. We have also removed the confusing resolution postfix for easier usage.
Thank you very much!!
I used this command but got an error: bash scripts/run_citygs.sh.
I checked the output folder and there is no such file. I don't know if I am missing any steps, because I don't have the .ply and .npy files.