Closed: WangYu0611 closed this issue 2 months ago
Yes, you need to finish pretraining and data partitioning first, i.e.
# train coarse global gaussian model
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python train_large.py --config config/$COARSE_CONFIG.yaml
# train CityGaussian
# obtain data partitioning
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python data_partition.py --config config/$CONFIG.yaml
These commands are included in scripts/run_citygs.sh. This process takes around 1 hour.
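As a quick sanity check (the exact output folder is derived from the coarse config name, so adjust the path if yours differs), you can confirm that the coarse stage actually produced its point cloud before starting the partition step:
# data_partition.py will later try to load exactly this file
[ -f output/$COARSE_CONFIG/point_cloud/iteration_30000/point_cloud.ply ] \
  && echo "coarse point cloud found" \
  || echo "coarse point cloud missing, rerun the coarse training step first"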
I checked the script and it does contain these two pieces of code, but I don't know why this step seems to be skipped. @DekuLiuTesla
Perhaps no available GPU is detected? You can try the latest script, or just comment out the following code and make sure these two pieces run successfully.
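If you want to rule out the GPU check, you can run the same query that get_available_gpu uses by hand (assuming nvidia-smi is installed; the helper only returns a GPU whose used memory is below its threshold, 500 MiB by default):
# print index and used memory (MiB) of each GPU; a GPU already using more
# than mem_threshold will never be reported as available by the script
nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits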
I commented out the rest and ran only this part. The process still gets killed with the same error. I have 40 GB of available RAM and 20 GB free on an RTX 4090 GPU. The error is:
GPU 0 is available.
Optimizing
Output folder: ./output/building_coarse [13/08 14:51:19]
Reading camera 1920/1920 [13/08 14:51:22]
Train cameras: 1920, Test cameras: 0 [13/08 14:51:22]
Number of points at initialisation : 1603125 [13/08 14:51:22]
#2527628 dataloader seed to 42 [13/08 14:51:22]
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
scripts/run_citygs.sh: line 20: **2527628 process killed** CUDA_VISIBLE_DEVICES=$gpu_id python train_large.py --config config/$COARSE_CONFIG.yaml
GPU 0 is available.
Output folder: ./output/building_c20_r4 [13/08 14:51:45]
Reading camera 1920/1920 [13/08 14:51:48]
Train cameras: 1920, Test cameras: 0 [13/08 14:51:48]
Traceback (most recent call last):
File "/home/pc_5053/CityGaussian/data_partition.py", line 151, in <module>
scene = LargeScene(lp, gaussians, shuffle=False)
File "/home/pc_5053/CityGaussian/scene/__init__.py", line 168, in __init__
self.gaussians.load_ply(os.path.join(self.pretrain_path, "point_cloud.ply"))
File "/home/pc_5053/CityGaussian/scene/gaussian_model.py", line 229, in load_ply
plydata = PlyData.read(path)
File "/home/pc_5053/anaconda3/envs/citygs/lib/python3.9/site-packages/plyfile.py", line 401, in read
(must_close, stream) = _open_stream(stream, 'read')
File "/home/pc_5053/anaconda3/envs/citygs/lib/python3.9/site-packages/plyfile.py", line 481, in _open_stream
return (True, open(stream, read_or_write[0] + 'b'))
FileNotFoundError: [Errno 2] No such file or directory: 'output/building_coarse/point_cloud/iteration_30000/point_cloud.ply'
Hi, if the coarse global gaussian model training part runs successfully, the output should look like:
GPU 5 is available.
Optimizing
Output folder: ./output/rubble_coarse [14/08 13:53:28]
Reading camera 1657/1657 [14/08 13:53:58]
Train cameras: 1657, Test cameras: 0 [14/08 13:53:58]
Number of points at initialisation : 1694315 [14/08 13:54:00]
#68863 dataloader seed to 42 [14/08 13:54:03]
#68863 caching images (1st: 401): 100%|█████████████████████████████████████████████████████████| 1024/1024 [01:00<00:00, 16.91it/s]
#68863 caching images (1st: 496): 100%|███████████████████████████████████████████████████████████| 633/633 [00:49<00:00, 12.81it/s]
Training progress: 5%|██▍ | 1360/30000 [02:56<21:20, 22.37it/s, Loss=0.1686244]
It seems that your CacheDataloader didn't run successfully. You can check whether there is a bug there.
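One possible cause (an assumption, the log itself doesn't show it): a bare "process killed" right after the dataloader starts is often the Linux OOM killer reclaiming system RAM while images are being cached. You can usually confirm this from the kernel log:
# look for OOM-killer messages around the time train_large.py died
sudo dmesg -T | grep -i -E "out of memory|killed process"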
I seemed to be almost done with the run, but it failed at the end. I used the sky data in small_city for the run. I modified scripts/run_citygs.sh, mainly changing some paths, as follows:
get_available_gpu() {
    local mem_threshold=500
    nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | awk -v threshold="$mem_threshold" -F', ' '
    $2 < threshold { print $1; exit }
    '
}
TEST_PATH="data/matrix_city/aerial/test"
COARSE_CONFIG="mc_aerial_coarse"
CONFIG="mc_aerial_c36"
out_name="val_4" # 4 denotes resolution
max_block_id=8
port=4041
# train coarse global gaussian model
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python train_large.py --config config/$COARSE_CONFIG.yaml
# train CityGaussian
# obtain data partitioning
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python data_partition.py --config config/$CONFIG.yaml
# optimize each block, please adjust block number according to config
for num in $(seq 0 $max_block_id); do
    while true; do
        gpu_id=$(get_available_gpu)
        if [[ -n $gpu_id ]]; then
            echo "GPU $gpu_id is available. Starting training block '$num'"
            CUDA_VISIBLE_DEVICES=$gpu_id WANDB_MODE=offline python train_large.py --config config/$CONFIG.yaml --block_id $num --port $port &
            # Increment the port number for the next run
            ((port++))
            # Allow some time for the process to initialize and potentially use GPU memory
            sleep 120
            break
        else
            echo "No GPU available at the moment. Retrying in 2 minute."
            sleep 120
        fi
    done
done
wait
# merge the blocks
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python merge.py --config config/$CONFIG.yaml
# rendering and evaluation, add --load_vq in rendering if you want to load compressed model
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python render_large.py --config config/$CONFIG.yaml --custom_test $TEST_PATH
gpu_id=$(get_available_gpu)
echo "GPU $gpu_id is available."
CUDA_VISIBLE_DEVICES=$gpu_id python metrics_large.py -m output/$CONFIG -t $out_name
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/server9/CityGaussian/merge.py", line 106, in <module>
2. I am not sure about the out_name parameter; I did not modify it. Then at the end of the run, the terminal reported an error:
GPU 0 is available.
Traceback (most recent call last):
File "/home/server9/CityGaussian/render_large.py", line 139, in
Scene: output/mc_aerial_c36
Traceback (most recent call last):
File "/home/server9/CityGaussian/metrics_large.py", line 118, in
@WangYu0611 Thanks for your feedback. The error at the end of the run comes from the `out_name` setting. I think it should be `test` in your case.
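For example, here is a sketch of the evaluation part of run_citygs.sh with that change (assuming, as in the quoted script, that TEST_PATH ends in a folder named test):
out_name="test"   # should match the name of the custom test split
CUDA_VISIBLE_DEVICES=$gpu_id python render_large.py --config config/$CONFIG.yaml --custom_test $TEST_PATH
CUDA_VISIBLE_DEVICES=$gpu_id python metrics_large.py -m output/$CONFIG -t $out_name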
Yes, for question 1, I checked the configuration file and it shows block_dim: [6, 6, 1], but I don't quite understand what this means. What should I set for the SMALL_CITY aerial dataset?
I see it in the configuration file now, 6x6x1, thank you.
> Yes, for question 1, I checked the configuration file and it shows block_dim: [6, 6, 1], but I don't quite understand what this means. What should I set for the SMALL_CITY aerial dataset?
You should set max_block_id to 35, i.e. 6 * 6 * 1 - 1.
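In other words, a small sketch of the relationship (assuming the blocks are indexed from 0):
# block_dim: [6, 6, 1] gives 6 * 6 * 1 = 36 blocks, indexed 0..35
max_block_id=$((6 * 6 * 1 - 1))   # = 35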
Yes, I did it, thank you.
> Yes, for question 1, I checked the configuration file and it shows block_dim: [6, 6, 1], but I don't quite understand what this means. What should I set for the SMALL_CITY aerial dataset?
We have uploaded the custom dataset instructions; you can refer to them for more details. We have also removed the confusing resolution postfix for easier usage.
Thank you very much!!
I used this command but got an error: bash scripts/run_citygs.sh.
I checked the output folder and there is no such file. I don't know if I am missing any steps, because I don't have the .ply and .npy files.