RozDavid / LanguageGroundedSemseg

Implementation for ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

Test the result #4

Closed yhyang-myron closed 1 year ago

yhyang-myron commented 1 year ago

Excuse me, how can I use the given checkpoint to test and visualize the results? I set the config parameters as follows and ran source scripts/train_models.sh:

#!/bin/bash

export PYTHONUNBUFFERED="True"

export DATASET=Scannet200Voxelization2cmDataset
export MODEL=Res16UNet34D  # Res16UNet34C, Res16UNet34D
export BATCH_SIZE=8
export SUFFIX=try

export DATA_ROOT="/mnt//scannet_200"
export PRETRAINED_WEIGHTS="/mnt/34D_CLIP_finetune.ckpt"
export OUTPUT_DIR_ROOT="/mnt/LanguageGroundedSemseg/train_models_output"

export TIME=$(date +"%Y-%m-%d_%H-%M-%S")
export LOG_DIR=$OUTPUT_DIR_ROOT/$DATASET/$MODEL-$SUFFIX

mkdir -p $LOG_DIR

LOG="$LOG_DIR/$TIME.txt"

python -m main \
    --log_dir $LOG_DIR \
    --dataset $DATASET \
    --model $MODEL \
    --batch_size $BATCH_SIZE \
    --val_batch_size $BATCH_SIZE \
    --scannet_path $DATA_ROOT \
    --stat_freq 100 \
    --visualize True \
    --visualize_path $LOG_DIR/visualize \
    --num_gpu 1 \
    --balanced_category_sampling True \
    --resume "/mnt/ckpt/34D_CLIP_finetune.ckpt" \
    --is_train "False" 2>&1 | tee -a "$LOG"

But there's something wrong with the results. I think I used the checkpoint in the wrong way. Thank you so much!

RozDavid commented 1 year ago

Hey @believexx,

The launch script looks correct to me. Please always describe your problem as thoroughly as possible to make it easier to debug the error. Here are a couple of reference questions you should go over to find the problem:

Kind regards, David

yhyang-myron commented 1 year ago

Thank you so much for your reply!

  1. I installed all components following the readme here, and copied train.txt and val.txt from these links (a small fetch sketch follows after this list): https://github.com/ScanNet/ScanNet/blob/master/Tasks/Benchmark/scannetv2_train.txt https://github.com/ScanNet/ScanNet/blob/master/Tasks/Benchmark/scannetv2_val.txt

  2. I can start a training run and also see the losses decreasing as expected.

  3. The evaluation starts but gives the wrong results. There is also something wrong when testing in the original point cloud space, because there are not enough files.

  4. I organized the test output and some instructions into the attached PDF: test output.pdf
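For reference, fetching the two split files can be done along these lines (the target folder is just an example; point it to wherever the dataloader reads the split lists from):

SPLITS_DIR="/mnt/scannet_200/splits"   # example location, adjust to your data layout
mkdir -p "$SPLITS_DIR"
# raw versions of the two files linked in point 1
curl -L -o "$SPLITS_DIR/train.txt" https://raw.githubusercontent.com/ScanNet/ScanNet/master/Tasks/Benchmark/scannetv2_train.txt
curl -L -o "$SPLITS_DIR/val.txt" https://raw.githubusercontent.com/ScanNet/ScanNet/master/Tasks/Benchmark/scannetv2_val.txt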

RozDavid commented 1 year ago

Hey,

So I was a bit confused by the attached output about where one run ends and the other starts, but I interpret it as you having kept the whole training log and only the end of the testing run.

Tip 1) I would suggest you first evaluate only on the voxels (without the test_original_pointcloud flag) and check the results there. The full cloud evaluation expects the voxel evaluation to have finished at least once, so that all .npy predictions are available as input for the point-to-voxel matching.
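Roughly, the two-stage run then looks like the following, reusing the variables from your script above (for the --resume path, see Tip 2 below). Double-check the exact flag name in the config; I am writing it as --test_original_pointcloud here:

# Pass 1: voxel-space evaluation only; this writes the per-scene .npy predictions
python -m main \
    --log_dir $LOG_DIR \
    --dataset $DATASET \
    --model $MODEL \
    --scannet_path $DATA_ROOT \
    --num_gpu 1 \
    --resume "/mnt/ckpt/34D_CLIP_finetune.ckpt" \
    --is_train "False" 2>&1 | tee -a "$LOG"

# Pass 2: same call plus the flag, mapping the saved voxel predictions
# back onto the full-resolution point cloud
python -m main \
    --log_dir $LOG_DIR \
    --dataset $DATASET \
    --model $MODEL \
    --scannet_path $DATA_ROOT \
    --num_gpu 1 \
    --resume "/mnt/ckpt/34D_CLIP_finetune.ckpt" \
    --is_train "False" \
    --test_original_pointcloud True 2>&1 | tee -a "$LOG"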

Tip 2) You can see when printing the hyperparams that the ckpt is given as a relative path, but later the module prints Resuming: None, which means your local and global paths are off. Try providing the full path to the checkpoint, or double-check that you are running the train script from the correct folder.
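For example, you could add a quick sanity check right before the python call in your launch script, so a bad path fails loudly instead of silently ending up as Resuming: None (just a sketch, adjust the path to yours):

CKPT_PATH="$(readlink -f "/mnt/ckpt/34D_CLIP_finetune.ckpt")"   # resolve to an absolute path
if [ ! -e "$CKPT_PATH" ]; then
    echo "Checkpoint not found: $CKPT_PATH" >&2
    exit 1
fi
# then pass "$CKPT_PATH" to --resume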

Let me know once you manage to load them correctly.

Regards, David

yhyang-myron commented 1 year ago

I found the problem I made: I gave the ckpt file path directly to the resume argument. It worked well after I changed it to the path of the folder that stores the ckpt file. I also changed the batch size to 1 and got all the .npy predictions for the full cloud evaluation! Thank you very much!!!
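For reference, the relevant changes in my launch script ended up looking roughly like this (the ckpt folder name is just an example of my layout):

# --resume gets the folder containing the checkpoint, not the .ckpt file itself
export RESUME_DIR="/mnt/ckpt"        # this folder holds 34D_CLIP_finetune.ckpt
export BATCH_SIZE=1                  # batch size 1, so all .npy predictions get written

python -m main \
    --log_dir $LOG_DIR \
    --dataset $DATASET \
    --model $MODEL \
    --batch_size $BATCH_SIZE \
    --val_batch_size $BATCH_SIZE \
    --scannet_path $DATA_ROOT \
    --num_gpu 1 \
    --resume "$RESUME_DIR" \
    --is_train "False" 2>&1 | tee -a "$LOG"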