HKUDS / UrbanGPT

[KDD'2024] "UrbanGPT: Spatio-Temporal Large Language Models"
https://urban-gpt.github.io
Apache License 2.0

How do we obtain metrics for evaluation? #14

Closed KL4805 closed 2 months ago

KL4805 commented 3 months ago

Hello authors,

Thanks for your work, code, and checkpoints! I tried to run your urbangpt_eval.sh with vllm/ray/fastchat disabled (due to a mysterious error I couldn't diagnose). Inference now works fine.

I ran inference on 50 samples (I only have a V100, which is slower, so I had to reduce the sample count). I got the following output.

Loading checkpoint shards: 100%|██████████████████████████████████████████| 3/3 [00:02<00:00,  1.35it/s]
finish loading
total: 51920
50it [16:10, 19.40s/it]

However, no metrics (e.g. MSE, MAE) are calculated. Is this the intended behavior? If not, how are we supposed to obtain the evaluation metrics?

LZH-YS1998 commented 3 months ago

Hello, the result file contains the prediction results. We will also release the evaluation code of UrbanGPT to this repository within the next two days. We hope this will be helpful to you.
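In the meantime, since the result file holds the raw predictions, the standard metrics can be computed directly from prediction/ground-truth pairs. The sketch below is only an illustration, not UrbanGPT's official evaluation code; the function names and the flat-array input format are assumptions, and the actual result-file layout may differ:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, and RMSE over flattened prediction arrays.

    Assumes predictions and ground truth have already been parsed out of
    the result file into numeric arrays of the same shape.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = float(np.mean(np.abs(err)))   # mean absolute error
    mse = float(np.mean(err ** 2))      # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": mse ** 0.5}

if __name__ == "__main__":
    # Toy example with hand-checkable values.
    truth = [3.0, 5.0, 2.0]
    preds = [2.5, 5.5, 2.0]
    print(regression_metrics(truth, preds))
```

For the real result file you would first extract the numeric predictions from each generated response (the released evaluation code handles that parsing) and then feed the aligned arrays into a function like this.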

KL4805 commented 3 months ago

It will be very helpful. Looking forward to it.

LZH-YS1998 commented 3 months ago

The code has been released; please refer to Section 4.3.