HKUDS / UrbanGPT

[KDD'2024] "UrbanGPT: Spatio-Temporal Large Language Models"
https://urban-gpt.github.io
Apache License 2.0

Clarification Needed: Is the Bike Flow Prediction Example Truly Zero-Shot? #24

Open Kaleemullahqasim opened 1 month ago

Kaleemullahqasim commented 1 month ago

I came across the Example-1: Bike Flow Prediction (Zero-shot scenario) in your paper, and I have some concerns regarding the classification of this task as “zero-shot.”

As I understand it, a zero-shot scenario typically involves the model performing a task without being provided any specific prior examples or historical data that directly relate to the task. However, in the provided bike flow prediction example, the model is given 12 time steps of historical inflow and outflow data. This seems to provide the model with concrete examples to base its predictions on, which would generally classify the task as a few-shot or data-driven prediction rather than a zero-shot task.

Could you clarify why this is being labeled as a zero-shot scenario? If it is indeed zero-shot, could you explain the reasoning behind the classification, given that historical data is provided for the prediction task?

Thank you for your time and help in clearing this up!


Kaleemullahqasim commented 1 month ago

https://huggingface.co/datasets/bjdwh/ST_data_urbangpt/viewer?row=0

Even the training dataset is not zero-shot: it includes few-shot examples, yet for some reason you consistently label them as zero-shot.

LZH-YS1998 commented 1 month ago

Hello. We understand the aspects that may be causing confusion. The spatio-temporal prediction task involves forecasting future values based on historical data, whether it occurs in full-shot, few-shot, or zero-shot settings. In this context, we differentiate the zero-shot setting from the others based on whether the model has been trained on historical data from the test regions. If the model has not been trained with data from the test regions, it is considered a zero-shot setting. We have formalized this task in Section 2, titled "Spatio-Temporal Zero-Shot Learning."
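To make the distinction concrete, here is a minimal sketch of a region-disjoint split. It is not taken from the UrbanGPT code base: the function names, the synthetic data, the 30% test fraction, and the 12-step prediction horizon are all assumptions for illustration. Only the 12 historical steps come from the example discussed in this thread.

```python
import numpy as np

HISTORY_STEPS = 12   # the 12 historical steps mentioned in this thread
HORIZON_STEPS = 12   # assumption: horizon chosen only for illustration


def split_regions(num_regions, test_fraction, seed=0):
    """Return disjoint (train_ids, test_ids) arrays of region indices."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(num_regions)
    n_test = max(1, int(round(num_regions * test_fraction)))
    return np.sort(ids[n_test:]), np.sort(ids[:n_test])


def make_windows(series, history=HISTORY_STEPS, horizon=HORIZON_STEPS):
    """Slice one region's (time, features) series into (history, future) pairs."""
    n = series.shape[0] - history - horizon + 1
    x = np.stack([series[i:i + history] for i in range(n)])
    y = np.stack([series[i + history:i + history + horizon] for i in range(n)])
    return x, y


# Synthetic flows: 10 regions, 48 time steps, 2 features (inflow, outflow).
flows = np.random.default_rng(1).poisson(5.0, size=(10, 48, 2))
train_ids, test_ids = split_regions(num_regions=10, test_fraction=0.3)

# Training windows are built only from the training regions.
train_x = np.concatenate([make_windows(flows[r])[0] for r in train_ids])

# At test time the model still receives 12 observed steps as input,
# but they come from a region that contributed nothing to training.
test_x, test_y = make_windows(flows[test_ids[0]])
```

Under this reading, "zero-shot" describes the split (no test region appears in the training set), not the input: the 12-step history is the model's input at inference time in every setting.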

Kaleemullahqasim commented 1 month ago

Thank you for clarifying the concept of “Spatio-Temporal Zero-Shot Learning” in your study. I appreciate your explanation of how the zero-shot setting is defined based on the absence of training on the specific test regions.

However, after reviewing the approach, I have a follow-up question. It appears that the model still relies on historical data from the test region for making predictions, which can be viewed as a form of few-shot learning. If the goal is to generalize to new regions without explicit training, could this not be achieved by carefully crafting few-shot prompts that provide the model with the necessary historical data and context during inference?

Why fine-tune or train the model at all if a few-shot prompt can solve the whole problem, given that you are still using such a prompt at the testing (prediction) stage?

LZH-YS1998 commented 1 week ago

Hello, we understand your concern. The spatial-temporal prediction task we're discussing is inherently defined as predicting future trends based on historical data. Without historical data, it becomes a fundamentally different task. You may also want to look into how similar works define "zero-shot" for time series prediction or spatial-temporal prediction: Large Language Models Are Zero-Shot Time Series Forecasters; Unified Training of Universal Time Series Transformers... I hope this helps.

Regarding your point about few-shot learning, we haven't found that providing a few examples in the prompt can effectively solve this problem. Therefore, instruction fine-tuning is necessary.