AI4Finance-Foundation / FinGPT

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
https://ai4finance.org
MIT License
13.48k stars 1.88k forks

Why does `making_data.ipynb` mix up the train, validation, and test splits of FiQA-SA? #93

Closed KYLN24 closed 10 months ago

KYLN24 commented 11 months ago

Meanwhile, for the other three datasets, only the train split is used.

By the way, I notice that in https://github.com/AI4Finance-Foundation/FinNLP/tree/main/finnlp/benchmarks, `test_fiqa.py` uses all of the train, validation, and test splits, while the other benchmark scripts use only the test split. I would appreciate an explanation of the reason. Thanks!

oliverwang15 commented 11 months ago

Hi, KYLN24. For the FPB and FiQA datasets, we have tried our best to use the same split as BloombergGPT so that the comparison with it is more meaningful. From BloombergGPT's paper:

> Like with FPB, we create our own random split combining both microblogs and news. After discretization, our training set contains 938 sentences with 576 positive, 287 negative, and 75 neutral sentences and our test set contains 235 sentences with 141 positive, 76 negative, and 18 neutral sentences. We select 5 shots and report weighted F1.

However, we don't know BloombergGPT's exact split, so we do it this way:

```python
# `dataset` is the original DatasetDict with "train"/"validation"/"test" splits.
# Both snippets start from the same concatenated pool; with the same seed,
# train_test_split produces the same shuffle, so the two results are disjoint.

# Train:
full = datasets.concatenate_datasets([dataset["train"], dataset["validation"], dataset["test"]])
train_set = full.train_test_split(test_size=0.226, seed=42)["train"]

# Benchmark (test):
full = datasets.concatenate_datasets([dataset["train"], dataset["validation"], dataset["test"]])
test_set = full.train_test_split(test_size=0.226, seed=42)["test"]
```
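The reason this works is that a seeded split is deterministic: running it twice on the same concatenated pool yields the same shuffle, so taking the `train` half in one place and the `test` half in another gives complementary, non-overlapping sets. A minimal pure-Python sketch of that principle (not using the `datasets` library; the sentences here are synthetic placeholders):

```python
import random

def split_combined(examples, test_size=0.226, seed=42):
    """Shuffle the combined pool with a fixed seed, then carve off the
    first `test_size` fraction as the benchmark set. Reusing the same
    seed always reproduces the same shuffle, so train and test are
    disjoint and stable across runs (the same idea as
    `train_test_split(test_size=0.226, seed=42)`)."""
    pool = list(examples)
    rng = random.Random(seed)
    rng.shuffle(pool)
    n_test = round(len(pool) * test_size)
    return pool[n_test:], pool[:n_test]  # (train, test)

# Stand-in for the concatenated train + validation + test splits.
combined = [f"sentence-{i}" for i in range(1000)]
train, test = split_combined(combined)
```

Because the seed fixes the shuffle, calling `split_combined` again on the same pool returns identical train and test sets, and their union is exactly the original pool.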
KYLN24 commented 10 months ago

Thanks for your answer!