TsinghuaC3I / SoRA

The source code of the EMNLP 2023 main conference paper: Sparse Low-rank Adaptation of Pre-trained Language Models.

Question about evaluation on the CoLA test set. #7

Open SEONHOK opened 6 months ago

SEONHOK commented 6 months ago

Hi! I have a question about evaluating on CoLA's test split. The CoLA test data does not have labels, so how do you evaluate the trained model on the CoLA dataset?

Thank you!

sborse3 commented 6 months ago

Hi @SEONHOK, in fact I don't see any test labels on Hugging Face either. Were you able to figure this out?

telxt commented 6 months ago

@SEONHOK @sborse3 Thank you for your interest in our work!

The results in the paper are reported on a test set, but this test set differs from the test split of the original dataset on Hugging Face. We partition each dataset as follows:

For small datasets (n_samples < 10K), we split the original validation set in half, using one half as the test set and the other half as the validation set. For larger datasets (n_samples > 10K), we carve 1K examples off the training set to serve as the validation set, keep the rest for training, and use the original validation set as the test set. You can find the specific implementation in the get function within the SoRA/src/processor.py file (Lines 87-106).
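The rule above can be sketched in plain Python. This is only an illustration of the partition logic, not the repository's code (the function name and list-based splits here are assumptions; see the `get` function in `SoRA/src/processor.py` for the actual implementation):

```python
# Illustrative sketch of the split rule described above, not the repo's code.
def partition_glue_dataset(train, validation, threshold=10_000):
    """Re-partition a GLUE task's original train/validation splits.

    Small tasks (len(train) < threshold): split the original validation
    set in half -> new validation set + test set.
    Large tasks: take 1K examples off the training set as the new
    validation set; the original validation set becomes the test set.
    """
    if len(train) < threshold:
        half = len(validation) // 2
        new_validation, test = validation[:half], validation[half:]
        return train, new_validation, test
    else:
        new_validation, new_train = train[:1000], train[1000:]
        return new_train, new_validation, validation


# CoLA has ~8.5K training examples (< 10K), so its original validation
# set (1,043 examples) is halved into new validation and test sets.
train = list(range(8551))
validation = list(range(1043))
tr, va, te = partition_glue_dataset(train, validation)
```

With these sizes, CoLA keeps all 8,551 training examples and ends up with 521 validation and 522 test examples.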