0. Paper
1. What is it?
They propose a metric to evaluate the vocabulary/structural/semantic drift between train and test data.
2. What is amazing compared to previous works?
They separate data drift into vocabulary, structure, and semantics. Their metric can evaluate drift at the level of individual examples.
3. Where is the key to technologies and techniques?
vocabulary drift: log-perplexity of a unigram language model![スクリーンショット 2023-06-02 15 10 06](https://github.com/a1da4/paper-survey/assets/45454055/89f6626e-6ac2-4818-bc2d-95079961ee09)
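A minimal sketch of the vocabulary-drift idea: fit a unigram language model on the train tokens and score a test example by its log-perplexity. The add-alpha smoothing and function name here are my assumptions, not details from the paper.

```python
import math
from collections import Counter

def unigram_log_perplexity(train_tokens, test_tokens, alpha=1.0):
    """Log-perplexity of an add-alpha smoothed unigram LM fit on train_tokens,
    scored on test_tokens. Higher values indicate more vocabulary drift."""
    counts = Counter(train_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves mass for unseen tokens
    log_prob = 0.0
    for tok in test_tokens:
        p = (counts[tok] + alpha) / (total + alpha * vocab_size)
        log_prob += math.log(p)
    return -log_prob / len(test_tokens)
```

A test example full of unseen words scores higher than one drawn from the train vocabulary.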
structural drift: cross-entropy of a POS 5-gram model (using spaCy to annotate POS tags)![スクリーンショット 2023-06-02 15 10 54](https://github.com/a1da4/paper-survey/assets/45454055/0e93e411-927b-4ffe-a86c-701d96d6dccc)
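The structural-drift score can be sketched as the cross-entropy of a POS n-gram model. To keep the example self-contained it takes POS tag sequences directly (in practice these would come from spaCy's `token.pos_`); the add-alpha smoothing and start-padding are my assumptions.

```python
import math
from collections import Counter

def pos_ngram_cross_entropy(train_tags, test_tags, n=5, alpha=1.0):
    """Cross-entropy (bits per tag) of an add-alpha smoothed POS n-gram model.
    train_tags / test_tags: lists of POS tag sequences, one per sentence."""
    pad = ["<S>"] * (n - 1)
    counts, context_counts, tagset = Counter(), Counter(), set()
    for sent in train_tags:
        seq = pad + sent
        tagset.update(sent)
        for i in range(n - 1, len(seq)):
            gram = tuple(seq[i - n + 1 : i + 1])
            counts[gram] += 1
            context_counts[gram[:-1]] += 1
    v = len(tagset) + 1  # +1 for unseen tags
    log_prob, n_tokens = 0.0, 0
    for sent in test_tags:
        seq = pad + sent
        for i in range(n - 1, len(seq)):
            gram = tuple(seq[i - n + 1 : i + 1])
            p = (counts[gram] + alpha) / (context_counts[gram[:-1]] + alpha * v)
            log_prob += math.log2(p)
            n_tokens += 1
    return -log_prob / n_tokens
```

A test sentence whose POS pattern matches the train set gets a lower score than one with an unfamiliar tag sequence.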
semantic drift: the average of semantic change scores LSC(w) over all words w in the target example (one sentence from the test set),
where LSC(w) is the average pairwise cosine distance between the contextual vectors of w in the example and those in the train set
To calculate contextual word vectors, they used a pre-trained RoBERTa model.
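A sketch of the semantic-drift computation, assuming the contextual vectors (from pre-trained RoBERTa in the paper) have already been extracted into word-to-vectors mappings. The dict-based interface and function names are my assumptions for illustration.

```python
import numpy as np

def lsc(example_vecs, train_vecs):
    """LSC(w): average pairwise cosine distance between a word's contextual
    vectors in the target example and its vectors in the train set."""
    dists = []
    for u in example_vecs:
        for v in train_vecs:
            cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            dists.append(1.0 - cos)
    return float(np.mean(dists))

def semantic_drift(example_vectors, train_vectors):
    """Semantic drift of one test example: mean LSC(w) over its words.
    Both arguments map word -> array of contextual vectors (n_occurrences x dim)."""
    scores = [lsc(example_vectors[w], train_vectors[w])
              for w in example_vectors if w in train_vectors]
    return float(np.mean(scores))
```

Identical vectors give a drift of 0; orthogonal vectors give a drift of 1.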
4. How did they evaluate it?
task: predict the performance of fine-tuned RoBERTa models on in-domain and out-of-domain classification tasks
Table 1 shows that the combination of their metrics (vocabulary, structural, and semantic drift) achieves the best performance.
5. Is there a discussion?
6. Which paper should I read next?