Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

Small validation set doesn't work #3

Closed cmacdonald closed 5 years ago

cmacdonald commented 5 years ago

data.iter_valid_records() doesn't yield anything if the validation set is smaller than batch_size.

Adding a final block as follows works:

if len(batch['query_id']) > 0:
  yield _pack_n_ship(batch)

The same bug also affects larger validation sets: without this block, the final (validation set size % batch_size) documents are omitted from validation.
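To illustrate the behavior described above, here is a minimal, self-contained sketch of a batching generator with the final flush. It is not the repository's actual code: iter_batches, the record layout, and the plain dict standing in for whatever _pack_n_ship returns are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_batches(
    records: Iterable[Tuple[str, str]], batch_size: int
) -> Iterator[Dict[str, List[str]]]:
    """Hypothetical stand-in for a validation batching loop."""
    batch: Dict[str, List[str]] = {'query_id': [], 'doc_id': []}
    for query_id, doc_id in records:
        batch['query_id'].append(query_id)
        batch['doc_id'].append(doc_id)
        if len(batch['query_id']) == batch_size:
            # A full batch is ready: emit it and start a fresh one.
            yield batch
            batch = {'query_id': [], 'doc_id': []}
    # Final flush: without this check, the trailing partial batch
    # (or the entire set, if it is smaller than batch_size) is dropped.
    if len(batch['query_id']) > 0:
        yield batch


# Five records with batch_size=2 leave one record over for the flush.
records = [('q1', f'd{i}') for i in range(5)]
sizes = [len(b['query_id']) for b in iter_batches(records, batch_size=2)]
print(sizes)
```

With the final check removed, the same call would yield only the two full batches, and a one-record validation set with batch_size=4 would yield nothing at all.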

I suspect data.iter_train_pairs() has exactly the same issue.

seanmacavaney commented 5 years ago

Good catch! data.iter_train_pairs() is not affected by this, because the underlying _iter_train_pairs iterates indefinitely: every batch eventually fills, so there is never a leftover partial batch.
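The reasoning in this reply can be sketched as follows. This is only an illustration under assumed names: iter_pairs_forever and iter_full_batches are hypothetical helpers, not functions from the repository, and itertools.cycle stands in for a training source that never ends.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def iter_pairs_forever(pairs: List[Pair]) -> Iterator[Pair]:
    """Hypothetical endless training source: repeats the pairs forever."""
    return itertools.cycle(pairs)


def iter_full_batches(source: Iterable[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Batching loop with no final flush."""
    batch: List[Pair] = []
    for item in source:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Code here only runs if the source ends. With an endless source it
    # is never reached, so no partial batch can be left behind.


# Three pairs do not divide evenly by batch_size=2, yet every batch is full.
pairs = [('q1', 'd1'), ('q1', 'd2'), ('q2', 'd3')]
batches = iter_full_batches(iter_pairs_forever(pairs), batch_size=2)
first_three = list(itertools.islice(batches, 3))
batch_sizes = [len(b) for b in first_three]
print(batch_sizes)
```

Because the generator never terminates, the caller decides how many batches to consume, here via itertools.islice.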