IINemo / lm-polygraph

MIT License

Do not copy dataset when subsampling #172

rvashurin closed 5 months ago

rvashurin commented 5 months ago

`np.array(list).tolist()` is very memory-inefficient and causes OOMs when subsampling huge datasets (e.g., the wmt14 training set).
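For illustration, here is a minimal sketch of subsampling by index without round-tripping the whole dataset through NumPy. It is not necessarily the change made in this PR: the `subsample` helper, its parameters, and the parallel-lists layout (`x`, `y`) are assumptions made for the example. If the dataset holds strings, one likely source of the blow-up is that `np.array` stores them in a fixed-width array sized to the longest element, so a single long sample inflates the memory used by every row.

```python
import random


def subsample(x, y, size, seed=42):
    """Hypothetical helper: pick `size` aligned items from two parallel lists.

    Only the sampled indices and the selected items are materialized;
    the full lists are never converted to an array or copied.
    """
    n = len(x)
    if size >= n:
        # Nothing to subsample: return the original lists untouched.
        return x, y

    rng = random.Random(seed)
    # Sampling from a range object avoids building a list of all n indices.
    indices = rng.sample(range(n), size)

    x_sub = [x[i] for i in indices]
    y_sub = [y[i] for i in indices]
    return x_sub, y_sub


if __name__ == "__main__":
    inputs = [f"source sentence {i}" for i in range(100_000)]
    targets = [f"target sentence {i}" for i in range(100_000)]
    xs, ys = subsample(inputs, targets, size=5)
    print(len(xs), len(ys))
```

With this approach the extra memory grows with the subsample size rather than with the dataset size, and inputs stay aligned with their targets because both lists are indexed with the same positions.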