KWARC / llamapun

common language and mathematics processing algorithms, in Rust
https://kwarc.info/systems/llamapun/
GNU General Public License v3.0

v3 of the paragraph dataset #22

Closed by dginev 5 years ago

dginev commented 5 years ago

Reduces the arXiv paragraph dataset to about 1/3 of its previous size.