KWARC / llamapun

common language and mathematics processing algorithms, in Rust
https://kwarc.info/systems/llamapun/
GNU General Public License v3.0

Consider a native rust2vec dependency #25

Closed by dginev 3 years ago

dginev commented 5 years ago

Until now I have been using a separate script, external to llamapun, to invoke the GloVe toolchain and generate word embeddings for follow-up experiments.

A Rust reimplementation of GloVe (a project I considered embarking on but never had the time to commit to) just had a new release and is looking promising:

https://github.com/finalfusion/finalfusion-rust

So it may be an interesting comparison to rerun the arXMLiv embedding generation with rust2vec and see whether we arrive at similar embeddings and/or similar downstream results.
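One caveat if that comparison were run: two independently trained embedding sets are not aligned to a common coordinate system, so comparing the raw vectors for the same word says little. A common alternative is to compare each word's nearest-neighbor sets across the two runs. Below is a minimal std-only sketch, assuming both runs have already been loaded into in-memory `HashMap<String, Vec<f32>>` tables. The `Embeddings` alias and the `cosine`, `nearest_neighbors` and `neighbor_overlap` helpers are hypothetical illustrations, not part of llamapun or finalfusion.

```rust
use std::cmp::Ordering;
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory embedding table: word -> dense vector.
type Embeddings = HashMap<String, Vec<f32>>;

/// Cosine similarity of two equal-length vectors (0.0 if either has zero norm).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// The `k` words most cosine-similar to `word` in `emb`, excluding `word` itself.
fn nearest_neighbors(emb: &Embeddings, word: &str, k: usize) -> Vec<String> {
    let query = match emb.get(word) {
        Some(v) => v,
        None => return Vec::new(),
    };
    let mut scored: Vec<(&String, f32)> = emb
        .iter()
        .filter(|(w, _)| w.as_str() != word)
        .map(|(w, v)| (w, cosine(query, v)))
        .collect();
    // Descending similarity; ties broken alphabetically so results are deterministic.
    scored.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.0.cmp(b.0))
    });
    scored.into_iter().take(k).map(|(w, _)| w.clone()).collect()
}

/// Jaccard overlap of the top-k neighbor sets of `word` in two runs.
/// Returns None if `word` is missing from either run.
fn neighbor_overlap(a: &Embeddings, b: &Embeddings, word: &str, k: usize) -> Option<f32> {
    if !a.contains_key(word) || !b.contains_key(word) {
        return None;
    }
    let na: HashSet<String> = nearest_neighbors(a, word, k).into_iter().collect();
    let nb: HashSet<String> = nearest_neighbors(b, word, k).into_iter().collect();
    let union = na.union(&nb).count();
    if union == 0 {
        return Some(1.0); // both neighbor sets empty: treat as identical
    }
    Some(na.intersection(&nb).count() as f32 / union as f32)
}

/// Builds a toy table from (word, 2-d vector) pairs.
fn table(pairs: &[(&str, [f32; 2])]) -> Embeddings {
    pairs.iter().map(|(w, v)| (w.to_string(), v.to_vec())).collect()
}

fn main() {
    // Toy stand-ins for two runs; the second swaps the coordinates of the first.
    let glove = table(&[("sum", [1.0, 0.1]), ("integral", [0.9, 0.2]), ("matrix", [0.0, 1.0])]);
    let rust2vec = table(&[("sum", [0.1, 1.0]), ("integral", [0.2, 0.9]), ("matrix", [1.0, 0.0])]);
    // Prints "Some(1.0)": the raw vectors differ, but the neighborhoods agree.
    println!("{:?}", neighbor_overlap(&glove, &rust2vec, "sum", 1));
}
```

In the toy data the second run is a coordinate swap of the first, so the raw vectors for `sum` look dissimilar (cosine of about 0.2) even though its nearest neighbor (`integral`) is the same in both runs. Averaging this overlap over a shared vocabulary is one possible way to quantify "similar embeddings"; downstream task results would still need a separate comparison.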

dginev commented 5 years ago

There is also fastText: https://github.com/DominicBurkart/fast_text

dginev commented 3 years ago

Now that the field has moved toward subword tokenization (BPE and WordPiece), this issue is less likely to get the time a correct implementation deserves. If anyone is interested, PRs are welcome, but I won't be picking this up myself.