ZJaume / heliport

Fast and accurate language identifier
GNU General Public License v3.0

Benchmark #1

Open ZJaume opened 6 months ago

ZJaume commented 6 months ago

Running with 5000 random sentences from OpenLID:

method                    time (s)
fasttext lid201           0.89
HeLI-OTS                  9.65
heli-otr                  7.83
+ Lang Enum               3.98
+ Fnv hash model          3.22
+ Fnv hash identifier     2.84
+ Patricia tree           7.92
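
To illustrate why a row like "+ Lang Enum" can cut the time this much, here is a minimal sketch that compares string-keyed score updates with enum-indexed ones. It is not taken from the heliport source: the three-variant `Lang` enum, `NUM_LANGS`, and the helper names are all hypothetical, and "highest score wins" is an assumption made only for the example.

```rust
use std::collections::HashMap;

/// Hypothetical three-language set; a real identifier would have many more variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Lang {
    Cat = 0,
    Eng = 1,
    Spa = 2,
}

const NUM_LANGS: usize = 3;
const ALL_LANGS: [Lang; NUM_LANGS] = [Lang::Cat, Lang::Eng, Lang::Spa];

/// String-keyed scores: every update allocates a key and hashes the language code.
fn add_score_by_name(scores: &mut HashMap<String, f32>, lang: &str, delta: f32) {
    *scores.entry(lang.to_string()).or_insert(0.0) += delta;
}

/// Enum-keyed scores: every update is a plain array index, with no hashing.
fn add_score_by_enum(scores: &mut [f32; NUM_LANGS], lang: Lang, delta: f32) {
    scores[lang as usize] += delta;
}

/// Return the language with the highest score (assumption: higher is better).
fn best(scores: &[f32; NUM_LANGS]) -> Lang {
    let mut winner = ALL_LANGS[0];
    for &lang in &ALL_LANGS[1..] {
        if scores[lang as usize] > scores[winner as usize] {
            winner = lang;
        }
    }
    winner
}
```

The enum version touches a fixed-size array that fits in cache, so the per-n-gram update cost no longer depends on hashing a language code.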

Hashing functions comparison:

method                    time (s)
fnv                       2.84
seahash                   3.46
highway march=native      4.00
murmur2                   3.40
murmur3                   4.15
xxhash                    3.66
ahash*                    2.73
wyhash                    2.75
wyhash2                   2.71

* Output is not stable across different computers.
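
For reference, a hash function can be swapped into a Rust `HashMap` through the standard `Hasher` trait. The sketch below implements 64-bit FNV-1a by hand with the standard library only; it is an illustration, not the code behind the "fnv" row, and `NgramTable` and `fnv1a64` are hypothetical names. Because the function uses only fixed constants, it returns the same value for the same bytes on any machine.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a state, initialised to the published FNV offset basis.
struct Fnv1a64(u64);

impl Default for Fnv1a64 {
    fn default() -> Self {
        Fnv1a64(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a64 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // FNV-1a: XOR the byte in first, then multiply by the FNV prime.
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical n-gram table type that uses the hasher above instead of SipHash.
type NgramTable = HashMap<String, f32, BuildHasherDefault<Fnv1a64>>;

/// Convenience helper that hashes a byte slice in one call.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hasher = Fnv1a64::default();
    hasher.write(bytes);
    hasher.finish()
}
```

A table is then created with `NgramTable::default()` and used like any other `HashMap`. The other hashers in the comparison can be plugged in the same way, as long as they implement `Hasher`.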

ZJaume commented 4 months ago

Model loading time:

method                    time (s)
fasttext lid193           0.53
HeLI-OTS                  7.2
heli-otr bincode          4.3
heli-otr rkyv             2.0
heli-otr bitcode          0.92
+ separated ngram files   0.67

ZJaume commented 4 months ago

Now running with 100k sentences, since 5k seems to be too few.

method                              time (s)
CLD2                                1.12
HeLI-OTS                            60.37
lingua all high preloaded           56.29
lingua all low preloaded            23.34
fasttext lid193                     8.44
heli-otr wyhash + static scorers    5.28
+ bitcode                           4.72
+ vec<lang, prob>                   2.40
+ early char count                  2.33
+ rayon 32thread                    0.90
+ score_lang vectorized             2.09
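
The "+ rayon 32thread" row relies on sentences being independent of each other, so they can be identified in parallel. As a rough, dependency-free sketch of that idea, the code below uses `std::thread::scope` instead of rayon: it splits the input into chunks, runs one thread per chunk, and keeps the results in input order. `identify` is a hypothetical placeholder that only returns the byte length, and `identify_parallel` is not a heliport function.

```rust
use std::thread;

/// Hypothetical stand-in for a per-sentence identifier; it only returns the byte length.
fn identify(sentence: &str) -> usize {
    sentence.len()
}

/// Split `sentences` into at most `threads` chunks, process each chunk on its
/// own scoped thread, and return the results in input order.
fn identify_parallel(sentences: &[String], threads: usize) -> Vec<usize> {
    let threads = threads.max(1);
    // Ceiling division; clamp to 1 because `chunks(0)` would panic on empty input.
    let chunk_size = ((sentences.len() + threads - 1) / threads).max(1);

    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = sentences
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|s| identify(s)).collect::<Vec<_>>())
            })
            .collect();

        // Join in spawn order, which preserves the original sentence order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With rayon the same shape is usually written as a parallel iterator over the sentences. The manual chunking here only makes the work split explicit.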