castorini / ura-projects

Replace SLIM load-into-memory vectors with something like DuckDB/Arrow? #14

Open lintool opened 9 months ago

lintool commented 9 months ago

Building on #3

In the SLIM model: https://dl.acm.org/doi/abs/10.1145/3539618.3591977

There's a stage where the model loads all the vectors into memory for manipulation. The vectors are stored in numpy arrays, so loading everything at once consumes a lot of memory...

What if we replaced them with something like Arrow, read through DuckDB? Then we'd get mmapping for free?
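
To make the mmapping idea concrete, here is a minimal sketch of what memory-mapped access to a vector file looks like. It uses numpy's own `mmap_mode` rather than Arrow or DuckDB, purely to illustrate the access pattern; it is not the SLIM code. The file name, the array shape, and the helpers `save_vectors` and `load_vectors_mmap` are all made up for the example.

```python
import os
import tempfile

import numpy as np


def save_vectors(path, vectors):
    """Write the vectors to disk in numpy's .npy format (hypothetical helper)."""
    np.save(path, vectors)


def load_vectors_mmap(path):
    """Open the .npy file memory-mapped and read-only (hypothetical helper).

    Nothing is copied into RAM up front; the OS pages data in on access.
    """
    return np.load(path, mmap_mode="r")


# Stand-in data: 1000 vectors of dimension 64 (shape is an assumption).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "vectors.npy")
rng = np.random.default_rng(0)
vectors = rng.random((1000, 64), dtype=np.float32)
save_vectors(path, vectors)

# Memory-mapped view of the file instead of a full in-memory load.
mapped = load_vectors_mmap(path)

# Touching a single row only reads the pages backing that row.
row = np.asarray(mapped[42])
```

An Arrow file read through a memory map should give a similar pattern, with the added benefit that DuckDB can query Arrow data directly, but that part is not shown here.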

Panizghi commented 5 months ago

Update: I will start working on this task :)