BurntSushi / fst

Represent large sets and maps compactly with finite state transducers.
The Unlicense
1.78k stars 126 forks source link

[Help] Possible to build a FST at compile time? #145

Open bee-san opened 2 years ago

bee-san commented 2 years ago

I have a problem which I think FST might solve:

  1. 40 million keys, no values
  2. All strings
  3. We know every string at compile time and can guarantee no new strings will be added.
  4. Around 40-50k reads per programs run.
  5. We want to do this as fast as possible on runtime, and can sacrifice memory for it.

Is it possible to build an FST at compile time to achieve (5)?

BurntSushi commented 2 years ago

Sure. Serialize it to a file and use something like once_cell to load it via include_bytes.

It won't help do anything faster at runtime though.

You might want a perfect hashmap instead. Not sure if 40 million is too big for that though.

jymchng commented 1 year ago

Sure. Serialize it to a file and use something like once_cell to load it via include_bytes.

It won't help do anything faster at runtime though.

You might want a perfect hashmap instead. Not sure if 40 million is too big for that though.

Thank you @BurntSushi for your response to this question and your work with FST.

I've been thinking about this as well. Why would you say it wouldn't help? Baking the bytes into the binary would at least help during runtime in avoiding loading the map at runtime?

BurntSushi commented 1 year ago

I'm just saying that building an fstv at compile time won't make queries faster. It's the same bytes. This is unlike a perfect hash map for example.

Obviously building an fst at compile time means you won't have to do it at runtime, and that may or may not be significant.

jymchng commented 1 year ago

@BurntSushi Thank you for your clear response! 🚀 Learnt alot from it.