FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024
Apache License 2.0

Segmentation Fault when calling libsais_int #14

Open julianmukaj opened 5 months ago

julianmukaj commented 5 months ago
Suffix array initialized with length: 3055148
Calling libsais_int with parameters:
  buffer.as_ptr(): 0x75fee1dff010
  suffix_array.as_mut_ptr(): 0x6967ee0
  buffer.len() as i32: 3055148
  vocab_size: 100000
  symbol_frequency_table: 0
Segmentation fault (core dumped)

I was just running the datastore creation scripts, and they seem to crash in the finalize step of lib.rs. This is on Ubuntu 22 with Python 3.9; the same thing happens on Windows.

I opened this issue prematurely. I fixed the crash by adding +1 to the vocabulary size in lib.rs (https://discourse.julialang.org/t/segfault-calling-c-function-any-advice/94730/8) and rebuilding the wheel. It may be a model-dependent issue, I'm not sure. Perhaps something to track down and handle in future releases?
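If the crash comes from token ids that fall outside the alphabet size passed to libsais_int (an assumption, not confirmed in this thread), a check like the one below could be run on the buffer before the call. This is only a minimal sketch: the helper names `required_alphabet_size` and `find_out_of_range` are hypothetical and do not exist in lib.rs, and the token type is assumed to be i32.

```rust
/// Hypothetical helper: smallest alphabet size that covers every token id,
/// i.e. (largest id) + 1. Returns None for an empty buffer or on overflow.
fn required_alphabet_size(tokens: &[i32]) -> Option<i32> {
    tokens
        .iter()
        .copied()
        .max()
        .and_then(|largest| largest.checked_add(1))
}

/// Hypothetical helper: position and value of the first token id that is
/// negative or not below `vocab_size`, if there is one.
fn find_out_of_range(tokens: &[i32], vocab_size: i32) -> Option<(usize, i32)> {
    tokens
        .iter()
        .copied()
        .enumerate()
        .find(|&(_, token)| token < 0 || token >= vocab_size)
}
```

Calling `find_out_of_range(&buffer, vocab_size)` before building the suffix array would turn a possible out-of-bounds access into a reportable error, and `required_alphabet_size` gives a value to compare against the tokenizer's nominal vocabulary size.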

I am having trouble with the data reader/search part too.

let end_of_indices = end_of_indices.unwrap();

This is not caught if end_of_indices is None, so the unwrap panics.

Edit again: increasing the vocabulary size above the tokenizer's vocab size seems to solve the segmentation fault. Whether it crashes or not seems to depend on the datastore data.

if end_of_indices.is_none() {
    return
}

I couldn't figure out why end_of_indices was sometimes None, so I just return early in that case.
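The same early return can be written without a separate is_none check followed by an unwrap. The sketch below is a standalone illustration, not the code from lib.rs: the function `read_indices`, the `terminator` parameter, and the way end_of_indices is computed are all assumptions made up for the example.

```rust
/// Hypothetical reader: returns the values before the first `terminator`,
/// or an empty Vec when no terminator is present.
fn read_indices(data: &[i32], terminator: i32) -> Vec<i32> {
    // Assumed stand-in for however the real code locates the end marker.
    let end_of_indices = data.iter().position(|&value| value == terminator);

    // Handle the None case explicitly instead of calling unwrap().
    let end = match end_of_indices {
        Some(end) => end,
        None => return Vec::new(),
    };

    data[..end].to_vec()
}
```

Matching on the Option keeps the None branch visible at the point where the value is used, so a later refactor cannot reintroduce the panic by moving the unwrap.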