Open alkuma opened 1 month ago
Since a similar issue was reported (and closed via a code change / PR) on the tokenizer side, I forked both tokenizer and fastembed-go, published the latest master/main branches, and used them as dependencies — and the error is gone.
Perhaps all that's needed is to publish the latest versions of both?
@alkuma, I'd recommend you keep your project dependent on your fork. It gives you the flexibility to add any changes. As I can see, both fastembed-go and https://github.com/sugarme/tokenizer aren't under active maintenance.
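Pinning a Go project to forks like that is typically done with `replace` directives in go.mod. A minimal sketch — the fork module path and version here are placeholders, not the actual published coordinates:

```
// go.mod (fragment) — replace the upstream modules with your forks.
// The fork path and pseudo-version below are illustrative placeholders.
replace github.com/sugarme/tokenizer => github.com/alkuma/tokenizer v0.0.0-00010101000000-000000000000
```

Run `go mod tidy` afterwards so Go resolves the fork's actual pseudo-version.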
I am getting a nil pointer error with specific texts. I created a test at https://github.com/alkuma/tokenizerissue to demonstrate the issue.
There are two strings being embedded: the first one goes through, but the second one fails.
Here is the output of the program:
The first chunk has 634 characters and the embedding is successful. The next chunk has 835 characters (i.e., the first 634 characters plus an additional 201 characters) and it fails with the tokenizer nil pointer dereference error.
Has anybody faced this before, is it a known issue, and if so is there a way to work around it?
Please let me know if any additional information is required.
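As a stopgap until a fixed version is published, the panic can be contained with Go's `defer`/`recover` so a bad chunk surfaces as an ordinary error instead of crashing the process. This is only a sketch of the guard pattern: `embedFn` below is a stand-in for the real fastembed-go call, and the nil dereference in `main` simulates the tokenizer's panic.

```go
package main

import "fmt"

// safeEmbed runs embedFn and converts a runtime panic (such as the
// tokenizer's nil pointer dereference) into an ordinary error.
func safeEmbed(embedFn func() ([]float32, error)) (emb []float32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("embedding panicked: %v", r)
		}
	}()
	return embedFn()
}

func main() {
	// Stand-in for the real embedding call; dereferencing the nil
	// pointer reproduces a runtime panic like the one in the report.
	var p *int
	_, err := safeEmbed(func() ([]float32, error) {
		return []float32{float32(*p)}, nil
	})
	fmt.Println(err != nil) // true: the panic was converted to an error
}
```

This doesn't fix the underlying tokenizer bug, but it lets the caller skip or log the offending chunk and keep processing the rest.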
To execute the tests, follow these steps:

1. `git clone` the https://github.com/alkuma/tokenizerissue repository.
2. Set `ONNX_PATH` to the correct value.
3. Run the test `TestEmbedding`, which is present in the file `embed_test.go`, and you should get the error.