daac-tools / python-vibrato

Viterbi-based accelerated tokenizer (Python wrapper)
Apache License 2.0

memory allocation of 6084209583480531286 bytes failed - crashes the Python interpreter #6

Closed EliCodesForFun closed 1 year ago

EliCodesForFun commented 1 year ago

I decompressed the zst file with this:

import zstandard as zstd
import os

def decompress_zst_file(input_file, output_file):
    dctx = zstd.ZstdDecompressor()

    with open(input_file, 'rb') as ifh, open(output_file, 'wb') as ofh:
        dctx.copy_stream(ifh, ofh)

    print(f"Successfully decompressed '{input_file}' to '{output_file}'")

input_zst_file = r'H:\My Drive\Data Analyst Prep!Programming Files\Python Projects\VibratoProject\ipadic-mecab-2_7_0\system.dic.zst'
output_file = r'H:\My Drive\Data Analyst Prep!Programming Files\Python Projects\VibratoProject\ipadic-mecab-2_7_0\system.dic'
decompress_zst_file(input_zst_file, output_file)

And then I ran the example code and the interpreter crashed. Using VSCode, Python 3.11.3. IDK :/

It crashes on the line that constructs the tokenizer (marked with a comment here):

import vibrato

with open(r'H:\My Drive\Data Analyst Prep!Programming Files\Python Projects\VibratoProject\ipadic-mecab-2_7_0\system.dic', 'rb') as fp:
    dict_data = fp.read()

tokenizer = vibrato.Vibrato(dict_data)  # <- crashes here

tokens = tokenizer.tokenize('社長は火星猫だ')

vbkaisetsu commented 1 year ago

@EliCodesForFun Thank you for reporting.

We have updated Vibrato, which the wrapper internally uses. Could you check if the error no longer occurs after updating python-vibrato?

This error was probably caused by loading an incompatible dictionary. Previous versions of Vibrato did not check the compatibility of dictionaries before loading them, so when an incompatible dictionary was loaded, an error like the one you reported occurred. The latest version checks for compatibility before loading and outputs an appropriate error if necessary.
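
As a rough illustration only, a cheap sanity check on the Python side could look like the sketch below. It is not Vibrato's own compatibility check and cannot tell whether a dictionary matches the installed Vibrato version; it only rejects a file that is empty or still zstd-compressed. `read_dictionary_bytes` is a hypothetical helper name, and the one fact it relies on is that zstd frames begin with the bytes `28 B5 2F FD`.

```python
from pathlib import Path

# Zstandard frames start with these four bytes
# (magic number 0xFD2FB528 stored little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def read_dictionary_bytes(path):
    """Hypothetical helper: read a dictionary file and reject obviously wrong input.

    Only catches an empty file or one that is still zstd-compressed.
    It does NOT verify that the dictionary format matches the installed
    Vibrato version; that check happens inside Vibrato itself.
    """
    data = Path(path).read_bytes()
    if not data:
        raise ValueError(f"{path} is empty")
    if data.startswith(ZSTD_MAGIC):
        raise ValueError(f"{path} still looks zstd-compressed; decompress it first")
    return data
```

The returned bytes would then be passed to `vibrato.Vibrato(...)` exactly as in the snippet from the original report.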