about <onnx_decode_sentence.py>

frankyoujian / Edge-Punct-Casing

Apache License 2.0

about <onnx_decode_sentence.py> #3

Open XhrLeokk opened 3 months ago

XhrLeokk commented 3 months ago

Hi @frankyoujian, I have successfully run the sherpa-onnx version, but I have a batch of sentences that I want to decode, so I may need to use a file to be more efficient. When I try to run ,

First, I changed "from model import Model" to "from model import Model_new as Model" to run the code.
Second, when it reaches sp.load(args.bpe_model), I get this error: RuntimeError: Internal: could not parse ModelProto from ./sherpa-onnx/download/sherpa-onnx-online-punct-en-2024-08-06/bpe.vocab

Am I missing a "bpe file" that can be read by sp, or does the code need to be updated? I'm not an expert in ONNX. Thanks for the help.

csukuangfj commented 3 months ago

What is the output

ls -lh ./sherpa-onnx/download/sherpa-onnx-online-punct-en-2024-08-06/bpe.vocab

XhrLeokk commented 3 months ago

-rw-r--r-- 1 501 staff 146K Aug 5 03:19 *.bpe.vocab

csukuangfj commented 3 months ago

Does the file bpe.vocab exist?

The output only shows *.bpe.vocab.

Please make sure you have downloaded the model files correctly.

XhrLeokk commented 3 months ago

Yes, the file exists. I used exactly this file to run the decode demo at https://k2-fsa.github.io/sherpa/onnx/punctuation/pretrained_models.html#sherpa-onnx-online-punct-en-2024-08-06-english-only and it gave me the punctuated and cased lines. But I didn't find a way to efficiently decode a batch of sentences; that's why I turned back to use

Here is the full output (I left out the path to ".vocab" before): -rw-r--r-- 1 501 staff 146K Aug 5 03:19 ./sherpa-onnx/download/sherpa-onnx-online-punct-en-2024-08-06/bpe.vocab

csukuangfj commented 3 months ago

Could you use an absolute path in your code?
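To make the path check concrete, here is a minimal sketch that resolves a possibly relative path to an absolute one and reports whether the file exists and whether it looks like plain text. The function name `describe_model_path` and the text-versus-binary heuristic are illustrative assumptions, not part of this repository. The motivation: `sp.load` in sentencepiece parses a serialized model file, so a "could not parse ModelProto" error may mean the file being passed is a plain-text vocabulary rather than a binary model.

```python
from pathlib import Path


def describe_model_path(path_str: str) -> dict:
    """Resolve a path and report basic facts about the file it points to.

    Hypothetical helper for debugging; not part of Edge-Punct-Casing.
    """
    path = Path(path_str).expanduser().resolve()
    info = {
        "absolute_path": str(path),
        "exists": path.is_file(),
        "size_bytes": None,
        "looks_like_text": None,
    }
    if not info["exists"]:
        return info

    data = path.read_bytes()
    info["size_bytes"] = len(data)

    # Heuristic (assumption): a text vocabulary decodes cleanly as UTF-8
    # and contains no NUL bytes, while a serialized binary model usually
    # fails at least one of these checks.
    try:
        data.decode("utf-8")
        info["looks_like_text"] = b"\x00" not in data
    except UnicodeDecodeError:
        info["looks_like_text"] = False
    return info


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    report = describe_model_path(
        "./sherpa-onnx/download/sherpa-onnx-online-punct-en-2024-08-06/bpe.vocab"
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```

If `looks_like_text` comes back `True`, the file is probably not something `sp.load` can parse directly, and using an absolute path alone would not change the error.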

XhrLeokk commented 3 months ago

Yes, I can. The root path is /my_name.

frankyoujian commented 3 months ago

@XhrLeokk If you want to decode a batch of sentences, please concatenate them into a single line. sherpa-onnx preprocesses that single line by splitting it into sequences of 200 tokens. This behavior is similar to that of onnx_decode_sentence.py.
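As a rough illustration of the suggestion above, the sketch below joins a batch of sentences into one line and then cuts the token list into chunks of at most 200. It is not the actual sherpa-onnx or onnx_decode_sentence.py code: the helper names are hypothetical, and whitespace-separated words stand in for the BPE tokens the real model uses.

```python
from typing import Iterable, List

MAX_TOKENS = 200  # chunk length mentioned in the comment above


def concatenate_sentences(sentences: Iterable[str]) -> str:
    """Join non-empty sentences into one line separated by single spaces."""
    return " ".join(s.strip() for s in sentences if s.strip())


def split_into_chunks(tokens: List[str], max_tokens: int = MAX_TOKENS) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


if __name__ == "__main__":
    batch = ["how are you", "i am fine thank you", "see you tomorrow"]
    line = concatenate_sentences(batch)
    # Assumption: whitespace splitting stands in for the real BPE tokenizer.
    tokens = line.split()
    for chunk in split_into_chunks(tokens, max_tokens=4):
        print(chunk)
```

One consequence of concatenating is that the original sentence boundaries are lost in the input, so the punctuated output has to be re-split afterwards if per-sentence results are needed.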

monkiravn commented 3 months ago

I have the same issue.