FurkanGozukara opened this issue 1 year ago
Thanks for letting me know. I've pinned the minimum torch dependency to 1.9 (released in 2021), which is when inference_mode was introduced.
Let me know if this fixes it.
Thanks, I will test with torch 1.13. Should it work?
Yes, I test with 1.13.1
Hi, it worked, but it looks like I am not using the correct strategy.
I gave input like this, and the output is not correct.
This is the code:
from punctuators.models import PunctCapSegModelONNX
m = PunctCapSegModelONNX.from_pretrained("pcs_en")
input_texts = example  # `example` holds the input text (defined elsewhere)
results = m.infer(input_texts)
with open('punctuation_fullstop_truecase_english.txt', 'w') as f:
    f.writelines(results)
By the way, it doesn't work with torch 2; it works with torch 1.13.
The input should be a list of strings (a batch of inputs) rather than a single string:
>>> help(m.infer)
Help on method infer in module punctuators.models.punc_cap_seg_model:
infer(texts: List[str], apply_sbd: bool = True, batch_size_tokens: int = 4096, overlap: int = 16, num_workers: int = 0) -> Union[List[str], List[List[str]]] method of punctuators.models.punc_cap_seg_model.PunctCapSegModelONNX instance
So try this instead:
>>> input_texts = [example]
You'll probably only want results[0], which, with sentence boundary detection enabled, is a list of the segmented sentences from example.
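Putting the two fixes together (wrap the string in a list, then keep `results[0]`), the original script could be restructured roughly as below. This is a sketch: `punctuate_to_file` is a hypothetical helper, not part of `punctuators`, and the model's `infer` method is passed in as a plain callable so the file-writing logic can be checked without downloading the model.

```python
from typing import Callable, List, Union

# Shape of PunctCapSegModelONNX.infer's return value, per help(m.infer) above:
# List[List[str]] with apply_sbd=True, List[str] with apply_sbd=False.
InferResult = Union[List[str], List[List[str]]]


def punctuate_to_file(
    infer: Callable[[List[str]], InferResult],
    text: str,
    path: str,
) -> List[str]:
    """Run one text through `infer` and write one sentence per line to `path`.

    Hypothetical helper for illustration only.
    """
    # infer() expects a batch (a list of strings), not a single string.
    results = infer([text])
    # With sentence boundary detection on, results[0] is the list of
    # segmented sentences for our single input text.
    sentences = results[0]
    if isinstance(sentences, str):
        # apply_sbd=False: one punctuated string per input, no segmentation.
        sentences = [sentences]
    with open(path, "w", encoding="utf-8") as f:
        # writelines() would not add newlines, so join explicitly.
        f.write("\n".join(sentences) + "\n")
    return sentences
```

With the real model this would be called as `punctuate_to_file(m.infer, example, "punctuation_fullstop_truecase_english.txt")`.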
Also, the model is trained on lower-cased, un-punctuated text (like the output of most ASR systems, which is the intended use case). So you see a lot of <unk> tokens due to the capitalized inputs, and the model will get confused when it sees punctuation.
>>> example = "greetings everyone python has become the main programming language for open source machine learning and ai algorithms"
>>> results = m.infer([example])
>>> processed_example = results[0]
>>> processed_example
['Greetings everyone.', 'Python has become the main programming language for open source Machine learning and AI algorithms.']
>>> print("\n".join(processed_example))
Greetings everyone.
Python has become the main programming language for open source Machine learning and AI algorithms.
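Since the model expects ASR-style input, cased and punctuated text can be normalized before calling `m.infer`. The sketch below lower-cases the text and replaces punctuation with spaces using a regular expression; the helper name `normalize_for_model` and the choice to keep apostrophes (so contractions stay one word) are assumptions, not something `punctuators` provides or prescribes.

```python
import re

# Any character that is not a word character, whitespace, or an apostrophe.
# Keeping apostrophes (so "don't" stays one word) is an assumption.
_PUNCT = re.compile(r"[^\w\s']")


def normalize_for_model(text: str) -> str:
    """Lower-case `text`, drop punctuation, and collapse runs of whitespace.

    Hypothetical helper approximating the lower-cased, un-punctuated
    text the model was trained on.
    """
    text = _PUNCT.sub(" ", text.lower())
    return " ".join(text.split())


# Usage (needs punctuators installed):
#   results = m.infer([normalize_for_model(raw_text)])
```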
I'll pin the upper bound on torch as well; I haven't upgraded to 2.0 yet, and it made quite a few changes.
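Taken together, the pins amount to the requirement `torch>=1.9,<2.0` (1.9 for inference_mode, below 2.0 until 2.x is tested). For a quick runtime check of an installed version, a stdlib-only sketch could look like this; the function names are hypothetical, and the simplified parsing only reads the leading major/minor numbers, ignoring suffixes such as `+cu117`.

```python
import re
from typing import Tuple

# Bounds from this thread: inference_mode needs torch >= 1.9; 2.x is untested.
MIN_TORCH = (1, 9)
MAX_TORCH_EXCLUSIVE = (2, 0)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.13.1+cu117' or '2.0.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_supported(version: str) -> bool:
    """True if `version` falls in the pinned range [1.9, 2.0)."""
    # Tuples compare element by element, so (1, 13) > (1, 9).
    return MIN_TORCH <= parse_major_minor(version) < MAX_TORCH_EXCLUSIVE


# Usage: torch_version_supported(torch.__version__)
```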
Thanks, this worked. It looks like it is missing a lot of sentence-ending periods.
Here are the input and output (attached).
I removed all punctuation and lower-cased everything before giving it as input.
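To put a number on the missing periods, one option is to count the segments in `results[0]` that do not end in sentence-final punctuation. This is a hypothetical diagnostic sketch; treating only `.`, `?`, and `!` as sentence-final is an assumption.

```python
from typing import List, Sequence

# Treating only these three characters as sentence-final is an assumption.
SENTENCE_FINAL = (".", "?", "!")


def missing_final_punct(sentences: Sequence[str]) -> List[str]:
    """Return the segments that do not end in sentence-final punctuation."""
    # str.endswith accepts a tuple and matches any of its entries.
    return [s for s in sentences if not s.rstrip().endswith(SENTENCE_FINAL)]


# Usage: print(len(missing_final_punct(results[0])), "of", len(results[0]))
```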
Python 3.9
error
I want to compare your model's results with felflare-bert-restore-punctuation.
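For a like-for-like comparison, both models can be scored against the same hand-punctuated reference. A minimal sketch, assuming both outputs keep the input's word sequence (each model only adds punctuation and casing), is per-word accuracy of the trailing punctuation mark; the function names are hypothetical, casing is ignored, and this is a simple illustrative metric rather than a standard benchmark.

```python
import re
from typing import List, Tuple

# A word plus any punctuation directly after it: "everyone." -> ("everyone", ".")
_TOKEN = re.compile(r"([\w']+)([^\w\s']*)")


def word_punct_labels(text: str) -> List[Tuple[str, str]]:
    """Split text into (lower-cased word, trailing punctuation) pairs."""
    return [(w.lower(), p) for w, p in _TOKEN.findall(text)]


def punct_accuracy(reference: str, predicted: str) -> float:
    """Fraction of words whose trailing punctuation matches the reference.

    Assumes both texts contain the same word sequence; casing is ignored.
    """
    ref = word_punct_labels(reference)
    pred = word_punct_labels(predicted)
    if [w for w, _ in ref] != [w for w, _ in pred]:
        raise ValueError("reference and prediction have different words")
    if not ref:
        return 1.0
    matches = sum(r == p for (_, r), (_, p) in zip(ref, pred))
    return matches / len(ref)


# Usage: punct_accuracy(reference_text, " ".join(results[0]))
```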