1-800-BAD-CODE / punctuators

Package for inference for punctuation, true-casing, and sentence boundary detection

module 'torch' has no attribute 'inference_mode' #1

Open FurkanGozukara opened 1 year ago

FurkanGozukara commented 1 year ago

Python 3.9

from typing import List

from punctuators.models import PunctCapSegModelONNX
m = PunctCapSegModelONNX.from_pretrained("pcs_en")
input_texts = example
results = m.infer(input_texts)

error

image

I want to compare your model's results with felflare-bert-restore-punctuation.

1-800-BAD-CODE commented 1 year ago

Thanks for letting me know.

I've pinned the minimum torch dependency to 1.9 (released in 2021) which is when inference_mode was introduced.

Let me know if this fixes it.
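
For anyone hitting the same error, here is a minimal sketch of a version check based on the 1.9 minimum mentioned above. The helper name `supports_inference_mode` is hypothetical and not part of punctuators or torch; it only parses a version string, so it can be tried without torch installed.

```python
import re


def supports_inference_mode(torch_version: str) -> bool:
    """Return True if a torch version string is at least 1.9.

    Hypothetical helper for illustration. Only the leading "major.minor"
    part is read, so local suffixes such as "+cu117" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 9)


# Typical use, assuming torch is installed:
#   import torch
#   print(supports_inference_mode(torch.__version__))
# A direct alternative is hasattr(torch, "inference_mode").
```

If the check returns False, upgrading torch to a version at or above the pinned minimum should make the attribute available.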

FurkanGozukara commented 1 year ago

Thanks. I will test with torch 1.13; should it work?

1-800-BAD-CODE commented 1 year ago

Yes, I test with 1.13.1

FurkanGozukara commented 1 year ago

Hi, it worked, but it looks like I am not using the correct strategy.

I gave input like this:

image.png

The output is not correct:

image.png

This is the code:

from punctuators.models import PunctCapSegModelONNX
m = PunctCapSegModelONNX.from_pretrained("pcs_en")
input_texts = example
results = m.infer(input_texts)
with open('punctuation_fullstop_truecase_english.txt', 'w') as f:
    f.writelines(result)

By the way, it doesn't work with torch 2. It works with torch 1.13.

1-800-BAD-CODE commented 1 year ago

The input should be a list of strings (a batch of inputs) rather than a single string:

>>> help(m.infer)
Help on method infer in module punctuators.models.punc_cap_seg_model:

infer(texts: List[str], apply_sbd: bool = True, batch_size_tokens: int = 4096, overlap: int = 16, num_workers: int = 0) -> Union[List[str], List[List[str]]] method of punctuators.models.punc_cap_seg_model.PunctCapSegModelONNX instance

So try instead

>>> input_texts = [example]

and you'll probably only want results[0], which, with sentence boundary detection enabled, is a list of segmented sentences from example.

Also, the model is trained on lower-cased, un-punctuated text (as the output of most ASR systems, the intended use-case). So you see a lot of <unk> due to the capitalized inputs, and the model will get confused when it sees punctuation.

>>> example = "greetings everyone python has become the main programming language for open source machine learning and ai algorithms"
>>> results = m.infer([example])
>>> processed_example = results[0]
>>> processed_example
['Greetings everyone.', 'Python has become the main programming language for open source Machine learning and AI algorithms.']
>>> print("\n".join(processed_example))
Greetings everyone.
Python has become the main programming language for open source Machine learning and AI algorithms.

I'll pin the upper bound on torch as well; I haven't upgraded to 2.0 yet, and they made quite a few changes.
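
As a rough sketch of preparing already-punctuated, capitalized text so it resembles the lower-cased, un-punctuated input described above, something like the following could be used. The helper names `normalize_for_pcs` and `write_sentences` are hypothetical and not part of punctuators. Keeping apostrophes and handling only ASCII punctuation are assumptions made for this illustration, not requirements stated by the model.

```python
import string
from typing import Iterable

# Assumption: keep apostrophes so contractions like "don't" stay one word.
_PUNCT_TO_STRIP = string.punctuation.replace("'", "")
_STRIP_TABLE = str.maketrans({ch: " " for ch in _PUNCT_TO_STRIP})


def normalize_for_pcs(text: str) -> str:
    """Lower-case text, turn ASCII punctuation into spaces, collapse whitespace."""
    return " ".join(text.lower().translate(_STRIP_TABLE).split())


def write_sentences(sentences: Iterable[str], path: str) -> None:
    """Write one sentence per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(sentence + "\n" for sentence in sentences)


# Usage with the model from this thread (not executed here):
#   results = m.infer([normalize_for_pcs(raw_text)])  # a list of inputs
#   write_sentences(results[0], "punctuation_fullstop_truecase_english.txt")
```

Replacing punctuation with spaces rather than deleting it keeps hyphenated words such as "open-source" from being fused into a single token.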

FurkanGozukara commented 1 year ago

Thanks, this worked. It looks like it is missing a lot of sentence-ending periods.

The input and output are attached here.

I removed all punctuation and made everything lower case before giving it as input.

output.txt

input.txt