1-800-BAD-CODE / punctuators

Package for inference for punctuation, true-casing, and sentence boundary detection

Could you please release the SBD model in pytorch format? #2

Open redthing1 opened 1 year ago

redthing1 commented 1 year ago

Thank you very much for your work.

You uploaded an ONNX SBD model: https://huggingface.co/1-800-BAD-CODE/sentence_boundary_detection_multilang

I was wondering if you could share the PyTorch model (before ONNX-ification).

1-800-BAD-CODE commented 1 year ago

Sure. It won't run on the main branch of NeMo, though. You'll need this branch of my personal fork: https://github.com/1-800-BAD-CODE/NeMo/tree/sbd. It's a little messy.

I assume you mean the .nemo model, not the jit one. If you mean the jit PyTorch model, let me know.

The raw file is at https://huggingface.co/1-800-BAD-CODE/sentence_boundary_detection_multilang/tree/main

>>> from nemo.collections.nlp.models.token_classification.sentence_boundary_model import SentenceBoundaryDetectionModel
>>> m = SentenceBoundaryDetectionModel.from_pretrained("1-800-BAD-CODE/sentence_boundary_detection_multilang", map_location="cpu")
>>> m.infer(["it was 8 p.m. and he was late for dinner. he had to work late"])
[['it was 8 p.m. and he was late for dinner.', 'he had to work late']]
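
For contrast with the output above, here is a minimal sketch of a naive rule-based splitter run on the same input. The `naive_split` helper is hypothetical and written only for this comparison; it is not part of NeMo or of the punctuators package, and it says nothing about how the model works internally. It only illustrates why an abbreviation like "p.m." trips up a simple rule.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Split after every period that is followed by whitespace.

    Hypothetical baseline for comparison only; not the model's method.
    """
    # The lookbehind keeps the period attached to the preceding piece.
    parts = re.split(r"(?<=\.)\s+", text)
    return [p for p in parts if p]


text = "it was 8 p.m. and he was late for dinner. he had to work late"
for piece in naive_split(text):
    print(repr(piece))
```

The rule cuts after "p.m." and returns three pieces, whereas the model output above keeps "it was 8 p.m. and he was late for dinner." as a single sentence.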

redthing1 commented 1 year ago

Thanks for getting back to me! What's the jit model in this context? And could you please upload that too?

My actual use case is that I want to do inference with https://github.com/skeskinen/bert.cpp, which has conversion scripts for the HF transformers PyTorch models.