aarnphm / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0
322 stars, 57 forks

feat: Add function to return token as bytes #53

Closed: pajowu closed this 1 year ago

pajowu commented 1 year ago

What does this PR address?

A single Whisper token can be invalid UTF-8 on its own, because a grapheme cluster can be split across two tokens. This function returns a token as raw bytes, so users of this library can reconstruct the valid string themselves.
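To illustrate why raw bytes help, here is a minimal sketch in plain Python. It does not call this library: the `token_bytes` list is hypothetical hand-made data standing in for whatever a bytes-returning function would produce, and the helper names `decode_each` and `decode_joined` are made up for the example. It shows that decoding each token separately corrupts a character split across two tokens, while joining the bytes first and decoding once recovers it.

```python
# Hypothetical token byte strings (NOT output from a real Whisper model).
# The emoji is 4 bytes in UTF-8 and is deliberately split across two "tokens".
emoji = "🙂".encode("utf-8")
token_bytes = [b"Hi ", emoji[:2], emoji[2:], b"!"]


def decode_each(tokens):
    """Decode every token on its own; invalid bytes become U+FFFD."""
    return "".join(t.decode("utf-8", errors="replace") for t in tokens)


def decode_joined(tokens):
    """Concatenate the raw bytes first, then decode the result once."""
    return b"".join(tokens).decode("utf-8")


broken = decode_each(token_bytes)   # contains replacement characters
fixed = decode_joined(token_bytes)  # the split character is restored

print(repr(broken))
print(repr(fixed))
```

For streaming output, where tokens arrive one at a time, Python's standard `codecs.getincrementaldecoder("utf-8")` is one way to buffer incomplete byte sequences until the rest of the character arrives.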

Before submitting: