bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

fixes to --preserve-punctuation #119

Closed jncasey closed 2 years ago

jncasey commented 2 years ago

This reworks a few things related to preserving punctuation.

A number of the tests had to be updated due to that first bullet point.

codecov[bot] commented 2 years ago

Codecov Report

Merging #119 (d2e5726) into master (d9315b9) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #119   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           23        23           
  Lines         1152      1168   +16     
=========================================
+ Hits          1152      1168   +16     

Impacted Files                         Coverage Δ
phonemizer/backend/base.py             100.00% <100.00%> (ø)
phonemizer/backend/espeak/espeak.py    100.00% <100.00%> (ø)
phonemizer/punctuation.py              100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d9315b9...d2e5726.

jncasey commented 2 years ago

@hadware I think this is good to go, but it definitely changes the output when preserving punctuation compared to previous versions of phonemizer. This way seems much more consistent to me – the output will have the same number of word separators whether punctuation is being preserved or not.

See the tests I modified for examples of the new output.
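
To make the consistency property above concrete, here is a toy sketch. It is not phonemizer's implementation and does not call the library: `toy_phonemize`, the `PUNCT` pattern, and the `|` word separator are all hypothetical stand-ins, and "phonemization" is faked by lowercasing. It only illustrates the invariant described in the comment, namely that the number of word separators is the same with and without punctuation preservation.

```python
import re

# Hypothetical punctuation set and word separator, chosen for illustration only.
PUNCT = re.compile(r"[;:,.!?]")
WORD_SEP = "|"


def toy_phonemize(text, preserve_punctuation=False):
    """Toy stand-in for a phonemizer: lowercases words and joins them with WORD_SEP."""
    words = []
    for token in text.split():
        core = PUNCT.sub("", token)
        if not core:
            # A punctuation-only token never counts as a word. When preserving,
            # attach it to the previous word so no extra separator appears
            # (in this toy, a leading punctuation-only token is simply dropped).
            if preserve_punctuation and words:
                words[-1] += token
            continue
        # Keep the marks attached to the word when preserving, drop them otherwise.
        words.append(token.lower() if preserve_punctuation else core.lower())
    return WORD_SEP.join(words)


text = "Hello, world! How are you?"
plain = toy_phonemize(text)
kept = toy_phonemize(text, preserve_punctuation=True)

print(plain)  # hello|world|how|are|you
print(kept)   # hello,|world!|how|are|you?

# Same number of word separators in both modes.
assert plain.count(WORD_SEP) == kept.count(WORD_SEP)
```

The real output format depends on the backend and separator settings, so the modified tests in the PR remain the authoritative examples.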

hadware commented 2 years ago

Alright, super cool.

I'll review the PR probably tomorrow, and hopefully we can merge this. This is fantastic, thanks again for your work!

jncasey commented 2 years ago

Great! Let me know if you need me to make any changes.

And thank you for this project. It's replaced my DIY solution of a transformers model trained on the CMUDict, and is so much faster and more accurate.

hadware commented 2 years ago

This is looking good! I'm merging this. I'd like to add a test with the large e-book from #108, then I'll release a new version (probably 3.2.1, since this contains some pretty major changes).

jncasey commented 2 years ago

Great! This also closes #106 and makes PR #112 redundant.

hadware commented 2 years ago

Oh, btw, since you seem to have a very good understanding of the whole lib, would you mind if I ask for your opinion on future PRs (from other GitHub users)? I'm also making you a "triager" for this repo.

jncasey commented 2 years ago

Happy to help as much as I can!