bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0
1.18k stars 165 forks source link

add direct access to punctuation regex #120

Closed jncasey closed 2 years ago

jncasey commented 2 years ago

Fixes #99

First pass at adding this feature. Not extensively tested yet.

I took the approach @mmmaat suggested, which made the changes pretty minimal. But I'd be happy to rewrite it the way @hadware proposed if it's decided that's better.

mmmaat commented 2 years ago

And we also have a major issue with python-3.6 and re.Pattern : https://github.com/bootphon/phonemizer/runs/6084616538?check_suite_focus=true#step:6:54. I didn't get into details of 3.7 breaking changes but I assume it can be fixed easily.

hadware commented 2 years ago

Sorry for responding too late: I think my approach and @mmmaat 's aren't incompatible (add the right CLI arguments and change Punctuation's signature a bit). What do you think about that @mmmaat ?

Regarding the python3.6 incompatibility, I'll think a bit about it to find some sort of solution.

hadware commented 2 years ago

Wait, I'm wrong: the marks setter would make my suggestion a bit nonsensical. I'm still midly frustrated with this API though... suggestion: couldn't we have two classes , Punctuation and PunctuationRegex to make things crystal clear?

hadware commented 2 years ago

Regarding type checking for re.Pattern, a simple solution could be (schematically):

if TYPE_CHECKING and python_version > 3.6:
    from re import Pattern

...

def myfunction(mark: 'Pattern'):
   ...
jncasey commented 2 years ago

Regarding the 3.6 problem, it looks like we can use typing.Pattern (which exists in 3.6) instead of re.Pattern. I'll make that change.

For the high-level API discussion, I defer to the two of you.

codecov[bot] commented 2 years ago

Codecov Report

Merging #120 (6e5a30e) into master (87e5d65) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #120   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           23        23           
  Lines         1168      1184   +16     
=========================================
+ Hits          1168      1184   +16     
Impacted Files Coverage Δ
phonemizer/backend/base.py 100.00% <100.00%> (ø)
phonemizer/backend/espeak/base.py 100.00% <100.00%> (ø)
phonemizer/backend/espeak/espeak.py 100.00% <100.00%> (ø)
phonemizer/backend/festival/festival.py 100.00% <100.00%> (ø)
phonemizer/backend/segments.py 100.00% <100.00%> (ø)
phonemizer/main.py 100.00% <100.00%> (ø)
phonemizer/phonemize.py 100.00% <100.00%> (ø)
phonemizer/punctuation.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 87e5d65...6e5a30e. Read the comment docs.

hadware commented 2 years ago

Regarding the 3.6 problem, it looks like we can use typing.Pattern (which exists in 3.6) instead of re.Pattern. I'll make that change.

Yeah, that's a much better solution than mine. Good call!

Regarding the API, I'm myself also defering to @mmmaat 's opinion on this.

mmmaat commented 2 years ago

Well… The two classes Punctuation and PunctuationRegex are a good idea to make the API crystal clear, as @hadware said. On the other hand, this implies more changes in the code, to instantiate the one or the other in phonemize for example.

So I vote in favor of a single class for now, keeping in mind that it can be an improvement for the future.

About the marks setter, I wrote it a long time ago, was a mistake as it is useless... But for now we have to keep it to ensure retro-compatibility... Maybe someone somewhere is using it, I'm not sure :) In our wishlist for the 4.0 release.

@jncasey little stuff to do before your PR can be merged:

jncasey commented 2 years ago

Great! If we're moving forward with the single class version, I can definitely finish off the last bits in the next day or so.

jncasey commented 2 years ago

For CHANGELOG.md, I'll also add the changes from #119. Should this move the version to 3.2.1 or 3.1.2?

mmmaat commented 2 years ago

This will be 3.2.0 as we add new functionalities : https://semver.org/

But for your update in CHANGELOG you can add a section Not yet released, this is what I do usually. Then, on the time of release, we will update to the correct version number.

jncasey commented 2 years ago

This is pretty much ready to go, I think, except I'm not sure what's causing the error in the Windows test. I'll have to get back to this later to investigate.

jncasey commented 2 years ago

Okay I think this is (finally) ready.

mmmaat commented 2 years ago

Wonderful, thank you! I'm merging it.