Closed JanSellner closed 4 years ago
Interesting. The approach seems a bit more verbose but perhaps it helps maintainability. I imagine the new approach is also convenient to handle floats ('thirty-nine point seven' or 'thirty-nine and a half') and negatives ('minus nine' or 'negative nine')?
Could you please also add the new padding tests that I added in my PR #27?
This is an amazing rewrite! Thank you for taking the time to do this. I am still reviewing it, but maintainability and extensibility will improve with this approach.
> Could you please also add the new padding tests that I added in my PR #27?
I added the test for the time formats (which also revealed a bug in the code, which is now fixed). I did not add the `keep_zero_padding` option, though, because I think that this library should not alter existing numbers in the text if they are not combined with number words. If someone wants to remove zero padding, this is probably an additional processing step.
> Interesting. The approach seems a bit more verbose but perhaps it helps maintainability. I imagine the new approach is also convenient to handle floats ('thirty-nine point seven' or 'thirty-nine and a half') and negatives ('minus nine' or 'negative nine')?
Yes, this is exactly the kind of new feature which should be easy to implement: we only have to think about new token types (e.g. sign) and then either adjust the existing rules or add a new one to handle these cases. I can take a look into it :-)
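To make the idea of a sign token a bit more concrete, here is a minimal sketch of what such an extra rule could look like. It is not code from this PR: the `(kind, text)` token layout, the `SIGN_WORDS` set and the `apply_sign_rule` function are all hypothetical names chosen for illustration.

```python
# Hypothetical sketch: a "sign" rule applied to an already lexed token list.
# Tokens are modelled as (kind, text) pairs, which is an assumption and not
# the representation used by the library.
SIGN_WORDS = {"minus", "negative"}


def apply_sign_rule(tokens):
    """Merge a sign word with the number token that directly follows it."""
    out = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        has_next = i + 1 < len(tokens)
        if (
            kind == "WORD"
            and text.lower() in SIGN_WORDS
            and has_next
            and tokens[i + 1][0] == "NUMBER"
        ):
            # Sign word followed by a number: emit one negative number token.
            out.append(("NUMBER", "-" + tokens[i + 1][1]))
            i += 2
        else:
            # Anything else is passed through unchanged.
            out.append((kind, text))
            i += 1
    return out
```

A sign word that is not followed by a number (e.g. "minus the fee") is left untouched, so ordinary text is not altered.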
@JanSellner Is there any work being done on this or can I merge this in?
From my side it is done regarding this PR. For future extensions (e.g. decimal numbers), I would make a new PR.
@ShailChoksi no rush, but when do you expect the new release with these fixes to come out?
@fersarr just now! done :)
Wow great! thank you @ShailChoksi !
This is a major rewrite of the library with a switch to a "lex and parse" approach. The idea is to simplify the implementation by introducing a multi-step process (inspired by how compilers work):
[...] `forty-two`, where we need to calculate something, and a second concatenation rule which handles cases like `twenty twenty`, i.e. years where we only need to concatenate numbers.
I also took the chance and split the implementation into several files.
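For readers who want a feel for the two kinds of rules mentioned above, the following is a minimal sketch of a lex-then-parse pipeline. It is an illustration under assumptions, not the implementation in this PR: the `Token` class, the tiny `UNITS` and `TENS` tables, and the `lex`, `parse` and `convert` functions are hypothetical, and the name "combination rule" for the calculating rule is my own label.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny word tables.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


@dataclass
class Token:
    kind: str                    # "UNIT", "TENS" or "OTHER"
    text: str                    # the original word
    value: Optional[int] = None  # numeric value for number words


def lex(text):
    """Step 1: split the input and classify every word as a token."""
    tokens = []
    for word in text.replace("-", " ").split():
        lower = word.lower()
        if lower in UNITS:
            tokens.append(Token("UNIT", word, UNITS[lower]))
        elif lower in TENS:
            tokens.append(Token("TENS", word, TENS[lower]))
        else:
            tokens.append(Token("OTHER", word))
    return tokens


def parse(tokens):
    """Step 2: apply rules to neighbouring tokens."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.kind == "TENS" and nxt is not None and nxt.kind == "UNIT":
            # Calculating rule: forty + two = 42
            out.append(str(tok.value + nxt.value))
            i += 2
        elif tok.kind == "TENS" and nxt is not None and nxt.kind == "TENS":
            # Concatenation rule: twenty twenty -> "20" + "20" = "2020"
            out.append(str(tok.value) + str(nxt.value))
            i += 2
        elif tok.kind in ("UNIT", "TENS"):
            out.append(str(tok.value))
            i += 1
        else:
            out.append(tok.text)
            i += 1
    return " ".join(out)


def convert(text):
    return parse(lex(text))
```

With these assumptions, `convert("I am forty-two")` gives `"I am 42"` and `convert("in twenty twenty")` gives `"in 2020"`. Supporting a new case then mostly means adding a token kind or another branch in `parse`.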
This is, however, not just a rewrite but also introduces new functionality and fixes some bugs. For a start, the issues #25 and #26 are fixed automatically, i.e. without further adjustments. I also added several new test cases, some of which did not work before (see attached test log). A general new feature is the support of literals in the input string to handle cases like `2.5 thousand -> 2500`.
The goal was to ease the implementation process by reducing complexity, making it easier to fix bugs or add new features. I hope you like it :-)
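As an illustration of the literal support mentioned above (`2.5 thousand -> 2500`), here is a small self-contained sketch. It only shows the general idea of combining a numeric literal with a following scale word. The `SCALES` table, the `LITERAL` pattern and the `combine_literals` function are hypothetical and not taken from the library.

```python
import re
from decimal import Decimal

# Hypothetical scale words; the real library may support more.
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}

# A literal is a number already written in digits, e.g. "3" or "2.5".
LITERAL = re.compile(r"\d+(?:\.\d+)?")


def combine_literals(text):
    """Multiply a numeric literal by the scale word that follows it."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1].lower() if i + 1 < len(words) else None
        if LITERAL.fullmatch(word) and nxt in SCALES:
            # Decimal avoids binary floating-point noise in the product.
            value = Decimal(word) * SCALES[nxt]
            out.append(format(value.normalize(), "f"))
            i += 2
        else:
            # Literals without a scale word are passed through unchanged.
            out.append(word)
            i += 1
    return " ".join(out)
```

Note that a literal which is not followed by a number word, such as a zero-padded `007`, is left exactly as it was in the input.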
Test Results - pytest_in_tests.zip