Open serin-delaunay opened 7 years ago
Regex could be expanded to use an "a" instead of an "an" for any /use.*/g match.
The issue is slightly wider than the `use` prefix; "usable", "usurping", "usual", "uninformed", "unintelligent", "uninspected", "uninteresting" are all (probably, I'm on my phone) unaccounted for in the present scheme.
The rule of checking whether the first letter is a vowel is wrong, but it covers most cases, as mentioned in the Stack Exchange question. Choosing between `a` and `an` depends on the pronunciation rather than the spelling:
a house
a unique
a US dollar
an FBI agent
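A minimal sketch of a pronunciation-aware check that handles the four examples above. The function name `withArticle`, the letter set, and the prefix pattern are all illustrative assumptions, not part of Tracery, and they are far from a complete rule set.

```javascript
// Letters whose spoken names begin with a vowel sound ("eff", "aitch", "em", ...).
// Used for all-caps initialisms such as "FBI". Assumption: initialisms are spelled out,
// so acronyms read as words (e.g. "NASA") would be handled incorrectly.
const VOWEL_SOUND_LETTERS = new Set("AEFHILMNORSX");

// Vowel-letter spellings that start with a consonant sound ("yoo-", "wun").
// Illustrative and incomplete: e.g. "unilateral" is not covered.
const CONSONANT_SOUND = /^(?:uni(?=[cfqtv])|us[aeu]|eu|one\b|once\b)/i;

// Consonant-letter spellings that start with a vowel sound (silent "h").
const VOWEL_SOUND = /^(?:hour|honest|honou?r|heir)/i;

function withArticle(phrase) {
  const firstWord = phrase.trim().split(/\s+/)[0];
  let article;
  if (/^[A-Z]{2,}$/.test(firstWord)) {
    // Initialism: decide by the spoken name of its first letter.
    article = VOWEL_SOUND_LETTERS.has(firstWord[0]) ? "an" : "a";
  } else if (CONSONANT_SOUND.test(firstWord)) {
    article = "a";
  } else if (VOWEL_SOUND.test(firstWord)) {
    article = "an";
  } else {
    // Fallback: the plain first-letter rule.
    article = /^[aeiou]/i.test(firstWord) ? "an" : "a";
  }
  return `${article} ${phrase}`;
}

for (const p of ["house", "unique", "US dollar", "FBI agent"]) {
  console.log(withArticle(p));
}
```

The exception lists come first and the first-letter rule is only the fallback, so new exceptions can be added without touching the default behaviour.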
Basically you have to implement your own language modifier if you want to have specific language features in your app.
You can't cover the complexity of the English language rules in just a few lines of code; the modifier in this repo is just a template.
I forked the project and I am working on extracting the modifiers stuff to let people build their own. See https://github.com/mycaule/epures/tree/master/modifiers
The methodology should be to write lots of unit tests to make sure the language rules you want (but sadly not every rule) are covered by the code.
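As a rough sketch of that test-first approach, a table of expected articles can be run against any candidate rule. The helper names (`naiveArticle`, `failingCases`) are hypothetical, and the table only uses words already mentioned in this thread.

```javascript
// Rule under test: the naive "vowel letter means 'an'" check.
function naiveArticle(phrase) {
  return /^[aeiou]/i.test(phrase) ? "an" : "a";
}

// Each entry is [input, expected article].
const cases = [
  ["house", "a"],
  ["unique", "a"],
  ["US dollar", "a"],
  ["FBI agent", "an"],
  ["useful tool", "a"],
  ["uninformed", "an"],
];

// Returns one record per case that the given rule gets wrong.
function failingCases(rule, table) {
  return table
    .filter(([input, expected]) => rule(input) !== expected)
    .map(([input, expected]) => ({ input, expected, actual: rule(input) }));
}

// Lists the inputs where the naive rule disagrees with the table.
console.log(failingCases(naiveArticle, cases));
```

Growing the table whenever a wrong article is reported keeps the covered rules explicit, even though the table will never be exhaustive.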
In JS, language rules are implemented in the library `natural`, for example. See "stemmers": https://github.com/NaturalNode/natural/tree/master/lib/natural/stemmers
In this grammar:
The expected output from `start` would be "a useful tool like Tracery", but instead we get "an useful tool like Tracery". I haven't tried Tracery 2, but looking at its altered `a` function I think the problem would remain (since the third letter isn't `i`).

The most reliable function I've found to do this job is in inflect.py, although its regex usage isn't especially readable.
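
To make the third-letter point concrete, here is a paraphrase of the heuristic as it is described in this issue. It is an assumption based on that description, not a copy of Tracery 2's source, and `thirdLetterRule` is a made-up name.

```javascript
// Assumed heuristic: a word starting with "u" whose third letter is "i"
// takes "a" (e.g. "unique"); otherwise any leading vowel letter takes "an".
function thirdLetterRule(phrase) {
  const lower = phrase.toLowerCase();
  if (lower.startsWith("u") && lower.length > 2 && lower[2] === "i") {
    return "a " + phrase;
  }
  return /^[aeiou]/.test(lower) ? "an " + phrase : "a " + phrase;
}

console.log(thirdLetterRule("unique"));                   // third letter is "i"
console.log(thirdLetterRule("useful tool like Tracery")); // third letter is "e"
console.log(thirdLetterRule("uninformed"));               // third letter is "i"
```

Under this assumed rule "useful" still gets "an", and "uninformed" wrongly gets "a", so the heuristic fails in both directions.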