daobrien opened this issue 6 months ago.
Unfortunately, there's no straightforward solution here.
I'd argue that "buggy" is the wrong word; the results are actually objectively good. For comparison, NLTK (a very widely used NLP library) gives exactly the same results with its default tagger.
And when you consider the other constraints Vale has (~20MB binary, offline, no NLP installation dependencies, etc.), the results are very good.
That said, the fact that I had to write my own NLP library to even get this far is obviously not ideal. I've tried a number of ways to incorporate third-party libraries, but doing so complicates installation and setup significantly.
For example, two of the best available libraries:
Just aren't that practical for many of Vale's use cases.
I'm not sure what the solution here is yet, but it's definitely something that I've put a lot of time into trying to improve.
Thanks for explaining what's going on. Maybe s/buggy/imperfect/; obviously, getting software perfect is really hard. I can pass all this on to the team who help me with our Vale setup, but relying on local servers is probably not something they'll get excited about.
Feel free to update the status of this issue to whatever you deem appropriate.
David
Environment
Fedora Linux 38; Vale version 3.0.7, installed from RPM.
Describe the bug / provide steps to reproduce it
We're trying to write a rule to identify complex adjectives that should be hyphenated. For example, in the phrase "the upper left corner", "upper left" should be written as "upper-left".
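To show why a rule like this depends on consistent part-of-speech tags, here is a minimal Python sketch of the kind of check involved. It scans pre-tagged tokens for an adjective-adjective-noun run and suggests hyphenating the two modifiers. It assumes Penn Treebank tags supplied by hand; `find_compound_modifiers`, the tag sets, and both taggings are hypothetical illustrations, not Vale's implementation, its rule syntax, or real tagger output.

```python
from typing import List, Tuple

# Hypothetical helper for illustration only; not Vale's implementation.
# Penn Treebank adjective tags: JJ (base), JJR (comparative), JJS (superlative).
ADJ_TAGS = {"JJ", "JJR", "JJS"}
# Penn Treebank noun tags: singular/plural, common/proper.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def find_compound_modifiers(tagged: List[Tuple[str, str]]) -> List[str]:
    """Suggest 'word1-word2' for every ADJ ADJ NOUN run in pre-tagged tokens."""
    suggestions = []
    # Slide a three-token window over the sentence.
    for i in range(len(tagged) - 2):
        (w1, t1), (w2, t2), (_, t3) = tagged[i : i + 3]
        if t1 in ADJ_TAGS and t2 in ADJ_TAGS and t3 in NOUN_TAGS:
            suggestions.append(f"{w1}-{w2}")
    return suggestions


# Two hand-written (hypothetical) taggings of "the upper left corner".
as_adjective = [("the", "DT"), ("upper", "JJ"), ("left", "JJ"), ("corner", "NN")]
as_verb = [("the", "DT"), ("upper", "JJ"), ("left", "VBD"), ("corner", "NN")]

print(find_compound_modifiers(as_adjective))  # ['upper-left']
print(find_compound_modifiers(as_verb))       # []
```

The phrase is flagged only when "left" is tagged as an adjective; if a tagger reads it as a verb (or a noun), the same pattern silently misses it. That is why inconsistent tagging, rather than the pattern itself, can make results vary between test cases.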
The rule currently appears as follows:
We've used several test cases and cannot get consistent results:
Vale only catches the last test case.
We used Vale Studio to test the parts of speech, but the results are inconsistent:
This blocks further development of this rule for us. Would really appreciate any help. Thanks.