Open jdkato opened 3 years ago
@jdkato The link to spacy-vale is a 404. Did the repository get deleted? Curious about the work here since I have messed with spaCy myself before and am keen to use such features with some languages I deal with, e.g. Dutch.
It moved to https://github.com/errata-ai/nlpapi. Nothing is really concrete yet, though.
I'm completely new to vale, but it would be awesome if I could use it for Czech. I'd probably be a heavy user then. Being busy these days, I don't want to promise much, but I think I could find some time later on to try out the NLP-based rules. I'm also a Python dev, so external dependencies in Python do not scare me 😄 If I understand it correctly, this is very beta now and I'll have to write my own rules to test it out - there are no predefined rulesets I could use for Czech right now, is that correct?
Btw, the link to sequence is broken too: https://docs.errata.ai/vale/styles#sequence-v230. Not sure where to find it now; I couldn't see any docs linked from https://github.com/errata-ai/nlpapi.
I see there are tags and patterns. Where can I learn about the tags? Not sure what MD or JJ means.
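For reference, MD and JJ look like Penn Treebank part-of-speech tags, which is the fine-grained tag set spaCy's English models emit (an assumption here, since the thread doesn't name the tag set). A minimal lookup sketch follows; the `PENN_TAGS` table and `describe_tag` helper are made up for illustration and cover only a handful of tags:

```python
# A few Penn Treebank part-of-speech tags. This table is deliberately
# incomplete; it only covers tags that come up in this thread plus a
# couple of common ones.
PENN_TAGS = {
    "MD": "modal verb (can, could, would, should, ...)",
    "JJ": "adjective",
    "VB": "verb, base form",
    "VBN": "verb, past participle",
    "NN": "noun, singular or mass",
    "RB": "adverb",
}


def describe_tag(tag: str) -> str:
    """Return a short description for a tag, or a fallback if it is not listed."""
    return PENN_TAGS.get(tag, "unknown tag (not in this small table)")


if __name__ == "__main__":
    for tag in ("MD", "JJ"):
        print(tag, "=", describe_tag(tag))
```

If spaCy is installed, `spacy.explain("JJ")` should give a similar description without maintaining a table by hand. Tag sets differ between languages and models, so Czech or Dutch pipelines may use other labels.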
All of the required pieces are finally in place to offer integration with spaCy.
This allows Vale to support (1) rules written for any of spaCy's supported languages and (2) highly accurate (custom-trained, even) NLP.
If implemented well, I think this has the potential to easily 2x Vale's usefulness.
Getting Started
The current version of Vale (v2.10.4) has unofficial support (since the implementation details are still a WIP) for this integration.
To get started, you'll need Vale (v2.10.4), Python 3.9, and Pipenv installed. Next, follow the steps below:
1. Start the spacy-vale API locally.
2. Create a .vale.ini file.
3. Create a style/rules (see next section).
Creating, testing, and debugging rules
The main entry point for NLP-based rules will be the sequence extension point. One example is an implementation of LanguageTool's WOULD_BE_JJ_VB rule.

To help the testing process, you can use Vale Studio's View Tags feature, which currently supports Markdown content written in Chinese, English, German, Russian, or Spanish.
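To make the tag-and-pattern idea more concrete, here is a rough Python sketch of how a sequence of steps could be matched against tagged tokens. It is not Vale's implementation or rule schema: `Step`, `WOULD_BE_JJ_VB`, `step_matches`, and `find_sequence` are hypothetical names, and the tokens are tagged by hand (in the real integration the tags would come from spaCy).

```python
import re
from typing import List, Optional, Tuple

# A step is ("tag", regex) to match a token's POS tag, or
# ("pattern", regex) to match the token's text. Hypothetical structure.
Step = Tuple[str, str]
Token = Tuple[str, str]  # (text, POS tag)

# Hand-written approximation of the WOULD_BE_JJ_VB idea:
# modal + "be" + adjective + verb with no "to" in between.
WOULD_BE_JJ_VB: List[Step] = [
    ("tag", r"MD"),       # modal verb: would, could, ...
    ("pattern", r"be"),   # the literal word "be"
    ("tag", r"JJ"),       # adjective
    ("tag", r"VB|VBN"),   # base-form verb or past participle
]


def step_matches(step: Step, token: Token) -> bool:
    """Check one step against one (text, tag) token."""
    kind, regex = step
    text, tag = token
    if kind == "tag":
        return re.fullmatch(regex, tag) is not None
    # Text patterns are matched case-insensitively.
    return re.fullmatch(regex, text, re.IGNORECASE) is not None


def find_sequence(tokens: List[Token], steps: List[Step]) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the first match, or None."""
    n = len(steps)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(step_matches(s, t) for s, t in zip(steps, window)):
            return (start, start + n)
    return None


# Hand-tagged sample sentences (assumed tags, not spaCy output).
missing_to = [("It", "PRP"), ("would", "MD"), ("be", "VB"),
              ("nice", "JJ"), ("see", "VB"), ("you", "PRP")]
with_to = [("It", "PRP"), ("would", "MD"), ("be", "VB"),
           ("nice", "JJ"), ("to", "TO"), ("see", "VB"), ("you", "PRP")]

if __name__ == "__main__":
    print(find_sequence(missing_to, WOULD_BE_JJ_VB))  # (1, 5)
    print(find_sequence(with_to, WOULD_BE_JJ_VB))     # None
```

The sketch only handles fixed-length, contiguous sequences. Whether the real extension point supports skips or negation is not something this example tries to model.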
Finally, you'll be able to use these NLP-based rules with all existing integrations, such as VS Code.
Feedback
Please report any issues you encounter: linting speed, Vale Studio usability, sequence limitations, etc.