Closed aaronsteers closed 2 years ago
You could use the "split regexp" option to add these words manually as split points, though it's a bit clunky. That said, both of these examples work as-is today, since they fall within how we define "words": https://runkit.com/blakeembrey/6176190cc0655900096c5104. I'm open to API improvements via a PR, as long as it doesn't add too many extra bytes to the module, if you'd like to officially support something like this.
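To illustrate the idea, here is a minimal, self-contained sketch of regexp-based splitting. It is not the library's source: `SPLIT_RULES` and `toSnakeCase` are hypothetical names, and the two rules are only an assumption about what a typical "word" definition looks like. It shows why "AWSInstance" and "EC2Box" split cleanly, and how an extra hand-written rule could cover a case the generic rules miss.

```typescript
// Hypothetical split rules for illustration; not taken from the library.
const SPLIT_RULES: RegExp[] = [
  /([a-z0-9])([A-Z])/g,   // lowercase letter or digit followed by uppercase: "2B" -> "2 B"
  /([A-Z])([A-Z][a-z])/g, // end of an uppercase run before a capitalized word: "SIn" -> "S In"
];

// Apply each rule in order, inserting a space at every split point,
// then lowercase the pieces and join them with underscores.
function toSnakeCase(input: string, splitRules: RegExp[] = SPLIT_RULES): string {
  let spaced = input;
  for (const rule of splitRules) {
    spaced = spaced.replace(rule, "$1 $2");
  }
  return spaced
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase())
    .join("_");
}

console.log(toSnakeCase("AWSInstance")); // aws_instance
console.log(toSnakeCase("EC2Box"));      // ec2_box

// Two special words back to back are not separated by the generic rules...
console.log(toSnakeCase("AWSEC2Box"));   // awsec2_box

// ...so a manual rule naming both words is needed, which is the clunky part.
console.log(toSnakeCase("AWSEC2Box", [...SPLIT_RULES, /(AWS)(EC2)/g])); // aws_ec2_box
```

The manual rule only handles the exact pair it names, so each new combination of special words would need its own rule.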
I'll look at this. Thanks, @blakeembrey!
Have you given any thought to special-word lookups?
For instance, "AWSInstance" and "EC2Box" (to my knowledge) require a dictionary lookup on the special words "AWS" and "EC2" in order to know that aws_instance and ec2_box are both correct and that all of the below are incorrect:
In a past project, I was able to resolve these via a hardcoded "special words" lookup, which would prioritize translations that favored keeping special words together.
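A rough sketch of what such a lookup could look like, written as standalone TypeScript rather than against this library's API. Everything here is an assumption for illustration: `SPECIAL_WORDS` and `snakeCaseWithSpecialWords` are made-up names, and the tokenizer simply tries the special words (longest first) before falling back to generic word shapes.

```typescript
// Hypothetical dictionary of words that must stay together.
const SPECIAL_WORDS: string[] = ["AWS", "EC2", "PostgreSQL"];

function snakeCaseWithSpecialWords(
  input: string,
  specialWords: string[] = SPECIAL_WORDS,
): string {
  // Longest words first, so a longer special word wins over a shorter prefix.
  const sorted = [...specialWords].sort((a, b) => b.length - a.length);
  // Escape regex metacharacters in the dictionary entries.
  const escaped = sorted.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));

  // Alternatives are tried left to right at each position:
  // special words, then uppercase runs, then (capitalized) words, then digits.
  const tokenPattern = new RegExp(
    [...escaped, "[A-Z]+(?![a-z])", "[A-Z]?[a-z]+", "[0-9]+"].join("|"),
    "g",
  );

  const tokens = input.match(tokenPattern) ?? [];
  return tokens.map((t) => t.toLowerCase()).join("_");
}

console.log(snakeCaseWithSpecialWords("AWSInstance")); // aws_instance
console.log(snakeCaseWithSpecialWords("EC2Box"));      // ec2_box
console.log(snakeCaseWithSpecialWords("AWSEC2Box"));   // aws_ec2_box

// With an empty dictionary, the generic fallback splits the digit off:
console.log(snakeCaseWithSpecialWords("EC2Box", []));  // ec_2_box
```

One known weakness of this sketch is that a special word is matched wherever it appears, including at the start of a longer unrelated word, so a real implementation would likely need extra boundary checks.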
Is this something you've considered or perhaps are there other solutions I'm not aware of?
Background: I'm interested in potentially using this library for data pipeline name conversions and wanted to ask how to solve this problem, which I've had some difficulty with before.
Thanks!