Open vogievetsky opened 5 years ago
Hi,
I would like to work on this PR.
Awesome!
@tan31989 are you still interested in working on this?
@vogievetsky yes, I'm trying to figure out the required changes. I'm somewhat stuck on figuring out the linked issue.
@vogievetsky I have tried a number of approaches, mostly by copying the CSVParser style of implementation. Pardon me if this is vague, but I see that the current code uses if (!matcher.matches()) {}, and matches() only succeeds when the pattern matches the entire text.
I feel that defeats the purpose of a regex parser, because the pattern fails unless the whole text matches as one unit. In my opinion, using while (matcher.find()) {} would fit the use cases better, since it would let us write regexes with more flexibility.
With matcher.find() it is easier to replicate the usual regex find-and-group behavior. A regex that has to match an entire string always ends up needing a catch-all such as (.*). Many kinds of regex would be ruled out because of this.
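To make the difference described above concrete, here is a minimal standalone sketch using only java.util.regex. It is not the parser code from this repository: the class name MatchVsFind, its helper methods, and the sample input line are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the actual parser implementation.
public class MatchVsFind {
    // True only if the pattern consumes the whole input (Matcher.matches()).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Collects capture group 1 of every occurrence found while scanning
    // the input from left to right (Matcher.find()).
    static List<String> findAllGroup1(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "user=alice id=42 user=bob id=7"; // made-up sample input
        String regex = "user=(\\w+)";

        // The pattern covers only part of the line, so matches() fails.
        System.out.println(matchesWhole(regex, line));

        // find() picks up each occurrence without any catch-all padding.
        System.out.println(findAllGroup1(regex, line));

        // To satisfy matches(), the pattern has to be padded with catch-alls.
        System.out.println(matchesWhole(".*" + regex + ".*", line));
    }
}
```

With matches(), the bare pattern is rejected and only the padded version succeeds, whereas find() returns every occurrence of the group. Whether the parser should switch to find() is a design question for the maintainers; the sketch only shows the behavioral difference.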
I would like to work on this
This would be super useful for ingesting data that has some form of a header such as what is seen in https://github.com/apache/incubator-druid/issues/8555.