Closed GuntherRademacher closed 9 months ago
While these changes fix the corresponding tests, we are operating close to the limits of what can be done by translating to Java regexes. There may be more cases where the match-empty restriction for backref'ed groups produces unexpected results. On the other hand, all of those are most likely corner cases of minor importance. If you'd prefer to solve this by using a different (new?) regex implementation, please let me know.
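To illustrate why back-references are hard to translate, here is a minimal sketch (not code from this PR; the class name `BackrefDemo` and the helper `matches` are made up for illustration). As far as I read the XPath regex rules, a back-reference to a group that matched nothing is treated as matching a zero-length string, whereas `java.util.regex` makes such a back-reference fail. The sketch only shows the Java side of that difference:

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // Hypothetical helper: true if the whole input matches the Java regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Group 1 participates in the match, so \1 repeats "a".
        System.out.println(matches("(a)\\1", "aa"));

        // Group 1 is skipped (second alternative is taken). In Java the
        // back-reference to the unset group fails, so "b" does not match.
        System.out.println(matches("(a)|b\\1", "b"));
    }
}
```

Under the XPath reading above, the second pattern would be expected to match "b", so a one-to-one translation of the pattern text is not enough in cases like this one.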
@GuntherRademacher Thanks for your efforts. I think it’s perfectly fine to stick with the existing regex conversion… But I was still positively surprised that you managed to fix the remaining corner cases.
In https://github.com/BaseXdb/basex/issues/2240#issuecomment-1723188208, you indicated that p888-p891 could be invalid. Do you think we should report this back, or have you come to the conclusion that the tests conform to the specs?
@ChristianGruen You may be referring to the initial edit of my comment. When I wrote it, I was under the impression that the patterns were valid, but I had missed this rule:
The - character is a valid character range only at the beginning or end of a [·positive character group·]
Given this, those patterns are invalid and should be rejected, so the tests are OK. I edited my comment half an hour later, before you commented on it.
Perfect. You are right, I had read both the initial and the most recent commit, and I must have mixed them up.
This PR contains three changes addressing three problems: