eclipse / tm4e

TextMate support in Eclipse IDE
https://projects.eclipse.org/projects/technology.tm4e
Eclipse Public License 2.0

joni oniguruma library does not fully implement oniguruma syntax #677

Open sebthom opened 7 months ago

sebthom commented 7 months ago

TextMate uses the oniguruma syntax for regular expressions.

TM4E currently uses jruby/joni as its Oniguruma implementation. Unfortunately, joni does not implement the full Oniguruma syntax, e.g. variable-length look-behind. This means that not all TextMate language files can be parsed with TM4E; loading fails with the error "invalid pattern in look-behind". Since the goal of the joni library is to emulate Ruby's behavior and Ruby does not support variable-length look-behinds, joni won't support them anytime soon.

VS Code uses the original Oniguruma C library from https://github.com/kkos/oniguruma/

It might make sense to investigate if we can do the same, e.g. using kkos/oniguruma via https://github.com/jnr/jnr-ffi/

akurtakov commented 7 months ago

I would strongly recommend going for the Foreign Function & Memory API, https://openjdk.org/jeps/442 (requires Java 21), as jnr-ffi is not fun with macOS signing and the whole Orbit fuss (https://github.com/eclipse-orbit/orbit-simrel/blob/6926eeed8ffaf2174b0fd42bc3c4e94208bda216/maven-sign/tp/MavenSign.target#L6)

sebthom commented 7 months ago

Well, I have no experience with JNI/JNA/JNR/FFM so I hope someone else picks this up :-)

mickaelistria commented 7 months ago

FWIW, moving away from Joni to embedding upstream Oniguruma could be a good improvement for the ecosystem.

sebthom commented 7 months ago

@mickaelistria any idea when we can expect Eclipse to be bundled with JustJ 21 by default? I don't think moving forward with this via FFM is a good idea as long as Eclipse still uses JustJ 17.

akurtakov commented 7 months ago

@sebthom As per https://github.com/eclipse-simrel/.github/blob/main/wiki/SimRel/Simultaneous_Release_Requirements.md#execution-environment-partially-tested it should be fine after the 2024-03 release.

angelozerr commented 7 months ago

It would be amazing to consume the original Oniguruma C library, but as so many plugins use tm4e (I'm so happy to have spent so much time creating this plugin, and so happy that you continue to improve it drastically, thanks for that!) we need to be careful.

IMHO tm4e should try to parse the TextMate grammar with the original Oniguruma C library and, if that fails, fall back to joni like today.
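The fallback idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FallbackRegexCompiler` and the two engine variables are hypothetical names, not TM4E API, and `java.util.regex` stands in for joni while a lambda that throws `UnsatisfiedLinkError` stands in for a native Oniguruma binding that cannot be loaded.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of TM4E: tries regex engines in order of preference. */
final class FallbackRegexCompiler {

    // Each engine turns a pattern string into a "does the input contain a match?" check.
    private final List<Function<String, Predicate<String>>> engines;

    FallbackRegexCompiler(List<Function<String, Predicate<String>>> engines) {
        this.engines = List.copyOf(engines);
    }

    /** Returns the result of the first engine that compiles the pattern without failing. */
    Predicate<String> compile(String pattern) {
        IllegalArgumentException failure =
                new IllegalArgumentException("No engine could compile: " + pattern);
        for (Function<String, Predicate<String>> engine : engines) {
            try {
                return engine.apply(pattern);
            } catch (RuntimeException | LinkageError e) {
                // Syntax errors and missing native libraries both lead to the next engine.
                failure.addSuppressed(e);
            }
        }
        throw failure;
    }

    public static void main(String[] args) {
        // Stand-in for a native binding whose library cannot be loaded on this OS (assumption).
        Function<String, Predicate<String>> nativeEngine = p -> {
            throw new UnsatisfiedLinkError("native library not available");
        };
        // Stand-in for the pure-Java engine used today (java.util.regex instead of joni).
        Function<String, Predicate<String>> pureJavaEngine = p -> Pattern.compile(p).asPredicate();

        FallbackRegexCompiler compiler =
                new FallbackRegexCompiler(List.of(nativeEngine, pureJavaEngine));
        System.out.println(compiler.compile("(?<=ab)c").test("abc"));
    }
}
```

Collecting each engine's failure as a suppressed exception keeps the reason for every fallback visible, which would matter when users report grammars that fail to load.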

mickaelistria commented 7 months ago

> but as so many plugins use tm4e

Those consumers will have to contribute back if they want to keep some Joni support. But at the moment, it seems like keeping Joni is not in the best interest of anyone. If APIs are broken, this would be worth moving to a new major release.

angelozerr commented 7 months ago

TM4E has been working well with most TextMate grammars for several years.

Users will not understand why their TextMate grammar crashes if the Oniguruma C library has a bug or if some OS doesn't support it correctly.

And users will have bad feelings about tm4e.

sebthom commented 7 months ago

Btw. IntelliJ has the same issue: https://youtrack.jetbrains.com/issue/IDEA-336274

The main issue atm is variable-length look-behind. We could try to dynamically rewrite such unparseable regexes to some alternative syntax, as suggested here https://stackoverflow.com/a/24591663/5116073, if that is possible.

Using the C lib directly could also improve overall parsing speed as discussed here https://github.com/jruby/joni/issues/43
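One conceivable form of the regex rewriting mentioned in this comment is sketched below. It is not necessarily the approach from the linked answer. It assumes joni follows the rule documented for Oniguruma and Ruby that top-level alternatives of different lengths are accepted inside a look-behind while nested ones such as `(?<=aaa(?:b|cd))` are not. The hypothetical `LookBehindExpander` flattens nested non-capturing alternations into top-level alternatives and deliberately rejects quantifiers, character classes and capturing groups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: flattens nested alternations in a look-behind body. */
final class LookBehindExpander {
    private final String src;
    private int pos;

    private LookBehindExpander(String src) { this.src = src; }

    /** Returns every flat alternative the look-behind body can match. */
    static List<String> expand(String body) {
        LookBehindExpander parser = new LookBehindExpander(body);
        List<String> result = parser.alternation();
        if (parser.pos != body.length()) {
            throw new IllegalArgumentException("Unbalanced ')' at " + parser.pos);
        }
        return result;
    }

    /** Builds a positive look-behind that has only top-level alternatives. */
    static String rewrite(String body) {
        return "(?<=" + String.join("|", expand(body)) + ")";
    }

    // alternation := sequence ('|' sequence)*
    private List<String> alternation() {
        List<String> out = new ArrayList<>(sequence());
        while (pos < src.length() && src.charAt(pos) == '|') {
            pos++;
            out.addAll(sequence());
        }
        return out;
    }

    // sequence := ('(?:' alternation ')' | escaped char | literal char)*
    private List<String> sequence() {
        List<String> prefixes = List.of("");
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '|' || c == ')') break;
            List<String> item;
            if (src.startsWith("(?:", pos)) {
                pos += 3;
                item = alternation();
                if (pos >= src.length() || src.charAt(pos) != ')') {
                    throw new IllegalArgumentException("Expected ')' at " + pos);
                }
                pos++;
            } else if (c == '\\' && pos + 1 < src.length()) {
                item = List.of(src.substring(pos, pos + 2)); // keep two-char escapes intact
                pos += 2;
            } else if ("([{*+?".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Unsupported construct '" + c + "' at " + pos);
            } else {
                item = List.of(String.valueOf(c));
                pos++;
            }
            prefixes = concat(prefixes, item);
        }
        return prefixes;
    }

    // Cartesian concatenation: every prefix followed by every alternative of the next item.
    private static List<String> concat(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String l : left) for (String r : right) out.add(l + r);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("aaa(?:b|cd)")); // prints (?<=aaab|aaacd)
    }
}
```

Patterns that the sketch cannot handle raise an exception, so a caller could keep the original pattern and report it as unsupported. Unbounded quantifiers inside a look-behind cannot be expanded this way at all.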

mickaelistria commented 7 months ago

> TM4E has been working well with most TextMate grammars for several years.

And it's likely to work even better if we can use upstream/more used/better maintained/... oniguruma instead of Joni.

> Users will not understand why their TextMate grammar crashes if the Oniguruma C library has a bug or if some OS doesn't support it correctly.

I don't think using upstream oniguruma is more error-prone than using Joni. Usually, the fewer indirections there are, the less room there is for bugs and the higher the quality.

> And users will have bad feelings about tm4e.

Moving to upstream is more likely to increase the quality of the stack than to decrease it.