ammar / regexp_parser

A regular expression parser library for Ruby
MIT License

Ruby 2.4.4 Not Supported #52

Closed bobziuchkovski closed 6 years ago

bobziuchkovski commented 6 years ago

Ruby 2.4.4 is out but is not yet supported by regexp_parser (Unknown syntax name 'ruby/2.4.4'. Forgot to add it to Regexp::Syntax::VERSIONS?)

This looks similar to https://github.com/ammar/regexp_parser/issues/48

Would it make sense to structure the version parsing to assume that new patch releases support the same feature set as the latest explicitly defined patch release for the given major/minor version? E.g. assume Ruby 2.4.4 supports all of the same features as Ruby 2.4.3 unless explicitly overridden? It seems odd to need to explicitly define/whitelist new patch revisions. I can understand wanting to explicitly define new major and minor revisions, but a patch release by its very nature should never be removing functionality. In perusing https://github.com/ammar/regexp_parser/tree/master/lib/regexp_parser/syntax/ruby, it seems that's largely the way the files are already structured, albeit explicitly.

jaynetics commented 6 years ago

Hi @bobziuchkovski

Thanks for the heads up.

I've just released v0.4.12 with support for Ruby 2.4.4 and 2.5.1.

I also fully agree with your view of how version upgrades should work. If we can choose between

surely b) would be the preferable course for most real world use cases.

Some time ago, I started preparing an automatic fallback to the latest configured Ruby version.

The only backwards-incompatible effect would be that Regexp::Syntax::VERSIONS (or Regexp::Syntax::Ruby.constants) would no longer be available to display a list of supported versions. But probably no one is using these. What do you think @ammar?

Edit: To clarify, a case can be made for whitelisting patch versions, because Ruby does not really follow semantic versioning, at least not when it comes to regex features added via new Onigmo versions.

Recent examples include new Unicode scripts and the absence operator (?~...), which was added in Ruby 2.4.1.

Such changes could in theory lead to regexp_parser silently mistaking newly introduced meta elements for simple escapes, literals, or something else.
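The thread does not show how the fallback on the branch is implemented. As a rough illustration only, here is a minimal Ruby sketch assuming "latest configured version" means the newest configured version that is not newer than the requested one. CONFIGURED and resolve_latest are hypothetical names, and the version list is made up rather than taken from regexp_parser.

```ruby
require 'rubygems'

# Made-up list of configured syntax versions, for this sketch only.
CONFIGURED = %w[2.4.1 2.4.3 2.5.0].freeze

# Hypothetical fallback: use the newest configured version that is not
# newer than the requested one, regardless of major.minor line. Only a
# version older than every configured one raises.
def resolve_latest(requested = RUBY_VERSION, configured: CONFIGURED)
  target = Gem::Version.new(requested)
  match = configured
          .map { |v| Gem::Version.new(v) }
          .select { |v| v <= target }
          .max
  raise ArgumentError, "Unknown syntax name 'ruby/#{requested}'" unless match

  match.to_s
end

puts resolve_latest('2.4.4') # prints "2.4.3"
puts resolve_latest('2.6.0') # prints "2.5.0"
```

Under this assumption a future release such as 2.6.0 resolves to 2.5.0, so new Rubies work without a library update. The flip side is the risk described above: a construct introduced in 2.6.0 would be parsed with 2.5.0 rules.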

bobziuchkovski commented 6 years ago

Thanks @janosch-x !

It seems like automatic fallback and explicitly supported versions aren't necessarily mutually exclusive, though. The current mechanism could be extended to assume that a version that isn't in the explicit Regexp::Syntax::VERSIONS list behaves the same as the highest explicitly supported patch revision of the same major.minor line, and then fall back to that closest ancestor rather than throwing an exception.

I think that would avoid backwards-compatibility problems while sidestepping the issue of handling new Ruby versions. Anyone who relies on explicitly supported versions would still have the option to fail hard in their downstream code, similar to the current upstream behavior.

I fully acknowledge that I could be misunderstanding or overlooking corner cases, though. I'm not familiar enough with the project internals, so forgive me if I'm being naive. :)
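A minimal sketch of this proposal, assuming version strings can be compared with Gem::Version. EXPLICIT_VERSIONS, resolve_syntax_version and its strict: keyword are hypothetical names for illustration; they are not part of regexp_parser's API, and the version list is made up.

```ruby
require 'rubygems'

# Made-up list of explicitly defined versions, for this sketch only.
EXPLICIT_VERSIONS = %w[2.3.6 2.4.0 2.4.1 2.4.3 2.5.0].freeze

# Hypothetical resolver for the proposal above. An unlisted version falls
# back to the highest listed patch release on the same major.minor line
# that is not newer than it. strict: true keeps the current hard failure.
def resolve_syntax_version(requested, known: EXPLICIT_VERSIONS, strict: false)
  return requested if known.include?(requested)
  raise ArgumentError, "Unknown syntax name 'ruby/#{requested}'" if strict

  target = Gem::Version.new(requested)
  line = target.segments.first(2) # e.g. [2, 4] for "2.4.4"

  ancestor = known
             .map { |v| Gem::Version.new(v) }
             .select { |v| v.segments.first(2) == line && v <= target }
             .max
  # A brand-new major.minor line has no ancestor, so it still fails.
  raise ArgumentError, "Unknown syntax name 'ruby/#{requested}'" unless ancestor

  ancestor.to_s
end

puts resolve_syntax_version('2.4.4') # prints "2.4.3"
puts resolve_syntax_version('2.3.7') # prints "2.3.6"
```

With these definitions, calling resolve_syntax_version('2.4.4', strict: true) raises ArgumentError, as does an unlisted minor line such as '2.6.0', so downstream code can opt back into the old behavior.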

bobziuchkovski commented 6 years ago

@janosch-x Oh, I just caught the "Recent examples are new unicode scripts, or the absence operator (?~...) added in Ruby 2.4.1." part. That is a good point and definitely a case I didn't consider. :)

bobziuchkovski commented 6 years ago

Hmm... I imagine this is much more far-fetched/difficult, but is it possible to dynamically determine supported features by compiling patterns and comparing the results on the running Ruby VM? E.g. attempting to compile (?~...), matching it against example strings, and determining from the match behavior (or raised exceptions) which features the Ruby VM supports? In effect, an autoconf-style check of what the current environment supports?
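A minimal sketch of such a probe, assuming a compile failure (RegexpError) is a good enough signal that a construct is unsupported. The helper name regexp_feature_supported? is hypothetical. Building the pattern from a string with Regexp.new, rather than writing a regex literal, matters here: Ruby typically rejects an invalid regex literal while parsing the file, so an unsupported literal would stop the file from loading instead of returning false.

```ruby
# Hypothetical helper: report whether the running Ruby can compile a
# pattern. A RegexpError is taken to mean the construct is unsupported.
def regexp_feature_supported?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Probe for the absence operator (?~...). The pattern is passed as a
# string so that Rubies without the feature don't reject this file.
ABSENCE_OPERATOR = regexp_feature_supported?('(?~abc)')
puts "Ruby #{RUBY_VERSION}: absence operator supported? #{ABSENCE_OPERATOR}"
```

A probe like this only says whether a construct compiles on the running VM; it does not say how the construct should be tokenized, which is what a parser ultimately needs to know.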

ammar commented 6 years ago

I agree, and I like avoiding a hard failure when there's no exact match for RUBY_VERSION.

I think the fallback should only consider the tiny (patch) number of the version, though: for example, falling back to the latest defined 2.4 version if a given 2.4.x is not defined. I would be surprised if I got 2.3 when I was running 2.4.

If I understand the PR correctly, I think removing the empty version syntax files is a good idea. I got into the habit of adding them at a time when ruby's regex engine was undergoing a lot of changes. It's been fairly stable since 2.1.

Thanks @janosch-x! Please let me know if I can help with anything.

jaynetics commented 6 years ago

@bobziuchkovski

Feature detection is a really interesting idea!

The thing is that regex features are not usually backported in Ruby, e.g. the 2.3 branch of Ruby never got the absence operator introduced in 2.4.1.

This probably makes it easier to just list new features once for a minor version and assume they are inherited from there on.
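One way to express "list new features once and inherit them" is plain class inheritance, which is roughly how the syntax files linked earlier in the thread are described as being structured. This is a hypothetical sketch: the module, class and feature names are made up for illustration and are not regexp_parser's actual classes. Only the placement of the absence operator in 2.4.1 comes from the thread.

```ruby
require 'set'

# Hypothetical sketch: each version class lists only the regex features it
# introduces and inherits everything else from its predecessor.
module SyntaxSketch
  class Base
    # Illustrative baseline features shared by every version in this sketch.
    def self.features
      Set[:named_groups, :lookbehind]
    end

    def self.supports?(feature)
      features.include?(feature)
    end
  end

  class V2_3_0 < Base; end   # nothing new listed in this sketch
  class V2_4_0 < V2_3_0; end # nothing new listed in this sketch

  class V2_4_1 < V2_4_0
    # The absence operator (?~...) arrived with Ruby 2.4.1.
    def self.features
      super | Set[:absence_operator]
    end
  end

  # Later versions inherit 2.4.1's features unless they add their own.
  class V2_5_0 < V2_4_1; end
end

puts SyntaxSketch::V2_5_0.supports?(:absence_operator) # prints "true"
puts SyntaxSketch::V2_3_0.supports?(:absence_operator) # prints "false"
```

Because nothing is backported, the 2.3 class never gains the feature, while every class below 2.4.1 in the hierarchy gets it for free.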

@ammar

I also thought about "limiting" the fallback to certain version levels, but since regex feature additions don't adhere to semantic versioning, it didn't seem to offer much of a "protective" benefit to me.

If your point is more of an aesthetic one, the latest commit on the branch might help ;)

jaynetics commented 6 years ago

@bobziuchkovski regexp_parser v0.5.0 is now out with automatic support for future Ruby versions.

bobziuchkovski commented 6 years ago

That's terrific news. Thanks @janosch-x !