gfredericks / test.chuck

A utility library for test.check
Eclipse Public License 1.0

string-from-regex for ClojureScript #46

Open wilkerlucio opened 8 years ago

wilkerlucio commented 8 years ago

Any plans on a CLJS implementation for the string-from-regex function?

gfredericks commented 8 years ago

I don't have short-term plans for this. It's not easy to just port the clj code, because the underlying regex implementation that I targeted so specifically on the JVM likely differs from the JS one in a variety of ways.

That said, knowing that people are interested in it definitely makes it more likely to happen. I'd also be happy to give advice to anybody else interested in trying.

lvh commented 7 years ago

I'm interested in trying; particularly because I'd like to land something using it in schpec :)

Is the main problem regex syntax differences between JS and the JVM? I'm looking at the impl and I'm definitely seeing some JVM-specific stuff there now, but it's not clear to me that it'll be hard to port. Apparently instaparse now also has a reasonable cljs port :)

gfredericks commented 7 years ago

Definitely regex syntax differences, and possibly differences in the definition of a character. I'd guess that JS uses Unicode and therefore has the same character set, but you never know with these things.

Fortunately test.check itself is a fun way of discovering where your parser/interpreter differs from reality, since you have the authoritative implementation readily available.

Speaking of which, it's logically possible that regexes vary across JS runtimes, in which case fml.

agzam commented 7 years ago

Yeah. I really, really want regexes.cljc as well. It would be so awesome to have string-from-regex in ClojureScript.

ikitommi commented 4 years ago

I would also very much like to see this one. Any progress by anyone?

ikitommi commented 4 years ago

Related: http://fent.github.io/randexp.js/

gfredericks commented 4 years ago

There's a branch/PR that I believe is incomplete.

My main opinion at this point is that any implementation should at least be correct, even if incomplete.

That is, it should attempt to parse the JS regex, optionally throwing an unsupported-error if the regex uses obscure features that nobody wants to implement in the parser (the JVM version doesn't do this; afaik it can parse anything correctly). It should then attempt to create a generator from the parsed regex, optionally throwing an unsupported-error if the regex uses features that are hard to implement. But if it returns a generator, then that generator should only generate matching strings, and its distribution should reasonably cover all possible matching strings.
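
A minimal TypeScript sketch of that "correct even if incomplete" shape might look like the following. Every name here (`UnsupportedRegexError`, `RegexNode`, `generatorFromRegexSource`, `UNBOUNDED_EXTRA`) is hypothetical and not part of test.chuck. It assumes a deliberately tiny subset: literal characters, simple character classes, `\d`, and the quantifiers `*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`. Anything else throws. Flags are not handled at all, and the cap on unbounded repetition means long matches are never produced, so it only partly meets the distribution requirement.

```typescript
// Hypothetical sketch: none of these names come from test.chuck.
class UnsupportedRegexError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnsupportedRegexError";
  }
}

type RegexNode =
  | { kind: "chars"; chars: string[] } // one UTF-16 code unit drawn from a set
  | { kind: "repeat"; node: RegexNode; min: number; max: number };

const UNBOUNDED_EXTRA = 5; // assumed cap on extra repetitions for *, + and {n,}

function charRange(from: string, to: string): string[] {
  const out: string[] = [];
  for (let c = from.charCodeAt(0); c <= to.charCodeAt(0); c++) {
    out.push(String.fromCharCode(c));
  }
  return out;
}

function parseClass(body: string): string[] {
  // Negation and escapes inside a class are left unimplemented on purpose.
  if (body === "" || /[\\^]/.test(body)) {
    throw new UnsupportedRegexError("character class [" + body + "]");
  }
  let chars: string[] = [];
  for (let j = 0; j < body.length; j++) {
    if (body[j + 1] === "-" && j + 2 < body.length) {
      chars = chars.concat(charRange(body[j], body[j + 2]));
      j += 2;
    } else {
      chars.push(body[j]);
    }
  }
  return chars;
}

function parse(source: string): RegexNode[] {
  new RegExp(source); // the real engine rejects invalid syntax with a SyntaxError
  const nodes: RegexNode[] = [];
  let i = 0;
  while (i < source.length) {
    const c = source[i];
    let atom: RegexNode;
    if (c === "[") {
      const end = source.indexOf("]", i);
      atom = { kind: "chars", chars: parseClass(source.slice(i + 1, end)) };
      i = end + 1;
    } else if (c === "\\" && source[i + 1] === "d") {
      atom = { kind: "chars", chars: charRange("0", "9") };
      i += 2;
    } else if ("\\^$.|()[]{}*+?".indexOf(c) >= 0) {
      throw new UnsupportedRegexError('"' + c + '" at index ' + i);
    } else {
      atom = { kind: "chars", chars: [c] };
      i += 1;
    }
    // Optional quantifier: *, +, ?, {n}, {n,} or {n,m}.
    const q = /^(?:([*+?])|\{(\d+)(,(\d*))?\})/.exec(source.slice(i));
    if (q) {
      i += q[0].length;
      let min = 0;
      let max = 1; // defaults cover "?"
      if (q[1] === "*" || q[1] === "+") {
        min = q[1] === "+" ? 1 : 0;
        max = min + UNBOUNDED_EXTRA;
      } else if (q[2] !== undefined) {
        min = Number(q[2]);
        max = q[3] === undefined ? min : q[4] === "" ? min + UNBOUNDED_EXTRA : Number(q[4]);
      }
      atom = { kind: "repeat", node: atom, min, max };
    }
    nodes.push(atom);
  }
  return nodes;
}

function generatorFromRegexSource(source: string, rand: () => number = Math.random): () => string {
  const nodes = parse(source);
  const pick = (n: number) => Math.floor(rand() * n); // integer in [0, n)
  const gen = (node: RegexNode): string => {
    if (node.kind === "chars") return node.chars[pick(node.chars.length)];
    let s = "";
    const count = node.min + pick(node.max - node.min + 1);
    for (let k = 0; k < count; k++) s += gen(node.node);
    return s;
  };
  return () => nodes.map(gen).join("");
}

const demoSource = "[a-c]{2,4}x\\d+";
const demoGen = generatorFromRegexSource(demoSource);
const demoOk = new RegExp("^(?:" + demoSource + ")$").test(demoGen()); // oracle check
```

The last three lines use the real engine as the oracle: each generated string is checked against the anchored original pattern, which is the same cross-check a property-based test would run many times over. Throwing on every unrecognized construct, rather than guessing, is what keeps the sketch on the "correct" side of "correct but incomplete".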

The headachey part of the last requirement is likely Unicode. I know Java uses UTF-16 and surrogate pairs, and I have no idea if that's true for JS or not.
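
On the UTF-16 question: JS strings are also sequences of UTF-16 code units, so surrogate pairs show up there too. The short TypeScript check below illustrates how that interacts with regex matching. The variable names are only for illustration, and the `u` flag is passed through the `RegExp` constructor so the snippet does not depend on a particular compile target.

```typescript
// U+1F600 written explicitly as its UTF-16 surrogate pair.
const grinning = "\uD83D\uDE00";

// String length counts UTF-16 code units, not code points.
const codeUnits = grinning.length; // 2

// Without the `u` flag, `.` consumes one code unit, so it cannot span the pair.
const matchesWithoutU = new RegExp("^.$").test(grinning); // false

// With the `u` flag, `.` consumes a whole code point.
const matchesWithU = new RegExp("^.$", "u").test(grinning); // true

// A lone high surrogate is still a legal JS string and matches `.` in non-unicode mode.
const loneHigh = grinning.charAt(0);
const loneMatches = new RegExp("^.$").test(loneHigh); // true
```

So a generator for JS would need to decide, per regex, whether it is producing code units or code points, depending on whether the `u` flag is set.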