KnutJaegersberg closed this 4 years ago
Pretty cool idea. Have you seen inferregex? https://github.com/daranzolin/inferregex
Not yet, thanks for the hint! I get the sense that the inference engine of strans, a wrapper around the Microsoft PROSE framework, might deal with more complex cases (a whole framework compared to a 73-line function). I haven't yet seen any other tool that exposes PROSE functionality, e.g. in Python, on which the inference above might be based. It seems properly difficult to translate strans descriptions into R regexes, and I did not come across documentation of the description format shown above. https://microsoft.github.io/prose/documentation/transformation-text/intro/
https://github.com/Inventitech/strans
This is a handy command-line tool I stumbled upon whilst browsing AppImages. You can give it a few examples, and it then uses a technique from the Microsoft PROSE framework to automatically infer regex rules. Example below, for extracting file formats, but it can do a lot more than that.
Wouldn't it be cool if there was an R package using this to autogenerate human-readable R regex code?
ls | strans -before Viper_Browser-50-x86_64.AppImage -after AppImage --describe
let columnName = "0" in let x = ChooseInput(vs, columnName) in SubStr(x, PosPair(RegexPositionRelative(x, RegexPair("Dot", "ε"), 1), RegexPositionRelative(x, RegexPair("ε", "Line Separator"), -1)))
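For illustration, here is a rough hand translation of that description into a plain regex, written in Python. It is only a sketch of how I read the program: `RegexPair("Dot", "ε")` with index 1 is taken to mean the position right after the first dot, and `RegexPair("ε", "Line Separator")` with index -1 the position at the end of the line. Both readings are assumptions rather than documented semantics, and the function name `extract_after_first_dot` is hypothetical (it is not part of strans or PROSE).

```python
import re
from typing import Optional

# Assumed reading of the strans description (not verified against PROSE docs):
#   start = position just after the first "." in the line
#   end   = end of the line
# re.search finds the leftmost ".", and "(.*)$" captures the rest of the line.
AFTER_FIRST_DOT = re.compile(r"\.(.*)$")


def extract_after_first_dot(name: str) -> Optional[str]:
    """Return the text after the first dot, or None if there is no dot."""
    match = AFTER_FIRST_DOT.search(name)
    return match.group(1) if match else None


if __name__ == "__main__":
    for filename in ["Viper_Browser-50-x86_64.AppImage", "archive.tar.gz", "README"]:
        print(filename, "->", extract_after_first_dot(filename))
```

Under this reading a name with several dots, such as `archive.tar.gz`, yields everything after the first one (`tar.gz`), which may or may not be what the inferred program does on such input.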
Python application: https://docs.microsoft.com/en-us/python/api/overview/azure/prose/intro?view=prose-py-latest