gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/
Other
304 stars 44 forks source link

Unclear error message with invalid regular expression #382

Closed billdenney closed 4 years ago

billdenney commented 4 years ago

While attempting to make the simple lexer/parser that I described in #380, I got an error that I couldn't understand. I think that the request of this issue is to improve the error message.

The message for the code below is indicating that the regular expression pattern "a{}" is invalid, but that wasn't obvious from the message. Can the error message please be clarified? I think that the preferred error message would indicate the following:

sprintf("Syntax error in regexp pattern: Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)\nPattern with error is: %s", bad_pattern)

The original error wasn't immediately clear that the problem was a syntax error in a regexp pattern (U_REGEX is a good hint in that direction, but in my actual code, there was a lot going on). And, it would help if the specific pattern is noted to the user since the patterns can be vectorized.

I think that the initial part of the fix would be to add "Syntax error in regexp pattern: " to the beginning of all the U_REGEX error messages in https://github.com/gagolews/stringi/blob/0ff9ac7840ee3049d0e0d01e830c27f4b807694a/src/stri_exception.cpp#L280-L323

and

https://github.com/gagolews/stringi/blob/0ff9ac7840ee3049d0e0d01e830c27f4b807694a/src/stri_exception.cpp#L332-L336

I don't know how to add the pattern itself to the errors, though.

stringi::stri_replace_all_regex(str="a", pattern="a{}", replacement="b")
#> Error in stringi::stri_replace_all_regex(str = "a", pattern = "a{}", replacement = "b"): Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)

Created on 2020-05-20 by the reprex package (v0.3.0)

gagolews commented 4 years ago

Apologies for a late response, Bill, this somehow landed in my spam folder. :/

I guess the correct place for this is in stri_container_regex.cpp - RegexMatcher* StriContainerRegexPattern::getMatcher(R_len_t i) method.

STRI__CHECKICUSTATUS_THROW(status, {if (lastMatcher) delete lastMatcher; lastMatcher = NULL;})

Should be substituted with something like (I'm improvising now):

if (U_FAILURE(status)) {                                 
     if (lastMatcher) delete lastMatcher; lastMatcher = NULL;
    const char* errmsg = getICUerrorName(status);                                            
    SEXP failingregex = this->toR(i);
      throw StriException("Error in regex `%s`: %s", (const char*)CHAR(failingregex), errmsg);                          
   }  

You wanna give it a try or should I do it later on?

billdenney commented 4 years ago

Thanks for the follow-up.

I'm all thumbs when it comes to C++ programming, so I'd appreciate if you could make the modification.

gagolews commented 4 years ago

Sure, no worries.

gagolews commented 4 years ago

Done. The error msg is now going to be like:

## Error in {min,max} interval. (U_REGEX_BAD_INTERVAL, context=`a{}`)
billdenney commented 4 years ago

Thanks!