SoftInstigate / restheart

Rapid API Development with MongoDB
https://restheart.org
GNU Affero General Public License v3.0

Regex with octal numeral doesn't work with accented characters #258

Closed: STK913 closed this issue 6 years ago

STK913 commented 6 years ago

Hello, a regex with octal escapes doesn't work with accented characters, although it works with non-accented characters. The equivalent query run directly on MongoDB works fine.

Doesn't work (4 backslashes because it's a URL parameter): http://www.domain.com?filter={'fieldName':{$regex:'(?i)^(Associ[\\\\351])$'}}

Works:
http://www.domain.com?filter={'fieldName':{$regex:'(?i)^([\\\\300-\\\\306]ssocié)$'}}
http://www.domain.com?filter={'fieldName':{$regex:'(?i)^(Associ\u00E9)$'}}

Octal escapes are very useful for expressing ranges of characters, for example [\350-\353] = [èéêë]. Do you know where this problem comes from and how to work around it? Thank you for your answers.

Environment: RESTHeart v3.2.0 + MongoDB v3.4
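To check the claim that [\350-\353] = [èéêë], here is a small Java sketch (Java only because RESTHeart and the errors later in this thread are Java-based; the class and method names are illustrative, not part of RESTHeart). It decodes the octal codes 350 to 353 and shows the same class written with Unicode escapes, which do not rely on octal support at all:

```java
import java.util.regex.Pattern;

public class OctalRange {
    // Converts an octal code such as "351" to the character it denotes (octal 351 = 233 = U+00E9)
    static char fromOctal(String octal) {
        return (char) Integer.parseInt(octal, 8);
    }

    // Builds the string of every character between two octal codes, inclusive
    static String octalRange(String from, String to) {
        StringBuilder sb = new StringBuilder();
        for (char c = fromOctal(from); c <= fromOctal(to); c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String range = octalRange("350", "353");
        // Print code points so the output does not depend on the console encoding
        range.chars().forEach(c -> System.out.printf("U+%04X%n", c)); // U+00E8 .. U+00EB

        // The same class written with Unicode escapes in the regex text instead of octal
        Pattern accented = Pattern.compile("[\\u00E8-\\u00EB]");
        System.out.println(accented.matcher("\u00E9").matches()); // true: U+00E9 is in the range
    }
}
```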

mkjsix commented 6 years ago

Hi @STK913

Could you describe in more detail the error messages you are receiving? You might also want to raise the log level to DEBUG in the RESTHeart configuration file. It would also help to have a test case we can use to reproduce the error ourselves.

Thank you

STK913 commented 6 years ago

Hello,

Edit: see my third comment; the problem almost certainly comes from the backslashes.

After further tests, it seems that octal escapes are not supported in regular expressions! There is no error in the logs, even in DEBUG mode:

09:35:41.607 [XNIO-1 task-1] INFO o.r.handlers.RequestLoggerHandler - GET http://XXX?filter={%27name%27:{$regex:%27(?i)^(Associ[\\\\351])$%27}} from /XXX => …
09:35:41.829 [XNIO-1 task-2] INFO o.r.handlers.RequestLoggerHandler - GET http://XXX/favicon.ico from /XXX => status=400 elapsed=3ms contentLength=150 username=XXX role…

09:35:50.313 [XNIO-1 task-3] INFO o.r.handlers.RequestLoggerHandler - GET http://XXX?filter={%27name%27:{$regex:%27(?i)^(Associ%C3%A9)$%27}} from /XXX => st…
09:35:50.510 [XNIO-1 task-4] INFO o.r.handlers.RequestLoggerHandler - GET http://XXX/favicon.ico from /XXX => status=400 elapsed=2ms contentLength=150 username=XXX role…

Here is the procedure to reproduce the problem:

http://XXX/MyRepository/MyCollection?filter={"name":{$regex:"^Associé$"}}
Response: {"_embedded":[{"_id":{"$oid":"5a339970966079cd784273d8"},"name":"Associé"}],"_id":"MyCollection","_returned":1}
As expected: YES

http://XXX/MyRepository/MyCollection?filter={"name":{$regex:"^[\\\\0-\\\\0]ssocié$"}}
Response: {"_embedded":[{"_id":{"$oid":"5a339970966079cd784273d8"},"name":"Associé"}],"_id":"MyCollection","_returned":1}
As expected: NO. This makes no sense: whatever the range (here [\0-\0]), the document is always returned.

http://XXX/MyRepository/MyCollection?filter={"name":{$regex:"^[\\\\101]ssocié$"}}
Response: {"_embedded":[],"_id":"MyCollection","_returned":0}
As expected: NO, the document is not returned.

http://XXX/MyRepository/MyCollection?filter={"name":{$regex:"^[\\\\0-\\\\0]ssEcié$"}}
Response: {"_embedded":[],"_id":"MyCollection","_returned":0}
As expected: YES, because I changed a character.
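A likely explanation for these results, assuming the only unescaping step is the JSON decoding of the filter string (so each pair of backslashes typed in the URL reaches the regex engine as a single one): the class [\\0-\\0] is not an octal range at all. It holds a literal backslash, the range '0' to backslash (0x30 to 0x5C), and '0'. Since 'A' is 0x41, it falls inside that range. The sketch below checks this with Java's java.util.regex; MongoDB's own regex engine is assumed to parse these classes the same way. The same reasoning would explain why the [\\\\300-\\\\306] filter from the first message appears to work. Class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class BackslashLayers {
    // Regex text assumed to reach the engine after JSON decoding has halved the
    // four backslashes typed in the URL. Every backslash is doubled again below
    // because these are Java string literals.
    static final String RANGE_FILTER = "^[\\\\0-\\\\0]ssoci\u00e9$";          // regex text [\\0-\\0]
    static final String OCTAL_FILTER = "^[\\\\101]ssoci\u00e9$";              // regex text [\\101]
    static final String CHANGED_FILTER = "^[\\\\0-\\\\0]ssEci\u00e9$";        // one letter changed
    static final String FIRST_REPORT = "(?i)^([\\\\300-\\\\306]ssoci\u00e9)$"; // filter from the first message

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String name = "Associ\u00e9";
        // Class = backslash, range '0' to backslash (0x30..0x5C), and '0': 'A' (0x41) is inside
        System.out.println(matches(RANGE_FILTER, name));   // true
        // Class = backslash, '1', '0': no 'A', so no match
        System.out.println(matches(OCTAL_FILTER, name));   // false
        System.out.println(matches(CHANGED_FILTER, name)); // false
        // This class also contains the range '0' to backslash, so it matches 'A' by accident
        System.out.println(matches(FIRST_REPORT, name));   // true
    }
}
```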

STK913 commented 6 years ago

I can also confirm that octal escapes work fine directly on MongoDB, using Robo 3T:

db.getCollection('MyCollection').find({"name":{$regex:/[\100-\102]ssocié/gi}})
db.getCollection('MyCollection').find({"name":{$regex:/[\101]ssocié/gi}})
-> Both return the document.

db.getCollection('MyCollection').find({"name":{$regex:/[\102-\104]ssocié/gi}})
db.getCollection('MyCollection').find({"name":{$regex:/[\102]ssocié/gi}})
-> Neither returns a document, which is expected because \101 = A.

Could this come from the backslashes?

STK913 commented 6 years ago

I thought several backslashes were needed for encoding reasons, but in the end that is almost certainly where the problem comes from!

With a single backslash: http://XXX/MyRepositoryy/MyCollection?filter={"name":{$regex:"^([\101]ssocié)$"}}
Response: {"_exceptions":[{"exception":"org.bson.json.JsonParseException","exception message":"Invalid escape sequence in JSON string '\\1'."}],"http status code":500,"http status description":"Internal Server Error","message":"Error handling the request, see log for more information"}

Exception:

restheart      | 10:34:10.498 [XNIO-1 task-1] ERROR org.restheart.handlers.ErrorHandler - Error handling the request
restheart      | org.bson.json.JsonParseException: Invalid escape sequence in JSON string '\1'.
restheart      |        at org.bson.json.JsonScanner.scanString(JsonScanner.java:515)
restheart      |        at org.bson.json.JsonScanner.nextToken(JsonScanner.java:98)
restheart      |        at org.bson.json.JsonReader.popToken(JsonReader.java:478)
restheart      |        at org.bson.json.JsonReader.visitRegularExpressionExtendedJson(JsonReader.java:964)
restheart      |        at org.bson.json.JsonReader.visitExtendedJSON(JsonReader.java:593)
restheart      |        at org.bson.json.JsonReader.readBsonType(JsonReader.java:145)
restheart      |        at org.bson.codecs.BsonDocumentCodec.decode(BsonDocumentCodec.java:82)
restheart      |        at org.bson.BsonDocument.parse(BsonDocument.java:62)
restheart      |        at org.restheart.handlers.RequestContext.getFiltersDocument(RequestContext.java:621)
restheart      |        at org.restheart.handlers.collection.GetCollectionHandler.handleRequest(GetCollectionHandler.java:97)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.handlers.metadata.AbstractTransformerMetadataHandler.handleRequest(AbstractTransformerMetadataHandler.java:62)
restheart      |        at org.restheart.handlers.RequestDispacherHandler.handleRequest(RequestDispacherHandler.java:468)
restheart      |        at org.restheart.handlers.injectors.CollectionPropsInjectorHandler.handleRequest(CollectionPropsInjectorHandler.java:106)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.handlers.injectors.DbPropsInjectorHandler.handleRequest(DbPropsInjectorHandler.java:92)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.handlers.injectors.AccountInjectorHandler.handleRequest(AccountInjectorHandler.java:56)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.AccessManagerHandler.handleRequest(AccessManagerHandler.java:54)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.AuthTokenInjecterHandler.handleRequest(AuthTokenInjecterHandler.java:70)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.AuthenticationCallHandler.handleRequest(AuthenticationCallHandler.java:54)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.AuthenticationConstraintHandler.handleRequest(AuthenticationConstraintHandler.java:55)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:61)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.SecurityInitialHandler.handleRequest(SecurityInitialHandler.java:96)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.SecurityHandler.handleRequest(SecurityHandler.java:69)
restheart      |        at org.restheart.security.handlers.SecurityHandlerDispacher.handleRequest(SecurityHandlerDispacher.java:60)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.handlers.injectors.BodyInjectorHandler.handleRequest(BodyInjectorHandler.java:318)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.handlers.OptionsHandler.handleRequest(OptionsHandler.java:58)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.handlers.injectors.RequestContextInjectorHandler.handleRequest(RequestContextInjectorHandler.java:647)
restheart      |        at org.restheart.handlers.PipedHttpHandler.next(PipedHttpHandler.java:115)
restheart      |        at org.restheart.security.handlers.CORSHandler.handleRequest(CORSHandler.java:88)
restheart      |        at org.restheart.handlers.RequestLoggerHandler.handleRequest(RequestLoggerHandler.java:84)
restheart      |        at org.restheart.handlers.PipedHttpHandler.handleRequest(PipedHttpHandler.java:96)
restheart      |        at io.undertow.server.handlers.PathHandler.handleRequest(PathHandler.java:94)
restheart      |        at io.undertow.server.handlers.HttpContinueAcceptingHandler.handleRequest(HttpContinueAcceptingHandler.java:78)
restheart      |        at org.restheart.handlers.ErrorHandler.handleRequest(ErrorHandler.java:70)
restheart      |        at io.undertow.server.handlers.encoding.EncodingHandler.handleRequest(EncodingHandler.java:72)
restheart      |        at org.restheart.handlers.GzipEncodingHandler.handleRequest(GzipEncodingHandler.java:75)
restheart      |        at io.undertow.server.Connectors.executeRootHandler(Connectors.java:210)
restheart      |        at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:809)
restheart      |        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
restheart      |        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
restheart      |        at java.lang.Thread.run(Thread.java:745)
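The single-backslash error happens before any regex is involved: "\1" is not one of the escapes allowed inside a JSON string, so the filter cannot even be parsed. The simplified check below is a hypothetical helper written against the escape list in the JSON specification (RFC 8259); it is not the MongoDB driver's actual parser, which may differ in details.

```java
public class JsonEscapeCheck {
    // Hypothetical helper: returns the first escape sequence in the body of a JSON
    // string literal that RFC 8259 does not allow, or null if all escapes are legal.
    static String firstInvalidEscape(String body) {
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) != '\\') continue;
            if (i + 1 >= body.length()) return "\\"; // dangling backslash at the end
            char next = body.charAt(i + 1);
            switch (next) {
                case '"': case '\\': case '/': case 'b': case 'f': case 'n': case 'r': case 't':
                    i++; // skip the escaped character
                    break;
                case 'u':
                    // A Unicode escape needs exactly four hex digits after the 'u'
                    if (i + 6 > body.length() || !body.substring(i + 2, i + 6).matches("[0-9A-Fa-f]{4}")) {
                        return body.substring(i, Math.min(i + 6, body.length()));
                    }
                    i += 5; // skip the 'u' and the four hex digits
                    break;
                default:
                    return "\\" + next; // e.g. backslash + '1' from the regex [\101]
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Single backslash in the filter: the JSON layer rejects backslash + '1'
        System.out.println(firstInvalidEscape("^([\\101]ssoci\u00e9)$"));
        // Double backslash: legal JSON, decoding yields the regex text ^([\101]ssocie-acute)$
        System.out.println(firstInvalidEscape("^([\\\\101]ssoci\u00e9)$")); // null
    }
}
```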

With two backslashes: http://XXX/MyRepositoryy/MyCollection?filter={"name":{$regex:"^([\\101]ssocié)$"}}
Response: {"_exceptions":[{"exception":"java.util.regex.PatternSyntaxException","exception message":"Illegal/unsupported escape sequence near index 4\n^([\\101]ssocié)$\n ^"}],"http status code":400,"http status description":"Bad Request","message":"illegal filter paramenter: {'name':{$regex:'^([\\\\101]ssocié)$'}}"}
No exception appears in the logs.
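With two backslashes the JSON is valid and decodes to the regex text ^([\101]ssocié)$, which java.util.regex then rejects: inside a character class, Java does not accept a backslash followed by a nonzero digit. A minimal sketch reproducing that check (class and method names are hypothetical):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class OctalInClass {
    // Returns the error description if java.util.regex rejects the regex, or null if it compiles
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Regex text ^([\101]ssocie-acute)$ : rejected, as in the 400 response above
        System.out.println(compileError("^([\\101]ssoci\u00e9)$"));  // Illegal/unsupported escape sequence
        // Regex text ^([\0101]ssocie-acute)$ : compiles
        System.out.println(compileError("^([\\0101]ssoci\u00e9)$")); // null
    }
}
```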

mkjsix commented 6 years ago

The first exception, org.bson.json.JsonParseException, is thrown by the MongoDB Java driver, while the second, java.util.regex.PatternSyntaxException, might be resolved by using 4 backslashes instead of 2.

STK913 commented 6 years ago

In that case, please refer to my two earlier messages above.

mkjsix commented 6 years ago

According to: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Summary of regular-expression constructs:

\0n | The character with octal value 0n (0 <= n <= 7)
\0nn | The character with **octal** value 0nn (0 <= n <= 7)
\0mnn | The character with **octal** value 0mnn (0 <= m <= 3, 0 <= n <= 7)

So octal escape sequences need a leading 0 (zero).

See also: https://stackoverflow.com/questions/38749356/java-regex-why-is-177-escape-code-invalid
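The leading-zero form can be checked with a short Java sketch. It only covers java.util.regex, which RESTHeart appears to use to validate the filter (judging by the PatternSyntaxException earlier in this thread); the query itself runs on MongoDB, whose regex engine may read octal escapes differently. Assuming one level of JSON unescaping, the regex text \0101 would be written \\0101 inside the filter string. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class LeadingZeroOctal {
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "Associ\u00e9";
        // Regex text \0101 = octal 101 = 65 = 'A'
        System.out.println(matches("^[\\0101]ssoci\u00e9$", word));      // true
        // Regex text \0351 = octal 351 = 233 = U+00E9, the accented e from the report
        System.out.println(matches("(?i)^(Associ[\\0351])$", word));     // true
        // Regex text \0350-\0353 = the range U+00E8..U+00EB
        System.out.println(matches("^Associ[\\0350-\\0353]$", word));    // true
    }
}
```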

STK913 commented 6 years ago

Hello,

Thank you for your answer; however, I am not using Java! I'm using the RESTHeart API (GET requests), and the problem persists with or without the leading 0. Moreover, the bug does not occur with queries run directly on MongoDB.

This bug is annoying because I would have to remove all accents from the MongoDB database to retrieve the data with RESTHeart.

Could you please reopen the issue?

mkjsix commented 6 years ago

This has nothing to do with Java; it's about how you write the filter. If your filter contains a regex with octal escapes, they must be escaped as described above.