Open sasa1977 opened 6 years ago
It looks like the problem here is that projections such as toLower and toUpper work only on ASCII characters. See here for example.
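To make the difference concrete, here is a minimal sketch in plain JavaScript (not the actual mongodb implementation). `asciiUpper` is a hypothetical helper that mimics an ASCII-only conversion; the built-in `toUpperCase` is Unicode-aware.

```javascript
// Hypothetical helper that mimics an ASCII-only conversion: only a-z are
// mapped, every other character passes through unchanged.
function asciiUpper(text) {
  return text.replace(/[a-z]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 32)
  );
}

const sample = "café straße";

// ASCII-only: the accented letter and the sharp s are left as they are.
const asciiResult = asciiUpper(sample); // "CAFé STRAßE"

// Unicode-aware: é becomes É and ß expands to SS.
const unicodeResult = sample.toUpperCase(); // "CAFÉ STRASSE"

console.log(asciiResult, "|", unicodeResult);
```

The two results differ as soon as the data contains a non-ASCII letter, which is the kind of mismatch the tests would pick up.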
This could be solved by emulating such functions, or alternatively we could map each document in mongodb with something like:
db.coll.find().map(function(doc) {
  doc.field = doc.field.toUpperCase();
  return doc;
})
In the meantime, I think we shouldn't include mongodb in these tests (upper and lower).
cc @cristianberneanu
Unfortunately, mongo lacks severely in the provided functions department.
Either we ignore the issue for now, or, like you said, mark the functions as unsupported, forcing emulation. Although that will make the TeamBank queries even slower. We could also expose different Unicode and ASCII versions, but I am not sure it is worth it.
> Although that will make the TeamBank queries even slower.
The problem is that currently these functions are working correctly only for ascii characters.
> We could also expose different Unicode and ASCII versions, but I am not sure it is worth it.
I was thinking about this too, and I think it's an idea worth considering. Beyond mongodb, I've seen other cases where the results of Unicode functions such as upper or lower differ between databases (see #2578). I think it's quite hard to ensure consistent behaviour here. The problem becomes worse if the same function is emulated in one part of the query and not emulated in another.
So I wonder if we should expose consistent versions of these functions, such as nupper and nlower. These functions would always be emulated, and therefore we'd get consistent results. The documentation should explain the trade-off of performance vs correctness and consistency.
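A rough sketch of the always-emulated idea, written in JavaScript only to match the snippet earlier in the thread. The names `nupper` and `nlower` come from the proposal above. `emulateColumn` and the row shape are assumptions made for illustration, not the project's actual code.

```javascript
// Emulated versions: these always run on fetched values, never in the
// database, so every backend yields the same result.
function nupper(value) {
  return value == null ? null : String(value).toUpperCase();
}

function nlower(value) {
  return value == null ? null : String(value).toLowerCase();
}

// Hypothetical helper: apply an emulated function to one column of rows
// that were already fetched from the backend. Rows are not mutated.
function emulateColumn(rows, column, fn) {
  return rows.map((row) => ({ ...row, [column]: fn(row[column]) }));
}

const rows = [{ name: "émile" }, { name: null }, { name: "Zoë" }];
const upperRows = emulateColumn(rows, "name", nupper);
const lowerRows = emulateColumn(rows, "name", nlower);

console.log(upperRows, lowerRows);
```

The cost is that every row has to be fetched and processed outside the database, which is the performance side of the trade-off.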
WDYT?
Let's also ping @obrok and @sebastian for additional thoughts here.
> The problem is that currently these functions are working correctly only for ascii characters.
Is there still a bug if all the customer's data is ASCII?
> So I wonder if we should expose consistent versions of these functions, such as nupper and nlower.
I like this option
> Is there still a bug if all the customer's data is ASCII?
No, AFAIK, everything works fine on ascii characters.
Specialized functions that end up emulated in those databases that don't handle this correctly out of the box sound a bit like a cop-out, but also like a pragmatic and decent solution. In other words, I am for it.
This seems like a larger change. Moving it away from the current milestone.
@sebastian This is not in 19.3, but it seems like it would solve #3109 that is in there. Perhaps we should do this instead?
Yes, agreed. Adding to milestone.
@obrok & @cristianberneanu is this currently a problem at TeamBank? If not, it should be moved down the road (which is painful since it's so old already).
I don't think they care. It is mostly needed so that our tests return consistent results between different backends. Moving to 19.4
Ok, thanks!
This should be closed or frozen, since we plan to drop MongoDB.
It appears that in mongodb 3.4 the capitalization is different from other databases, and incorrect: