TBD54566975 / dwn-sql-store

Apache License 2.0
Release v0.2.10 #20

Closed by frankhinek 8 months ago

frankhinek commented 9 months ago

This PR will:

codecov[bot] commented 9 months ago

Codecov Report

Merging #20 (53372be) into main (4c07041) will not change coverage. The diff coverage is n/a.

Additional details and impacted files:

```diff
@@           Coverage Diff           @@
##             main      #20   +/-   ##
=======================================
  Coverage   80.36%   80.36%
=======================================
  Files          10       10
  Lines         988      988
  Branches      137      137
=======================================
  Hits          794      794
  Misses        194      194
```

LiranCohen commented 9 months ago

I originally thought this was only a UTF-8 setting, but it's a different, obscure setting that affects sorting (LC_COLLATE):

LC_COLLATE affects comparisons between strings. In practice, the most visible effect is the sort order. LC_COLLATE='C' (or POSIX which is a synonym) means that it's the byte order that drives comparisons, whereas a locale in the language_REGION form means that cultural rules will drive the comparisons. https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
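To make the difference between byte order and cultural rules concrete, here is a small TypeScript analogy. It uses JavaScript's default string sort (UTF-16 code-unit order, which matches byte order for ASCII text, like LC_COLLATE='C') and Intl.Collator with the 'en' locale as a stand-in for a language_REGION collation. Neither is what PostgreSQL runs internally, so treat this as an illustration only.

```typescript
// Analogy only: Intl.Collator stands in for a language_REGION collation,
// and the default sort stands in for LC_COLLATE='C' (byte/code-unit order).
const words: string[] = ['b', 'B', 'a', 'A'];

// Array.prototype.sort without a comparator compares UTF-16 code units,
// so every uppercase ASCII letter sorts before every lowercase one.
const codeUnitOrder: string[] = [...words].sort();

// A locale-aware collator compares letters first and case only second.
const collator = new Intl.Collator('en');
const localeOrder: string[] = [...words].sort(collator.compare);

console.log(codeUnitOrder); // [ 'A', 'B', 'a', 'b' ]
console.log(localeOrder);
```

In code-unit order 'B' lands before 'a'; with the collator, 'a' comes before 'B'. The same pair of strings compares differently depending only on the collation.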

It seems the other locale settings can be changed after the database is initialized, but the collation order cannot:

To alter the default collation order or character set classes, use the --lc-collate and --lc-ctype options. Collation orders other than C or POSIX also have a performance penalty. For these reasons it is important to choose the right locale when running initdb.

The remaining locale categories can be changed later when the server is started. You can also use --locale to set the default for all locale categories, including collation order and character set classes. https://www.postgresql.org/docs/13/app-initdb.html
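Because the collation is fixed when a database is created (initdb --lc-collate=C --lc-ctype=C for a whole cluster), one option is to create the test database with byte-order collation up front. The sketch below is a hypothetical TypeScript helper; the function name createByteOrderDatabaseSql and the database name dwn_test are illustrative and not part of dwn-sql-store. The TEMPLATE, ENCODING, LC_COLLATE and LC_CTYPE clauses are standard PostgreSQL CREATE DATABASE options.

```typescript
// Hypothetical helper (not part of dwn-sql-store): builds a PostgreSQL
// statement that creates a database whose string comparisons use byte order.
// template0 is used so the new database does not inherit template1's locale.
function createByteOrderDatabaseSql(dbName: string): string {
  // Only allow simple identifiers so the name can be safely double-quoted.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(dbName)) {
    throw new Error(`unsupported database name: ${dbName}`);
  }
  return [
    `CREATE DATABASE "${dbName}"`,
    `  WITH TEMPLATE = template0`,
    `  ENCODING = 'UTF8'`,
    `  LC_COLLATE = 'C'`,
    `  LC_CTYPE = 'C';`,
  ].join('\n');
}

console.log(createByteOrderDatabaseSql('dwn_test'));
```

The statement has to be run before any tables exist in that database, since the collation cannot be switched afterwards with ALTER DATABASE.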

This is caused by the recent addition of string prefix filtering, which turns a prefix into a range filter:

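```typescript
// Express "starts with prefix" as a range query. The upper bound appends
// '\uffff' (the highest BMP code unit), relying on code-unit/byte ordering.
static constructPrefixFilterAsRangeFilter(prefix: string): RangeFilter {
  return {
    gte : prefix,
    lt  : prefix + '\uffff',
  };
}
```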

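The \uffff is appended to the prefix so that the upper bound does not reach past strings that start with the prefix: when strings are compared by code point (or UTF-8 byte) order, \uffff sorts after every other character in the Basic Multilingual Plane. That assumption only holds with LC_COLLATE='C'; under a language_REGION collation the database applies cultural rules instead, so the range may no longer match the rows it should.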
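The range check can be exercised in plain TypeScript, because JavaScript's < and >= on strings compare UTF-16 code units, which for BMP characters gives the same order as the UTF-8 byte comparison LC_COLLATE='C' uses. This is a minimal sketch: PrefixRange, prefixToRange and inRange are hypothetical names, and PrefixRange only mirrors the gte/lt fields from the snippet above, not the full RangeFilter type.

```typescript
// Hypothetical shape mirroring the gte/lt fields of RangeFilter.
interface PrefixRange {
  gte: string;
  lt: string;
}

// Same construction as constructPrefixFilterAsRangeFilter.
function prefixToRange(prefix: string): PrefixRange {
  return { gte: prefix, lt: prefix + '\uffff' };
}

// String comparison in JavaScript uses UTF-16 code-unit order.
function inRange(value: string, range: PrefixRange): boolean {
  return value >= range.gte && value < range.lt;
}

const range = prefixToRange('foo');
for (const candidate of ['foo', 'foobar', 'foo~', 'fop', 'fo']) {
  console.log(candidate, inRange(candidate, range));
}
// foo true, foobar true, foo~ true, fop false, fo false
```

Even in byte order the trick has edges: a value whose next character after the prefix is U+FFFF itself, or (in UTF-8 byte order) a character outside the BMP, sorts at or above the upper bound and is excluded.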