PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.47k stars 1.03k forks source link

isdistinct doesn't work for null #2879

Open floitsch opened 1 year ago

floitsch commented 1 year ago

Environment

Description of issue

isdistinct doesn't work for null values. It is not possible to get all rows that do not have null in it.

Repro:

CREATE TABLE bug (
  id SERIAL PRIMARY KEY,
  data TEXT,
  b BOOLEAN
);

INSERT INTO bug (data, b)
  VALUES
    ('null', null),
    (null, false),
    ('foobar', null);
ᐅ curl 'http://localhost:49321/rest/v1/bug?data=isdistinct.null'
[{"id":2,"data":null,"b":false}, 
 {"id":3,"data":"foobar","b":null}]%                                                                                                                                                  

ᐅ curl 'http://localhost:49321/rest/v1/bug?b=isdistinct.null'
{"code":"22P02","details":null,"hint":null,"message":"invalid input syntax for type boolean: \"null\""}%                                                                              

As can be seen in the example: the value that is passed to isdistinct must be the type of the column, and if the column is of type string, then the isdistinct treats it like the string 'null' and not the null value.

Expected behavior: isdistinct behaves similar to IS DISTINCT FROM (as the documentation states) and thus:

laurenceisla commented 1 year ago

Treating the null value in the query string as a text string is a current limitation for isdistinct. It also happens for eq, like and similar comparison operators. Only is treats the value as null IIRC.

I'm not so sure about treating only the quoted "null" as a string, because other strings don't behave like that currently, e.g. isdistinct="value" is the same as IS DISTINCT FROM '"value"' (it keeps the double quote).

We would need to get to an agreement on the syntax or if it should be implemented, because using not.is.null as an alternative is available.

steve-chavez commented 1 year ago

curl 'http://localhost:49321/rest/v1/bug?b=isdistinct.null'

A boolean looks more fit for is.unknown (three-valued logic):

curl 'http://localhost:49321/rest/v1/bug?b=is.unknown'
floitsch commented 1 year ago

I'm fine either way, as long as the documentation is updated.

Note that quotes already dropped inside an or (or similar construct): or=(data.eq."foo") is the same as or=(data.eq.foo)

(btw: is there a good place to find all these rules and maybe some test-cases, so that library writers can make sure they get all these cases correct?)

laurenceisla commented 1 year ago

Note that quotes already dropped inside an or (or similar construct): or=(data.eq."foo") is the same as or=(data.eq.foo)

That's interesting, then maybe we aren't being completely consistent in parsing operators, or maybe there's something special that I'm missing for logical operators.

is there a good place to find all these rules and maybe some test-cases, so that library writers can make sure they get all these cases correct?

For now, this is documented in the Horizontal Filtering and the Working With PG Types sections in the docs. The tests we use for Filters could also be useful: https://github.com/PostgREST/postgrest/blob/40c2bcd4a1f63b9d639574a07202a1edf9edc4b0/test/spec/Feature/Query/QuerySpec.hs#L31-L335

floitsch commented 1 year ago

For now, this is documented in the Horizontal Filtering and the Working With PG Types sections in the docs. The tests we use for Filters could also be useful[.]

Thanks. Maybe I missed it, but I was mostly looking at how to escape input data correctly. Experimentally, I found that I'm allowed (and sometimes required) to quote strings if they are nested. But I'm not allowed to do that if they aren't.

When quoted, then I need to escape any \ and ", but I don't escape any other character.

I also need to quote column names that are equal to "or" or "and", and I do quote some other column names as well. However, I'm not completely sure what the rules for column names are. I wasn't able to access a column named ". (To be fair, I didn't manage that with pure SQL either, but Postgresql does apparently allow it).

Here is my encoding function: https://github.com/toitware/toit-supabase/blob/06f434e8e35705063965320639c0d8299a95c995/src/filter.toit#L305 (and the encode_column just below).

Here are my test-cases for some string values: https://github.com/toitware/toit-supabase/blob/06f434e8e35705063965320639c0d8299a95c995/tests/supabase_filters_test.toit#L49

steve-chavez commented 1 year ago

Finally got to understand this issue. So the problem is that:

col=isdistinct.null

Translates to:

col IS DISTINCT FROM 'null'

And it's expected to translate to

col IS DISTINCT FROM NULL

Which is reasonable. Though as Laurence mentions, it's possible to get the same behavior with not.is.null.

Expected behavior: isdistinct behaves similar to IS DISTINCT FROM (as the documentation states) and thus: isdistinct.null compares to the null value.

Seems possible to get that behavior by doing:

col is distinct from nullif('null', 'null')::col_type;

But not sure if we should do it or we should just document the limitation.

wolfgangwalther commented 10 months ago

I don't think we should change anything (except maybe for better documentation). The rules are simple. If you want to compare the column data with some value passed via query string, you will need to:

not.isdistinct is only useful when you compare two columns, but we don't allow that. As soon as you compare against a constant, which we always do, you can choose the operator accordingly.

wolfgangwalther commented 10 months ago

Expected behavior: isdistinct behaves similar to IS DISTINCT FROM (as the documentation states) and thus:

isdistinct.null compares to the null value. isdistinct."null" looks for the 'null' string.

Before we could do anything like that, we would need to clean up the whole quoting mess, as you already wrote in the other comments. See also: https://github.com/PostgREST/postgrest/issues/1943