Recommended way to do multiple column validation with custom error messages

AndresGuerraMax commented 8 years ago

Hi, I would like to know the suggested approach for doing validation with custom messages, in particular my use case is something like:

Suppose I have table:

CREATE TABLE person(
  id SERIAL PRIMARY KEY,
  phone_number TEXT CHECK (length(phone_number) < 20),
  full_name TEXT NOT NULL
);

When I do a post with this invalid data:

{
  "phone_number" : "23423423871262372193721871328129",
  "full_name" : ""
}

I get the following:

{
    "code": "23514", 
    "details": "Failing row contains (3, 23423423871262372193721871328129, John Doe).", 
    "hint": null, 
    "message": "new row for relation \"person\" violates check constraint \"person_phone_number_check\""
}

That's the error for just the phone_number column, ideally I would like to get is something like this in the message:

{
  "errors" : {
     "phone_number" : "Value must not exceed 20 characters long",
     "full_name" : "Value can't be empty"
  }
}

begriffs commented 8 years ago

The error format you describe would be great for the client side to show errors in form fields. I can see why you'd like to improve the error message. Let's think about how to do it.

The first challenge is that CHECK constraints can be for the table as a whole and involve more than one column at a time. As a first improvement however postgrest could catch error 23514 and look up information about the check constraint name from the information schema. Given the constraint_name we can look up the check_clause, which is the actual sql expression which failed. We can report this expression in the error message.

However this still does not find all failing checks at one time, but stops after the first one. The second improvement is that we can manually iterate the checks for the table and test their expressions against the failing row. Then report on all the failures at once.

The next shortcoming is pinpointing which column is at fault. As I mentioned this will not always be possible. However we could use the heuristic that if the check constraint has the auto-generated name [table]_[column]_check than we can parse the column out.

There may be a more direct way to get this information so anyone with greater postgresql knowledge please let me know.

steve-chavez commented 8 years ago

I'd like to share a workaround I did for this since I also dealt with this issue a while ago, now I'm doing this for all my tables. I do a table level constraint because if there's a failed column constraint the transaction gets aborted and further validations are not reached. Also in the check function I've raised an exception to abort the transaction and be able to get the custom message, for the above case the response and sql will be:

{
    "code": "P0001", 
    "details": null, 
    "hint": null, 
    "message": "{\"errors\" :{\"phone_number\":\"Value must not exceed 20 characters long\",\"full_name\":\"Value cant be empty\"}}"
}

With a JSON.parse of the message on the client the data can be worked, the sql is something like this:

CREATE TABLE person(
  id SERIAL PRIMARY KEY,
  phone_number TEXT,
  full_name TEXT,
  CONSTRAINT person_constraint CHECK (person_validation(phone_number, full_name))
); 

CREATE OR REPLACE FUNCTION person_validation(phone_number TEXT, full_name TEXT) RETURNS BOOL
LANGUAGE plpgsql AS $$
DECLARE 
    message TEXT;
    failed BOOL;
BEGIN
    message:= '';
    failed:= FALSE;

    IF char_length(phone_number) >= 20 THEN
        message:= message || '"phone_number":"Value must not exceed 20 characters long",';
        failed:=TRUE;
    END IF;

    IF coalesce(trim(full_name), '')='' THEN
        message:= message || '"full_name":"Value cant be empty",';
        failed:=TRUE;
    END IF;

    IF NOT failed THEN
        RETURN TRUE;
    ELSE 
        message := '{"errors" :{' || left(message, char_length(message) - 1) || '}}';
        RAISE EXCEPTION '%', message;
    END IF;
END;
$$;

The drawback is that I only use table level constraints now, can't do NOT NULLs, and I have to create validation functions for each one of my tables, maybe someone has a better approach that I'm missing.

calebmer commented 8 years ago

@SteveBash that's an interesting use case we should consider accommodating.

My personal opinion on this (which comes from trying to write/use many validation engines…) is that you should just duplicate validation logic in the client. Yes it's not DRY, but it allows us to develop a much better user experience with custom error messages and extra checks.

To aid with client side validation, try using the OPTIONS verb on your route. It will give some nice information your client can consume as a starting point for client side validation. Also, I'd like to consider making the OPTIONS request return a JSON Schema to make client side validation even easier.

begriffs commented 8 years ago

@calebmer can you elaborate on why it's better to duplicate logic? If the postgrest could offer detailed errors for all problematic input columns wouldn't that be more efficient and also keep the client up to date if data constraints on the server were strengthened or weakened?

calebmer commented 8 years ago

The reason I would duplicate validation logic is for purely UX reasons, not engineering reasons. In an engineering sense I completely agree that it would be helpful for PostgREST to output detailed error messages for all use cases, however when one is just going to show those database error messages to the user I think its better to duplicate the logic. It's much more helpful to see: "Oops! Looks like your title was too long, we don't allow titles less then 140 characters. A tweet's length!" then to see Error: Ivariant violation field "headline" failed test "length" with parameter 140 (a little excessive but I hope it illustrates my point).

Maybe I'm privileged to work on a team with good communication, but even if validation logic is not duplicated error messages should still be written on the client side, not in the database. That's why I prefer a JSON Schema approach, where the data is completely described in that way and validated on the client (or in PostgREST if we wanted).

begriffs commented 8 years ago

Good point about user facing error messages vs those for programmers. I wonder if the front end could translate the db errors to make them more friendly? There are some kinds of errors which if I understand correctly could not be specified with JSON Schema. For instance an endpoint which creates a new bank withdrawal transaction. The rule is that the new withdrawal cannot be so large as to make the account balance negative. The maximum amount for the field varies though depending on how much is in the account. The DB could enforce it pretty easily and give the client an error which the client could translate for the user. If the client had wanted to handle this situation it would need the foresight to request the total account balance before rendering the withdrawal form for the user.

calebmer commented 8 years ago

Good use case, well that brings us to the idea of error codes with some meta data. Maybe is there some way we could display all errors raised in a nice way? Or do we do that already. For instance in your example could the following statement be sufficient? RAISE EXCEPTION insufficient_funds USING no_more_then = $5

@AndresGuerraMax do any of these solutions/ideas solve your problem? Is there any interesting thing you'd like us to explore more?

ruslantalpa commented 8 years ago

@begriffs i agree with @calebmer on this, these are errors for programmers and not for users, their main purpose is to be informative, not "nice". Even if you somehow manage to have custom errors, this is all for nothing in systems that need multilanguage support, you still need to "translate" the message, you can't jsut output the message from the db. i.e. ... this should be closed :)

begriffs commented 8 years ago

As you point out customizing the error message text may not be important. However getting the full list of failures at once would be useful for client side programmers. It currently aborts after encountering the first failed constraint. Thus if you get an error you have to fix it and try again, only to potentially hit more errors. If you know all violations up front you can fix them all without having to play a guessing game.

How about restricting the scope of this issue to the following:

Customizing the error message is not allowed
Postgrest will not attempt to determine exactly which columns caused the problem
After encountering an error with create or update postgrest will test all table constraints against the user input and include a list of all that fail

I don't know exactly how hard this would be in the code, and might be way more complex than it is worth. Just wanted to discuss this option before closing the issue.

ruslantalpa commented 8 years ago

i have no idea how to implement that but i also believe we should not need to. The db constraints are there to keep the data integrity, not to offload input validation to the backend. In most cases, it's known the format/constraints on the columns so any good implementation should validate the data clientside before trying to send it to the backend.

begriffs commented 8 years ago

That's true for the most part. There are "check" constraints that verify conditions which involve multiple columns but the more common case is probably column constraints a client could infer from OPTIONS output for an endpoint.

I'll close this issue for now.

PostgREST / postgrest

Recommended way to do multiple column validation with custom error messages #362