Intevation / intelmq-certbund-contact

IntelMQ expert bots to lookup contact information in a database (part of the intelmq-cb-mailgen solution).
GNU Affero General Public License v3.0
3 stars 2 forks source link

Schema: consistently use "not null" constraints #4

Open bernhardreiter opened 7 years ago

bernhardreiter commented 7 years ago

535e6350507b23d46efd88c4231898119f8384fa https://github.com/Intevation/intelmq/blob/integrated/intelmq/bots/experts/certbund_contact/initdb.sql has

CREATE TABLE contact (
    id SERIAL PRIMARY KEY,

    firstname VARCHAR (500) NOT NULL DEFAULT '',
    lastname  VARCHAR (500) NOT NULL DEFAULT '',
    tel       VARCHAR (500) NOT NULL DEFAULT '',

but in organisation there are not such contraints for

    ripe_org_hdl VARCHAR(100),
    ti_handle    VARCHAR(500),
    first_handle    VARCHAR(500),

should be more consistent using NOT NULL DEFAULT ''

bernhardreiter commented 7 years ago

@dmth agreed that it should be more consistent. @bernhard-herzog?

aaronkaplan commented 7 years ago

Why can't the handle be NULL? I believe it can be. So, I believe it's ok the way it currently is.

bernhardreiter commented 7 years ago

It is a consistency issue: firstname has to be not NULL, but it can be the empty string. first_handle can be NULL or the empty string. But both should be consistent, as both could not be known.

aaronkaplan commented 7 years ago

we can also move the firstname to "NOT NULL" if you prefer that. But - I highly suggest you clarify that (domain specific ) question with the customer.

For us (CERTat) it makes sense to have NULL in ripe_org_hdl, ti_handle and first_handle. Because we might not even have a handle.

bernhardreiter commented 7 years ago

@aaronkaplan if it is clear that a field can be empty, the question is: how is this expressed in the database schema? Both variants above allow for the value to be absent. NOT NULL: default '' just makes sure that it has to be explicitely given.

Or do you at CERTat really need a difference between empty ('') and unknown (NULL)?

aaronkaplan commented 7 years ago

On 23 Feb 2017, at 14:34, Bernhard E. Reiter notifications@github.com wrote:

@aaronkaplan if it is clear that a field can be empty, the question is: how is this expressed in the database schema? Both variants above allow for the value to be absent. NOT NULL: default '' just makes sure that it has to be explicitely given.

Or do you at CERTat really need a difference between empty ('') and unknown (NULL)?

For me it's clear... '' is something else than NULL.

NULL means = I don't have any value, not even ''.

Hope it helps, a.

bernhardreiter commented 7 years ago

@aaronkaplan technically the difference is clear, but is a semantic distinction necessary for any use case? What does it tell your contactdb if you know that the ti_handle is '' as opposed to NULL? Both does means that you don't have a value.

aaronkaplan commented 7 years ago

On 23 Feb 2017, at 14:58, Bernhard E. Reiter notifications@github.com wrote:

@aaronkaplan technically the difference is clear, but is a semantic distinction necessary for any use case? What does it tell your contactdb if you know that the ti_handle is '' as opposed to NULL? Both does means that you don't have a value.

Semantically, for me at least, a NULL means "I don't know nothing about a RIPE NCC handle" for example. Personally, adding empty strings seems like a circumvention of NOT NULL constraints :)

bernhard-herzog commented 7 years ago

Semantically, for me at least, a NULL means "I don't know nothing about a RIPE NCC handle" for example.

You could ascribe the same meaning to an empty string, provided there's no other meaning for that. Also, there are other possible meanings for NULL, like "the maintainers of the DB have not yet entered this information", "RIPE does not know about the organization (therefore there cannot be a handle)".

At least some of this applies to empty strings as well, but how many of these cases do we need to distinguish? At the very least we have the situation that we do not have RIPE handles for all organisations so there are at least these cases:

Do we need to distinguish more? Which ones?

Generally, NULLs have at least two problems: