LI3DS / pg-li3ds

PostgreSQL extension for managing 3D sensor data
MIT License
11 stars 2 forks source link

Improve the create_project function #21

Closed elemoine closed 7 years ago

elemoine commented 7 years ago

This brings a number of improvements to the create_project function.

@ldgeo, can you please take a look at this? I'd like to understand why create_project did not create "route" and "lidar" tables previously. Was there a good reason?

ldgeo commented 7 years ago

@elemoine i'm not a big fan of allowing schema names with uppercase, this could lead to confusion if we could create a "Paris" project next to a "paris" project, and we always need to escape identifiers.

We didn't created route and lidar table because they were initialy materialized views on foreign data wrappers, they need some additionnal informations ex below :

drop foreign table if exists toulouse.route_fdw cascade;
create foreign table toulouse.route_fdw (
    points pcpatch(1)
)
server route_server
 options (
    sources '/data/sbet_dir/toulouse
    , patch_size '100'
    , pcid '1'
    , time_offset '1381017584'
);

drop materialized view if exists toulouse.route;
create materialized view toulouse.route as
    select
        row_number() over () as id
        , f.points
        , timestamptz 'epoch' + pc_patchmin(f.points, 'm_time') * INTERVAL '1 second' as start_time
        , timestamptz 'epoch' + pc_patchmax(f.points, 'm_time') * INTERVAL '1 second' as end_time
        , 2 as posdatasource
        -- , ajout box3d à indexer
    from toulouse.route_fdw f;

create index on toulouse.route using gist(geometry(points));
create index on toulouse.route using gist(tstzrange(start_time, end_time));

No objection to change this

elemoine commented 7 years ago

@elemoine i'm not a big fan of allowing schema names with uppercase, this could lead to confusion if we could create a "Paris" project next to a "paris" project, and we always need to escape identifiers.

I understand. But it's inconsistent and broken today, because the schema name is lowercase while the name of the project in the project table may include capital letters. For example, one may look up a project by its name in the project table, and decide to call create_project because there is no project record with that name. But create_project may fail because the schema already exists. So we need consistency. And I would find it strange not to allow project names with capital letters, just because of an implementation detail (schema names with capital letters require double-quoting). I can change my mind if both @mbredif and you agree not to use schema names with capital letters.

We didn't created route and lidar table because they were initialy materialized views on foreign data wrappers, they need some additionnal informations ex below

As I saw it <project_name>.lidar and <project_name>.route would just be references (URIs) to actual data files or tables. Just like <project_name>.image just provides links to images on the file system. I also understand that the indirection is not necessary if we alway use fdw for trajectory and lidar data. But in my opinion we also want/need to support, for example, las files on the file system. For now, I think I would not use fdw.

mbredif commented 7 years ago

+1 for case sensitivity and double quoting otherwise, the lowercase constraint must be documented and enforced with failure at creation, rather than silently applied.

I think we need an indirection table : a project may present multiple routes and multiple lidar point clouds with different schemas, which implies multiple tables that have to be registered in such an indirection table, in my opinion.

elemoine commented 7 years ago

I had a quick chat with @ldgeo. He is not a big fan of schema names including capital letters. I understand that. We could indeed have create_project use lower on the project name when inserting a project record into the project table, and use lower case letters everywhere. The issue is that the user code will also need to convert strings to lower case before making comparisons with project names in the database. So for that reason I still think allowing project names with upper case letters is a better option.

Also, I am wondering why we actually need to create a schema named after the project name. Instead why don't we just have the "image", "route" and "lidar" tables in the "li3ds" schema? If "image", "route" and "lidar" are just indirection tables to the actual data stores I don't see why we need project-specific schemas.

Now, about whether we need indirection tables. I agree with @mbredif. But this means that uri in datasource may be a directory name or a table name. And uri in lidar (or image or route) may be a file name (or directory name) or a patch column name. @mbredif, is this what you have in mind?

elemoine commented 7 years ago

Also, I am wondering why we actually need to create a schema named after the project name. Instead why don't we just have the "image", "route" and "lidar" tables in the "li3ds" schema? If "image", "route" and "lidar" are just indirection tables to the actual data stores I don't see why we need project-specific schemas.

I just added a commit that implements this.

elemoine commented 7 years ago

@ldgeo and I discussed this again. Our conclusion is to remove image, lidar and route and only use datasource as the indirection table. This eliminates the "case sensitivity" issue. And it makes things simpler, with just one indirection table.

This is what the table will look like:

create type datasource_type as enum (
    'image',
    'route',
    'lidar'
);

create table datasource(
    id serial primary key
    , uri varchar
    , type datasource_type not null
    , parameters jsonb
    , capture_time timestamptz
    , session int references session(id) on delete cascade not null
    , referential int references referential(id) on delete cascade not null
    , constraint uniqdatasource unique(uri, session, referential)
);

And we suggest schemes in uri, to be able to determine if the datasource refers to a file on the file system or a column of the database. Examples include file:/path/to/img.tif and column:public.paris.points.

capture_time makes sense for datasources that refer to image files. But it doesn't make sense for lidar data stored in a Postgres table. So it will just be NULL in the latter case.