CartoDB / cartodb-postgresql

PostgreSQL extension for CartoDB
BSD 3-Clause "New" or "Revised" License
111 stars 53 forks source link

Cartodbfy of exported and re-imported tables shall use existing columns #192

Open rafatower opened 8 years ago

rafatower commented 8 years ago

A typical use-case is exporting to CSV and then re-importing. It's been noted that for instance the_geom column (exported as WKB in the CSV) is not always taken as the good source for the new the_geom column when re-importing the csv file.

How to reproduce:

Expected: the new dataset looks like the original one

Actual: there's a backed up the_geom_ (or sth similar) and a new the_geom is created from lat/lng columns.

cc/ @saleiva

iriberri commented 8 years ago

Dup: https://github.com/CartoDB/cartodb/issues/6163

iriberri commented 8 years ago

Putting here the behaviour I saw yesterday:

When reimporting a exported table such as the one defined in the issue, aka:

the_geom cartodb_id lat long
WKB (point 1,1) 1 4 5

when passing through ogr2ogr which is converting lat and long to Multipoint because of the -nlt PROMOTE_TO_MULTI parameter, the table in the cdb_importer schema is:

                                           Table "cdb_importer.importer_02d80ca6"
    Column    |           Type            |                                                Modifiers                                                 
--------------+---------------------------+----------------------------------------------------------------------------------------------------------
 ogc_fid      | integer                   | not null default nextval('cdb_importer.importer_02d04f2cd0ca6_ogc_fid_seq'::regclass)
 cartodb_id   | integer                   | 
 the_geom     | character varying         | 
 long         | double precision          | 
 lat          | double precision          | 
 wkb_geometry | geometry(MultiPoint,4326) | 

with contents:


 ogc_fid | cartodb_id |                      the_geom                      | long | lat |                             wkb_geometry                             
---------+------------+----------------------------------------------------+------+-----+----------------------------------------------------------------------
       1 |          1 | 0101000020E6100000000000000000F03F000000000000F03F |    5 |   4 | 0104000020E6100000010000000101000000FFFFFFFFFFFF13400000000000001040
(1 row)

which after the CartoDBfication gets converted into:

   Table "public.untitled_table_22"
        Column        |          Type           |                               Modifiers                                
----------------------+-------------------------+------------------------------------------------------------------------
 cartodb_id           | integer                 | not null default nextval('untitled_table_22_cartodb_id_seq'::regclass)
 the_geom             | geometry(Geometry,4326) | 
 the_geom_webmercator | geometry(Geometry,3857) | 
 long                 | double precision        | 
 lat                  | double precision        | 

whose contents are:


 cartodb_id |                               the_geom                               |                         the_geom_webmercator                         | long | lat 
------------+----------------------------------------------------------------------+----------------------------------------------------------------------+------+-----
          1 | 0104000020E6100000010000000101000000FFFFFFFFFFFF13400000000000001040 | 0104000020110F000001000000010100000099476EE86AFC204160AB497020331B41 |    5 |   4
(1 row)

Which means that in the cartodbfication, the old varchar the_geom column is being discarded in favour of using the wkb_geometry that ogr2ogr generated (which happens to be a MultiPoint because of the command options).

iriberri commented 8 years ago

After some study last week, here are the final conclusions.

The CartoDBfication will always give precedence to a geometry(Geometry, 4326) column than to any other geometry in the table.

This scenario for lat/long columns would only happen when guessing is enabled, and in this situation it could make sense and we would not consider it a bug. See https://github.com/CartoDB/cartodb/issues/6163#issuecomment-193178693.

There's still an issue that does not affect any regular CartoDB workflow: generating a table manually (SQL) with a the_geom column and another column with type geometry(Geometry, 4326) would use the other column instead of the_geom. But we can just say that this is unsupported, or that creating the the_geom column automatically must use geometry(Geometry, 4326).