atviriduomenys / spinta

Spinta is a framework to describe, extract and publish data (a DEP Framework).
MIT License
10 stars 4 forks source link

Type casting on spinta migrate #580

Closed sirex closed 2 months ago

sirex commented 3 months ago

Currently spinta migrate does not support type casting.

For example if we previously had string type, and then type was changed to integer, when running spinta migrate we will get:

ProgrammingError: (psycopg2.errors.DatatypeMismatch) column "metai" cannot be cast automatically to type integer
HINT:  You might need to specify "USING metai::integer".
[SQL: ALTER TABLE "datasets/gov/lsd/gyventoju_uzimtumas/GyvUzimtumas" ALTER COLUMN metai TYPE INTEGER ]

Same applies to other cases, when type changes.

List of type conversions we need to support:

When converting references, here is an example, how this was done manually:

-- Check if adding constraint is possible, if t table has references to s that
-- do not exists in s, then do nothing and raise an giving a sample of
-- refereces that do not exist in s.
SELECT s._id
FROM "datasets/gov/lsd/sdg/KaimoGyventojas2kmNuoKelio" t
LEFT JOIN "datasets/gov/lsd/grid/Grid1KmSq" s
WHERE t.grid_id IS NOT NULL
  AND s.grid_id IS NULL
;

-- Add new level:4 ref column
ALTER TABLE "datasets/gov/lsd/sdg/KaimoGyventojas2kmNuoKelio"
ADD COLUMN "grid_id._id" UUID;

-- If all references from t to s exists in s, then first update local
-- identifiers with _id
UPDATE "datasets/gov/lsd/sdg/KaimoGyventojas2kmNuoKelio" t
SET "grid_id._id" = (
    SELECT s._id
    FROM "datasets/gov/lsd/grid/Grid1KmSq" s
    WHERE t.grid_id = s.grid_id
);

-- Add constraint
ALTER TABLE  "datasets/gov/lsd/sdg/KaimoGyventojas2kmNuoKelio"
ADD CONSTRAINT "fk_datasets/gov/lsd/sdg/KaimoGyventojas2kmNuoKelio_grid_id._id"
FOREIGN KEY ("grid_id._id")
REFERENCES "datasets/gov/lsd/grid/Grid1KmSq" (_id);

Example with geometry SRID change:

ALTER TABLE "datasets/gov/lhmt/klimatologija/MeteorologineAikstele"
    ALTER COLUMN "koord"
        TYPE geometry(Point, 4326)
        USING ST_SetSRID("koord", 4326);
sirex commented 3 months ago

Migrating from string to text

Regarding text type, currently there are a lot of cases like this:

d | r | b | m | property | type    | ref
dataset1                 |         |
  |   |   | Country      |         | id
  |   |   |   | id       | integer |
  |   |   |   | name_lt  | string  |
  |   |   |   | name_en  | string  |

Because for a long time, text type was not supported. Now since text is introduced, there will be a lot of changes, for manifest table above, into the following:

d | r | b | m | property | type    | ref
dataset1                 |         |
  |   |   | Country      |         | id
  |   |   |   | id       | integer |
  |   |   |   | name@lt  | string  |
  |   |   |   | name@en  | string  |

In order to migrate this change we need to do:

  1. Add new JSONB column name.
  2. Copy data from name_lt and name_en into name[lt] and name[en].
  3. Rename name_lt and name_en into __name_lt and __name_en.

Migrating from scalar to ref

When migrating from

d | r | b | m | property | type    | ref | level
dataset1                 |         |     |
  |   |   | Country      |         | id  | 4
  |   |   |   | id       | integer |     | 4
  |   |   | City         |         | id  | 4
  |   |   |   | id       | integer |     | 4
  |   |   |   | country  | integer |     | 2

to

d | r | b | m | property | type    | ref     | level
dataset1                 |         |         |
  |   |   | Country      |         | id      | 4
  |   |   |   | id       | integer |         | 4
  |   |   | City         |         | id      | 4
  |   |   |   | id       | integer |         | 4
  |   |   |   | country  | ref     | Country | 4

We assume, that City.country connects to Country via primary key Country.id, thus migrations should be done as follows:

  1. Check reference integrity, by left joining City with Country on City.country = Country.id if there are reference from City.country that does not exists in Country.id, then migration should be abborted, providing clear error message with few samples of City.country, that does not exists in Country.id.
  2. Add new UUID column country._id.
  3. Copy Country._id into City.country._id by joining City and Country using City.country = Country.id.
  4. Add foreign key constraint on City.country._id.
  5. Rename City.country into City.__country.