GMOD / Chado

the GMOD database schema
http://gmod.org/wiki/Chado
Artistic License 2.0
36 stars 25 forks source link

Property Table Updates #30

Open spficklin opened 6 years ago

spficklin commented 6 years ago

This issue imported from the Chado v1.4 requested changes google doc:

http://gmod.827538.n3.nabble.com/Property-tables-and-units-td4047268.html#a4047297

How to deal with numerical values? Otherwise can’t naively sort or filter values in a query. To solve table property issues, I (Valentin Guignon, Bioversity International) propose to the Chado community to debate and vote on the 5 flavors to keep only one of them. Flavors:

1) replacing existing table by a new table with "typed" properties Pros: keeps the same schema with minor changes on the 'prop tables Cons: some old software may "not like" to see new columns and would need to be updated. Values stored in non-text value fields won’t be seen by old softwares (while they would be seen with 2b and 3b).

2) adding a new table with "typed" properties and keep the old one Pros: backward compatibility… Cons: ...but properties will come from 2 different tables and conflicting values could appear. Double table management.

3) adding a new table with "typed" properties and remove the old one Pros: forces people to be aware and use the new way of storing properties Cons: no backward compatibility 2b) same as 2 with triggers to auto-synchronize the 2 tables Pros: backward compatibility Cons: double data storage, working with the old property table may have side effect issues. 3b) same as 3 with a view and rules in order to have virtual old property tables that would behave just like before but store data in the new table Pros: backward compatibility Cons: inserting/updating through the view replacing the old property tables may have side effect issues.

spficklin commented 6 years ago

Sample code as imported from the Google Doc mentioned in the previous comment:

-- FLAVOR 1) with an existing database
ALTER TABLE [% TABLENAME %]prop
  ADD COLUMN int_value BIGINT NULL DEFAULT NULL,
  ADD COLUMN float_value DOUBLE PRECISION NULL DEFAULT NULL,
  ADD COLUMN time_value TIMESTAMP NULL DEFAULT NULL,
  ADD COLUMN cvterm_id_value INTEGER NULL DEFAULT NULL REFERENCES cvterm
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  ADD COLUMN compose_order INTEGER NOT NULL DEFAULT 0,
  ADD COLUMN new_rank INTEGER NOT NULL DEFAULT 0,
  ADD UNIQUE([% TABLENAME %]_id, type_id, new_rank, compose_order),
  ADD CHECK(1 =
    ( CAST(value IS NOT NULL AS INTEGER)
    + CAST(int_value IS NOT NULL AS INTEGER)
    + CAST(float_value IS NOT NULL AS INTEGER)
    + CAST(time_value IS NOT NULL AS INTEGER)
    + CAST(cvterm_id_value IS NOT NULL AS INTEGER)
    )
  ),
  DROP CONSTRAINT IF EXISTS stockprop_c1  
;
UPDATE [% TABLENAME %]prop SET new_rank = rank;
ALTER TABLE [% TABLENAME %]prop DROP COLUMN rank;
ALTER TABLE [% TABLENAME %]prop RENAME COLUMN new_rank TO rank;
-- ALL FLAVORS BUT 1 with an existing database
-- New typed-property (tprop) table creation
CREATE SEQUENCE [% TABLENAME %]tprop_[% TABLENAME %]tprop_id_seq;
CREATE TABLE [% TABLENAME %]tprop (
  [% TABLENAME %]tprop_id INTEGER NOT NULL DEFAULT NEXTVAL('[% TABLENAME %]tprop_[% TABLENAME %]tprop_id_seq'),
  [% TABLENAME %]_id INTEGER NOT NULL REFERENCES [% TABLENAME %]
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  type_id INTEGER NOT NULL REFERENCES cvterm
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  text_value TEXT NULL DEFAULT NULL,
  int_value BIGINT NULL DEFAULT NULL,
  float_value DOUBLE PRECISION NULL DEFAULT NULL,
  time_value TIMESTAMP NULL DEFAULT NULL,
  cvterm_id_value INTEGER NULL DEFAULT NULL REFERENCES cvterm
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  compose_order INTEGER NOT NULL DEFAULT 0,
  rank INTEGER NOT NULL DEFAULT 0,
  -- we add compose_order in the constraint
  UNIQUE([% TABLENAME %]_id, type_id, rank, compose_order),
  -- rule to make sure only 1 value field at a time will be used
  CHECK(1 =
    ( CAST(text_value IS NOT NULL AS INTEGER)
    + CAST(int_value IS NOT NULL AS INTEGER)
    + CAST(float_value IS NOT NULL AS INTEGER)
    + CAST(time_value IS NOT NULL AS INTEGER)
    + CAST(cvterm_id_value IS NOT NULL AS INTEGER)
    )
  )
);
COMMENT ON TABLE [% TABLENAME %]tprop IS 'Fill only one value field for an entry and use compose_order field to group values together in a specific order. Use rank to store several values of the same type.';
-- ALL BUT FLAVOR 1: keep previous data
INSERT INTO [% TABLENAME %]tprop ([% TABLENAME %]_id, type_id, text_value, rank)
  SELECT ([% TABLENAME %]_id, type_id, value, rank) FROM [% TABLENAME %]prop;
-- FLAVORS 3) and 3b) only on existing databases:
DROP TABLE [% TABLENAME %]prop;
-- FALVOR 3b) only:
-- keep backward compatibility using views (no data duplicates)
-- View creation
CREATE OR REPLACE VIEW [% TABLENAME %]prop (
    [% TABLENAME %]prop_id,
    [% TABLENAME %]_id,
    type_id,
    value,
    rank
  ) AS
  SELECT
    min(p.[% TABLENAME %]tprop_id),
    p.[% TABLENAME %]_id,
    p.type_id,
    string_agg(
    COALESCE(
        p.text_value,
        CAST(p.int_value AS TEXT),
        CAST(p.float_value AS TEXT),
        CAST(p.time_value AS TEXT),
        cvt.name
    ),
    ' '
    ORDER BY p.compose_order),
    p.rank
  FROM [% TABLENAME %]tprop p
    LEFT JOIN cvterm cvt ON cvt.cvterm_id = p.cvterm_id_value
  GROUP BY p.[% TABLENAME %]_id, p.type_id, p.rank
;
-- Manage inserts on the view
CREATE RULE [% TABLENAME %]prop_insert AS
ON INSERT TO [% TABLENAME %]prop
DO INSTEAD
  INSERT INTO [% TABLENAME %]tprop VALUES (
    COALESCE(NEW.[% TABLENAME %]prop_id, NEXTVAL('[% TABLENAME %]tprop_[% TABLENAME %]tprop_id_seq')),
    NEW.[% TABLENAME %]_id,
    NEW.type_id,
    NEW.value,
    NULL,
    NULL,
    NULL,
    NULL,
    DEFAULT,
    COALESCE(NEW.rank, 0)
  )
;
-- Manage updates on the view
CREATE RULE [% TABLENAME %]prop_update AS
ON UPDATE TO [% TABLENAME %]prop
DO INSTEAD
  UPDATE [% TABLENAME %]tprop SET
    [% TABLENAME %]tprop_id = NEW.[% TABLENAME %]prop_id,
    [% TABLENAME %]_id      = NEW.[% TABLENAME %]_id,
    type_id         = NEW.type_id,
    text_value      = NEW.value,
    int_value       = NULL,
    float_value     = NULL,
    time_value      = NULL,
    cvterm_id_value     = NULL,
    rank            = NEW.rank
  WHERE
    [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
    AND type_id = OLD.type_id
    AND rank = OLD.rank
;
-- Manage deletes on the view
CREATE RULE [% TABLENAME %]prop_delete AS
ON UPDATE TO [% TABLENAME %]prop
DO INSTEAD
  DELETE FROM [% TABLENAME %]tprop
  WHERE
    [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
    AND type_id = OLD.type_id
    AND rank = OLD.rank
;
-- FALVOR 2b) only:
-- backward compatibility keeping old table with triggers (to duplicate data)
-- Insert trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME %]tprop_insert() RETURNS TRIGGER AS
$$
  BEGIN
    INSERT INTO [% TABLENAME %]tprop VALUES (
    COALESCE(NEW.[% TABLENAME %]prop_id, NEXTVAL('[% TABLENAME %]tprop_[% TABLENAME %]tprop_id_seq')),
    NEW.[% TABLENAME %]_id,
    NEW.type_id,
    NEW.value,
    NULL,
    NULL,
    NULL,
    NULL,
    DEFAULT,
    COALESCE(NEW.rank, 0)
    );
    RETURN NEW;
  END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_insert_trigger
  AFTER INSERT ON [% TABLENAME %]prop
  FOR EACH ROW
  EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_insert();
-- Update trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME %]tprop_update() RETURNS TRIGGER AS
$$
  BEGIN
    UPDATE [% TABLENAME %]tprop SET
    [% TABLENAME %]tprop_id = NEW.[% TABLENAME %]prop_id,
    [% TABLENAME %]_id      = NEW.[% TABLENAME %]_id,
    type_id         = NEW.type_id,
    text_value      = NEW.value,
    int_value       = NULL,
    float_value     = NULL,
    time_value      = NULL,
    cvterm_id_value     = NULL,
    rank            = NEW.rank
    WHERE
    [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
    AND type_id = OLD.type_id
    AND rank = OLD.rank
    ;
    RETURN NEW;
  END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_update_trigger
  AFTER UPDATE ON [% TABLENAME %]prop
  FOR EACH ROW
  EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_update();
-- Delete trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME %]tprop_delete() RETURNS TRIGGER AS
$$
  BEGIN
    DELETE FROM [% TABLENAME %]tprop
    WHERE
    [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
    AND type_id = OLD.type_id
    AND rank = OLD.rank
    ;
    RETURN OLD;
  END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_delete_trigger
  AFTER DELETE ON [% TABLENAME %]prop
  FOR EACH ROW
  EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_delete();
-- Truncate trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME %]tprop_truncate() RETURNS TRIGGER AS
$$
  BEGIN
    TRUNCATE [% TABLENAME %]tprop;
    RETURN NULL;
  END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_truncate_trigger
  AFTER TRUNCATE ON [% TABLENAME %]prop
  FOR EACH STATEMENT
  EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_truncate();
laceysanderson commented 6 years ago

You can achieve a "natural" sort for the property tables using the following approach

SELECT * 
  FROM featureprop 
  WHERE type_id = [property you know only contains numbers] 
  ORDER BY value::int DESC;

Caveat: you must ensure all values actually are integers ;-)

I'm sure I'm for any of Valentin's flavours.

One other thing to note, the linked mailing list thread actually covered two issues: 1) how to record units for the prop value (e.g. weight = 15 kg) and 2) Valentin's "typed" values.

guignonv commented 6 years ago

About the "how to record units for the prop value", I think it's the purpose of the type_id of that value, which should be using an ontology "variable" (which includes by definition a trait, a method and a unit).

spficklin commented 2 years ago

The following was discussed in our Tripal/Chado meeting this morning.

Problems with current suggestions

  1. By adding new columns for numerical, dates, cvterm_id it allows a user to add multiple values for the same property type which makes interpretation hard.
  2. Adding multiple value columns may break the normalized database design.

Other ideas to store different value types:

  1. Keep the value column as the only value column but change the it to support JSONB which is indexable by newer versions of PostgreSQL. This way folks can store whatever they need. Although, this locks Chado even further into PostgreSQL (may not be a problem)
  2. Create different value tables... one for each data type. Although, this would bloat Chado tremendously.