biocore / american-gut-web

The website for the American Gut Project participant portal
BSD 3-Clause "New" or "Revised" License
5 stars 24 forks source link

Sources #671

Open sjanssen2 opened 7 years ago

sjanssen2 commented 7 years ago

Hi, in order to model the "source" concept in our DB, I suggest to add in the following two tables.

A source should have a unique, automatically generated ID, the user can give it a name (in the case of a human source that is the participant_name) and it is owned by one ag_login_id.

CREATE TABLE ag."source" (
    source_id            uuid DEFAULT ag.uuid_generate_v4() ,
    source_name          varchar(100)  ,
    ag_login_id          uuid  ,
    CONSTRAINT pk_source UNIQUE ( source_id )
 );

The second table, stores the many-to-many relation between barcodes and surveys, where each of those connections points to one source, hence the source_id column.

CREATE TABLE ag.source_barcodes_surveys (
    source_id            uuid  NOT NULL,
    barcode              varchar  ,
    survey_id            varchar
 );

This design should allow for the creation of empty sources by only giving the source a name. Later, the user might fill out surveys for this source. And in a third step, the user can assign barcodes to a source (we should ensure that this assignment is only possible if the source has been assigned to a main survey, which is our way to ensure consent).

@josenavas @wasade please let me know your thoughts about this design.

sjanssen2 commented 7 years ago

The full SQL code to insert the two tables is:

CREATE TABLE ag."source" (
    source_id            uuid DEFAULT ag.uuid_generate_v4() ,
    source_name          varchar(100)  ,
    ag_login_id          uuid  ,
    CONSTRAINT pk_source UNIQUE ( source_id )
 );

CREATE INDEX idx_source_1 ON ag."source" ( ag_login_id );

COMMENT ON TABLE ag."source" IS 'A "source" is the human, animal or environment that was swapped for microbial analysis.';
COMMENT ON COLUMN ag."source".source_id IS 'A unique ID to identify the source.';
COMMENT ON COLUMN ag."source".source_name IS 'Sources can come with names, e.g. names for the humans that have been swapped.';
COMMENT ON COLUMN ag."source".ag_login_id IS 'Points to the "user" that owns this source.';

CREATE TABLE ag.source_barcodes_surveys (
    source_id            uuid  NOT NULL,
    barcode              varchar  ,
    survey_id            varchar
 );

CREATE INDEX idx_source ON ag.source_barcodes_surveys ( barcode );
CREATE INDEX idx_source_0 ON ag.source_barcodes_surveys ( survey_id );
CREATE INDEX idx_source_barcodes_surveys ON ag.source_barcodes_surveys ( source_id );

COMMENT ON COLUMN ag.source_barcodes_surveys.source_id IS 'Points to information about the source, like its name.';
COMMENT ON COLUMN ag.source_barcodes_surveys.barcode IS 'Points to barcode(s) that are assigned to this source.';
COMMENT ON COLUMN ag.source_barcodes_surveys.survey_id IS 'Points to survey(s) that are assigned to this source.';

ALTER TABLE ag.source_barcodes_surveys ADD CONSTRAINT fk_source_barcode FOREIGN KEY ( barcode ) REFERENCES barcodes.barcode( barcode );
ALTER TABLE ag.source_barcodes_surveys ADD CONSTRAINT fk_source_barcodes_surveys FOREIGN KEY ( source_id ) REFERENCES ag."source"( source_id );
wasade commented 7 years ago

Doesn't ag.ag_login_surveys already act as the "source" table?

On Sat, May 6, 2017 at 4:28 AM, Stefan Janssen notifications@github.com wrote:

The full SQL code to insert the two tables is:

CREATE TABLE ag."source" ( source_id uuid DEFAULT ag.uuid_generate_v4() , source_name varchar(100) , ag_login_id uuid , CONSTRAINT pk_source UNIQUE ( source_id ) );

CREATE INDEX idx_source_1 ON ag."source" ( ag_login_id );

COMMENT ON TABLE ag."source" IS 'A "source" is the human, animal or environment that was swapped for microbial analysis.'; COMMENT ON COLUMN ag."source".source_id IS 'A unique ID to identify the source.'; COMMENT ON COLUMN ag."source".source_name IS 'Sources can come with names, e.g. names for the humans that have been swapped.'; COMMENT ON COLUMN ag."source".ag_login_id IS 'Points to the "user" that owns this source.';

CREATE TABLE ag.source_barcodes_surveys ( source_id uuid NOT NULL, barcode varchar , survey_id varchar );

CREATE INDEX idx_source ON ag.source_barcodes_surveys ( barcode ); CREATE INDEX idx_source_0 ON ag.source_barcodes_surveys ( survey_id ); CREATE INDEX idx_source_barcodes_surveys ON ag.source_barcodes_surveys ( source_id );

COMMENT ON COLUMN ag.source_barcodes_surveys.source_id IS 'Points to information about the source, like its name.'; COMMENT ON COLUMN ag.source_barcodes_surveys.barcode IS 'Points to barcode(s) that are assigned to this source.'; COMMENT ON COLUMN ag.source_barcodes_surveys.survey_id IS 'Points to survey(s) that are assigned to this source.';

ALTER TABLE ag.source_barcodes_surveys ADD CONSTRAINT fk_source_barcode FOREIGN KEY ( barcode ) REFERENCES barcodes.barcode( barcode ); ALTER TABLE ag.source_barcodes_surveys ADD CONSTRAINT fk_source_barcodes_surveys FOREIGN KEY ( source_id ) REFERENCES ag."source"( source_id );

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/american-gut-web/issues/671#issuecomment-299633583, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8soMAZ2u0nKuQtP_xEW2ha-bTLv9Oks5r3FlTgaJpZM4NSuck .

sjanssen2 commented 7 years ago

After thinking about it, I tend to agree with you about the identity of my "source" and the already existing ag.ag_login_surveys table. Am I right that also samples from animals need to have a main survey in our system?

wasade commented 7 years ago

There is an animal survey

On May 6, 2017 07:41, "Stefan Janssen" notifications@github.com wrote:

After thinking about it, I tend to agree with you about the identity of my "source" and the already existing ag.ag_login_surveys table. Am I right that also samples from animals need to have a main survey in our system?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/american-gut-web/issues/671#issuecomment-299637309, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8sh9I4hn1DX63JRHhpkuD9c3EKhmNks5r3GpkgaJpZM4NSuck .

sjanssen2 commented 7 years ago

table ag_animal_survey was one of those completely disconnected to our codebases. I dropped it with my last PR.

wasade commented 7 years ago

Right, it's encoded in the survey structure similar to the human survey

On May 6, 2017 17:46, "Stefan Janssen" notifications@github.com wrote:

table ag_animal_survey was one of those completely disconnected to our codebases. I dropped it with my last PR.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/american-gut-web/issues/671#issuecomment-299670529, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8spEN8JGNXQKSquOUvvtoeWAeOQg4ks5r3PhigaJpZM4NSuck .

sjanssen2 commented 7 years ago

Regarding the "source" table: Table "ag.ag_login_surveys" does not directly have an ID for a source. It would be the combination of ag_login_id and participant_name (where the later is a user string). Is it a good design to not have some primary key like "source_id" for the concept of a source?

wasade commented 7 years ago

It uses a natural primary key composed of the tuple of those fields, right? A primary key does not need to be a single field

On May 6, 2017 17:51, "Stefan Janssen" notifications@github.com wrote:

Regarding the "source" table: Table "ag.ag_login_surveys" does not directly have an ID for a source. It would be the combination of ag_login_id and participant_name (where the later is a user string). Is it a good design to not have some primary key like "source_id" for the concept of a source?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/biocore/american-gut-web/issues/671#issuecomment-299670709, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8ssHymotPuxK3GmOqf8Qxc9pwhx2Rks5r3PlcgaJpZM4NSuck .

sjanssen2 commented 7 years ago

ohm. I don't know the exact definition of primary key, but the DB schema tells me that "survey_id" in "ag.ag_login_surveys" is the primary key.

josenavas commented 7 years ago

@wasade the problem with using tuples as primary key is that if you need to reference it you have to duplicate the tuple.

@sjanssen2 you're right, in the dbschema only one columns is labeled as primary key, but if the other column has a "unique" constraint that is making a "virtual" tuple primary key.

I would suggest adding a different id using bigserial rather than UUID to be able to easily reference it.

sjanssen2 commented 7 years ago

and note that "survey_id" is the primary key, which is not in the tuple of "ag_login_id, participant_name"