Closed milen-sage closed 1 year ago
Data model documentation: https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2473623559/The+Data+Model+Schema
@andrewelamb schematic's example data model csv may be handy as well (it's in the schematic repo).
@GiaJordan @mialy-defelice Me and @milen-sage just had a meeting and we agreed on these chnages:
"<table>_id"
where table is the name of the component the foreign key exists in, and this will have the validation rule "matchAtLeastOne <table>.id"
Gianna and Mialy, could you comment on if doing no.1 would be acceptable, and if no.2 is true?
Milen is there anything I missed?
- When schematic creates the uuid's during manifest validation we want this column to be named "id" instead of "uuid" going forward.
This seems acceptable to me.
- When a a validation rule is created like: "matchAtLeastOne patient.id" the attribute with the rule will be matched with the uuid's in the "id" column of the patient component even if patient doesn't have an attribute named "id"
This part "even if patient doesn't have an attribute named "id"" is throwing me off. Can you elaborate more? I thought you're forcing all tables to have an id column...
- schematic_db will assume the "id" column exists in all manifests and this will serve as the primary key in the database and will also be called "id".
- schematic_db will assume that if a component depends on other components then it will have a foreign key to that component and will be named
"<table>_id"
where table is the name of the component the foreign key exists in, and this will have the validation rule"matchAtLeastOne <table>.id"
These parts make me nervous wrt the GFF project since they are making their own PKs and end up referencing them in other tables... will these changes be backwards compatible?
- When a a validation rule is created like: "matchAtLeastOne patient.id" the attribute with the rule will be matched with the uuid's in the "id" column of the patient component even if patient doesn't have an attribute named "id"
This part "even if patient doesn't have an attribute named "id"" is throwing me off. Can you elaborate more? I thought you're forcing all tables to have an id column...
Yep. Say we have two components, patients and samples where samples depends on patients. The patients component only has one attribute, "patient_name", so a column of uuids is generated by schematic with the column name "id". The samples component only has two attributes "sample_name" and "patient_id". What I want to know is if we give the "patient_id" attribute the validation rule "matchAtLeastOne patient.id" will schematic ensure that every value in the "patient_id" column will be matched to the "id" column in patients that schematic generated even if it's not in the schema.
- schematic_db will assume the "id" column exists in all manifests and this will serve as the primary key in the database and will also be called "id".
- schematic_db will assume that if a component depends on other components then it will have a foreign key to that component and will be named
"<table>_id"
where table is the name of the component the foreign key exists in, and this will have the validation rule"matchAtLeastOne <table>.id"
These parts make me nervous wrt the GFF project since they are making their own PKs and end up referencing them in other tables... will these changes be backwards compatible?
@milen-sage I think you should chime in on this. I think he mentioned that we are going to have them go through DCA in the future so full backwards compatibility might be possible anyway. But, in this system they could still make their own keys. They would just have to name them in the same way:
<table_name>_id
@mialy-defelice To make schematic_db more backwards compatible I can keep in some code I was going to remove that allowed users to create a function that would generate the names of primary keys based on the table. I was going to remove it, but I cna keep it in for now. It was designed as GFF keys as an example.
- When schematic creates the uuid's during manifest validation we want this column to be named "id" instead of "uuid" going forward.
Also fine with me
- When a a validation rule is created like: "matchAtLeastOne patient.id" the attribute with the rule will be matched with the uuid's in the "id" column of the patient component even if patient doesn't have an attribute named "id"
This is possible, but will require me to tweak parts of how the validation rule warnings are generated. It shouldn't be a big fix however. I may be able to roll it into the fix for #1074.
This could be an issue with existing manifests, as they would need to be re-uploaded or otherwise updated to change the Uuid
column to id
as the cross manifest rule looks at already uploaded manifests.
@andrewelamb
Yep. Say we have two components, patients and samples where samples depends on patients. The patients component only has one attribute, "patient_name", so a column of uuids is generated by schematic with the column name "id". The samples component only has two attributes "sample_name" and "patient_id". What I want to know is if we give the "patient_id" attribute the validation rule "matchAtLeastOne patient.id" will schematic ensure that every value in the "patient_id" column will be matched to the "id" column in patients that schematic generated even if it's not in the schema.
@andrewelamb I can update the error/warning generation so that this is supported. The values in the patient_id
column would have to be the same uuids as the id
column
@andrewelamb the way you describe how primary keys supplied by users can be handled is accurate. This approach seems backward compatible with the GFF project too, given the bigger changes in their workflow that would be required.
@GiaJordan point taken on scenarios where users might want to cross validate ids in existing manifest containing uuid. In the case of GFF, manifests will be reuploaded anyway. In the case of HTAN, curators wouldn't make use of this id cross validating functionality since their schema is not setup with RDB conventions.
Thanks, @andrewelamb , for bringing up this ticket to our attention. I feel like this is a good example ticket that would require a discussion in preparation for the refinement meeting in order to identify what's left to do, for whom, alignment with story map, etc.. I will defer to the dev team to decide whether to handle it via one-off vs recurring meeting. cc: @mialy-defelice , @GiaJordan , @milen-sage , @AmyHeiser
Move to JIRA: https://sagebionetworks.jira.com/browse/FDS-100
Reproduce a version of the iAtlas DB schema and DB instance using schematic DB.
AC: