datacharmer / test_db

A sample MySQL database with an integrated test suite, used to test your applications and database servers
4.06k stars 2.59k forks source link

primary keys for dept_manager and dept_emp are too strict #20

Open jdduncan opened 4 years ago

jdduncan commented 4 years ago

In theory, Joe could work in Marketing from 1995 to 1998, and then in Sales from 1998 until 2000, and then in Marketing again. This may not actually occur in the sample data set. But it cannot be modeled in dept_emp if the primary key is (emp_no,dept_no).

If Joe is the department manager, this problem also applies to the dept_manager table.

A looser constraint is that, on any given date, a department has only one manager, and an employee has only one job. To model this, you would need a "temporal range" as a data type that can be indexed, or at least a SQL CHECK that date ranges do not overlap.

But using (emp_no, from_date) as primary key might be a better way to express the constraint. Maybe it would also allow the primary ordered index to be used in evaluating the views.

datacharmer commented 4 years ago

Hi JDD, Thanks for submitting this issue. I agree that the relationship between employees and departments should be better defined with the same granularity that we have for titles and salaries. However, I have two reservations against changing the current table the way you suggest:

  1. this is fabricated data, and the relationships reflect the relations that were in the original data. If it were an educational dataset, we should probably make the relationship more oriented to reality, where departments are also split by team and some other grouping. Those groups should be also probably related to departments in addition of people (e.g. team X moved from dept A to dept B).
  2. This database has been in use for 12 years with minimal changes. One popular workflow for it is to download the database, run a set of pre-defined queries, and compare results with known ones. If we changed the relationship between employees and departments in such a drastic way, we would probably break a lot of code.

I have been thinking of cleaning up some of the code that is using old features and making it more readable. The improvement that you suggest would be a first step for a version 2.0 of this database, where we break compatibility but provide a better schema.

jdduncan commented 4 years ago

Hi Giuseppe,

This makes a lot of sense. I also have some more practical changes related to using the data with NDB, which I will show you next week.

JD

On Feb 1, 2020, at 4:20 AM, Giuseppe Maxia notifications@github.com wrote:

Hi JDD, Thanks for submitting this issue. I agree that the relationship between employees and departments should be better defined with the same granularity that we have for titles and salaries. However, I have two reservations against changing the current table the way you suggest:

• this is fabricated data, and the relationships reflect the relations that were in the original data. If it were an educational dataset, we should probably make the relationship more oriented to reality, where departments are also split by team and some other grouping. Those groups should be also probably related to departments in addition of people (e.g. team X moved from dept A to dept B). • This database has been in use for 12 years with minimal changes. One popular workflow for it is to download the database, run a set of pre-defined queries, and compare results with known ones. If we changed the relationship between employees and departments in such a drastic way, we would probably break a lot of code. I have been thinking of cleaning up some of the code that is using old features and making it more readable. The improvement that you suggest would be a first step for a version 2.0 of this database, where we break compatibility but provide a better schema.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

0416yty commented 7 months ago

wow