edurange / edurange-server

cloud-based interactive security exercises
https://www.edurange.org
MIT License
14 stars 14 forks source link

Lack of foreign key constraints cause data integrity issue. #47

Closed newjam closed 5 years ago

newjam commented 5 years ago

The statistics relation has column scenario_id that logically references the scenario.id column, but there is no foreign key constraint. This means you can delete a scenario, and the the statistics table will have a stale reference to a nonexistent scenario. This has been fine until now, when we tried to migrate the database. The sequence for scenario.id started in the range of scenario_id columns in the statistics relation.

Adding

foreign key scenario_id references scenario (id) on delete set null

should solve this issue. Additionally, we need to write a migration script that updates the scenario_id column of the statistics table to null if the scenario does not exist.

Furthermore we should scrub the database schema and make sure we have foreign key relations defined to prevent future data integrity issues.

newjam commented 5 years ago

Suggested foreign keys and the number of missing references currently on the production database

           source            |      target       | missing
-----------------------------+-------------------+---------
 instance_groups.group_id    | groups.id         |       0
 instance_roles.role_id      | roles.id          |       0
 clouds.scenario_id          | scenarios.id      |      15
 groups.scenario_id          | scenarios.id      |      49
 players.user_id             | users.id          |      29
 subnets.cloud_id            | clouds.id         |       0
 instances.subnet_id         | subnets.id        |       0
 players.student_group_id    | student_groups.id |       8
 role_recipes.recipe_id      | recipes.id        |       0
 role_recipes.role_id        | roles.id          |       0
 questions.scenario_id       | scenarios.id      |     135
 instance_groups.instance_id | instances.id      |       0
 recipes.scenario_id         | scenarios.id      |     120
 instance_roles.instance_id  | instances.id      |       0
 players.group_id            | groups.id         |       0

This table was generated by

sudo -u postgres psql -d edurange_production -c "$(cat orphans.sql)"

where ophans.sql is

select 'clouds.scenario_id' as source, 'scenarios.id' as target, count(1) as missing from clouds where scenario_id not in (select id from scenarios)
UNION
select 'instances.subnet_id' as source, 'subnets.id' as target, count(1) as missing from instances where subnet_id not in (select id from subnets)
UNION
select 'subnets.cloud_id' as source, 'clouds.id' as target, count(1) as missing from subnets where cloud_id not in (select id from clouds)
UNION
select 'groups.scenario_id' as source, 'scenarios.id' as target, count(1) as missing from groups where scenario_id not in (select id from scenarios)
UNION
select 'instance_groups.instance_id' as source, 'instances.id' as target, count(1) as missing from instance_groups where instance_id not in (select id from instances)
UNION
select 'instance_groups.group_id' as source, 'groups.id' as target, count(1) as missing from instance_groups where group_id not in (select id from groups)
UNION
select 'instance_roles.role_id' as source, 'roles.id' as target, count(1) as missing from instance_roles where role_id not in (select id from roles)
UNION
select 'instance_roles.instance_id' as source, 'instances.id' as target, count(1) as missing from instance_roles where instance_id not in (select id from instances)
UNION
select 'players.group_id' as source, 'groups.id' as target, count(1) as missing from players where group_id not in (select id from groups)
UNION
select 'players.user_id' as source, 'users.id' as target, count(1) as missing from players where user_id not in (select id from users)
UNION
select 'players.student_group_id' as source, 'student_groups.id' as target, count(1) as missing from players where student_group_id not in (select id from student_groups)
UNION
select 'questions.scenario_id' as source, 'scenarios.id' as target, count(1) as missing from questions where scenario_id not in (select id from scenarios)
UNION
select 'recipes.scenario_id' as source, 'scenarios.id' as target, count(1) as missing from recipes where scenario_id not in (select id from scenarios)
UNION
select 'role_recipes.role_id' as source, 'roles.id' as target, count(1) as missing from role_recipes where role_id not in (select id from roles)
UNION
select 'role_recipes.recipe_id' as source, 'recipes.id' as target, count(1) as missing from role_recipes where recipe_id not in (select id from recipes)

which is the output of the following python script

#!/usr/bin/env python3

references = [
  ('clouds', 'scenario_id', 'scenarios', 'id'),
  ('instances', 'subnet_id', 'subnets', 'id'),
  ('subnets', 'cloud_id', 'clouds', 'id'),
  ('groups', 'scenario_id', 'scenarios', 'id'),
  ('instance_groups', 'instance_id', 'instances', 'id'),
  ('instance_groups', 'group_id', 'groups', 'id'),
  ('instance_roles', 'role_id', 'roles', 'id'),
  ('instance_roles', 'instance_id', 'instances', 'id'),
  ('players', 'group_id', 'groups', 'id'),
  ('players', 'user_id', 'users', 'id'),
  ('players', 'student_group_id', 'student_groups', 'id'),
  ('questions', 'scenario_id', 'scenarios', 'id'),
  ('recipes', 'scenario_id', 'scenarios', 'id'),
  ('role_recipes', 'role_id', 'roles', 'id'),
  ('role_recipes', 'recipe_id', 'recipes', 'id')
]

QUERY_TEMPLATE = "select '{t1}.{c1}' as source, '{t2}.{c2}' as target, count(1) as missing from {t1} where {c1} not in (select {c2} from {t2})"

print('\nUNION\n'.join([QUERY_TEMPLATE.format(t1 = t1, c1 = c1, t2 = t2, c2 = c2) for t1, c1, t2, c2 in references]))