debrief / pepys-import

Support library for Pepys maritime data analysis environment
https://pepys-import.readthedocs.io/
Apache License 2.0
5 stars 5 forks source link

pepys."Tasks" table - Self-Referential design #839

Closed rnllv closed 3 years ago

rnllv commented 3 years ago

pepys."Tasks" table has a self-referential design (popularly known as Adjacency list anti-pattern in SQL Designs). Self-referencing table design is considered as a design anti-pattern for the below reasons.

1) For tree like data structure represented by Adjacency list, the operations on the nodes cannot be done independent of the edges. In other words, two logically different functions (DML on Task definitions, and DML on Task relations) rely on the same table.

2) Join operations for tree querying has to process the Task definition each time.

3) Cascading deletes cannot be specified as they might induce a recursive sequence of deletes.

4) Depth of tree has to be hard coded into the query. This issue is invalidated by the SQL-99 standard, which defines recursive query syntax using the WITH keyword followed by a common table expression. And currently, Sqlite and Postgres have implemented this standard/feature.

One commonly cited design pattern to implement tree data is the Closure tables. Closure tables also allows for many-to-many relationships. Should we consider redesigning pepys."Tasks" using this?

If the recursion depth is going to be static, i.e. only 3 levels of depth is allowed, then can we change our table design to reflect this instead of going for a recursive structure? This will make sure that only one task id column is present and the other two levels are just their corresponding level 2 and 3 attributes.

rnllv commented 3 years ago

@IanMayo, with reference to this comment, what serial name should the dashboard display if there are multiple serials for the previous day?

The Serial (J06230) will always be displayed - since they are unique for this wargame. But, analysts will find value in the exercise label too (CASEX 324) - so we'll display both.

What will happen if the query finds two Serials (say J06230 and J06240) happened on the day before?

IanMayo commented 3 years ago

What will happen if the query finds two Serials (say J06230 and J06240) happened on the day before?

The 06 indicates it's a serial on the 6th day of the month. Serials the previous day would be J05xxx.

robintw commented 3 years ago

@IanMayo I'll go ahead and do these changes in the SQLAlchemy table definitions (I've started creating those, so they can run in parallel with this discussion, and I can get on with updating the UI to use the new tables).

For the new ForceTypes table, should we populate it with the Blue and Red entries when we do the populate reference data action?

IanMayo commented 3 years ago

Populate Reference data I wish I'd thought of that :-)

Great idea.

robintw commented 3 years ago

@IanMayo I've just removed the old Tasks and Participants tables. In the Geometries table we currently have a foreign key to Tasks.task_id. What do you want that replaced with? A reference to Series, Wargames or Serials? I'm guessing Serials but wanted to check.

IanMayo commented 3 years ago

geometries - task_id Well, that's a blast from the past. I guess the intention was to allow the bounding rectangle for an exercise to be represented.

Yes, let's allow a reference to a serial_id

robintw commented 3 years ago

Great thanks.

A couple of things to check @IanMayo:

  1. Are start and end required fields for both Wargame and Serial?
  2. Are start and end optional for both SerialParticipant and WargameParticipant, or are they required for a Wargame?
  3. Is the Serial.exercise field optional?
IanMayo commented 3 years ago

Hi,

  1. Hmm, I don't think they are essential for our logic, but let's say "Yes" - just to make it most likely that we have the data we need.
  2. They are optional for participants. When missing, we know it's the extent of the wargame/serial.
  3. Yes - it is not always present.
robintw commented 3 years ago

Thanks.

Ah, just realised I asked the question partly incorrectly. Currently we don't have a start and end field for WargameParticipants but we do for SerialParticipants. That is, we just add participants to the Wargame with no information on when they are taking part, and that information on when comes as part of the SerialParticipant. Is that correct?

IanMayo commented 3 years ago

Hmm, at this point in time it's not important. We could help the user be restricting serial participation times to the overall war-game participation time. But, that's not an important capability.

Let's skip start/end date for war-game participation until we have a stronger use case.

robintw commented 3 years ago

Excellent, that was the answer I was hoping you'd give :)

@IanMayo @arunlal-v: Are we fairly sure that we've finalised the database structure for the new tables? If so, I'll go ahead and create the database migration for the new structure. The GUI is progressing, but for Ian to test it I'll need to have the migration sorted.

(@IanMayo: I think I've got most of the existing Tasks GUI working with the new tables, and I've added validation. I expect you'll be able to find some bugs, but don't try to test it yet until I've dealt with the database migration. I'm on childcare duty now, but will aim to do a bit more over the weekend (if I have time), otherwise I'll continue on Monday morning).

IanMayo commented 3 years ago

That's fine - thanks @robintw .

I'm not aware of any table changes - other than the Force one that you're aware of.

rnllv commented 3 years ago

We will have one coverage diagram per serial.

This answers my question What will happen if the query finds two Serials (say J06230 and J06240) happened on the day before?. I was under the impression that the dashboard will have only one diagram for all serials. Thanks @IanMayo.

IanMayo commented 3 years ago

Mockup: Wallboard 3

rnllv commented 3 years ago

Thanks a lot @IanMayo. This makes it much clearer.

I don't see a need for the second query being called multiple times. I mean that's a call we can take later based on performance. Right now, the second query can handle all serials at once, based on the input json supplied to it.

IanMayo commented 3 years ago

If Each SuperSet component makes its own connection to the database, then the app will end up calling the query multiple times - either once per Serial or even once per platform.

Note: I'm currently considering getting a custom (temporary) Python visualisation setup, just to get something working. Once this wargame is over - we can learn lessons to pass onto SuperSet.

For this hand-made Python visualisation it's quite likely that we can just run the query once - for all the data, then slice it up before being rendered.

rnllv commented 3 years ago

Sure @IanMayo.

@robintw

These 2 items are pending on the schema front. Working on them right now.

1) Based on the above, I think we need @arunlal-v to modify the table design:

Rename exercises to wargames
Rename serial.name to serial.serial_number and introduce an optional exercise text field.

2) ah, @arunlal-v - I realise we need another field for SerialParticipants. The participant will be playing as either Blue Force (friendly) or Red Force (enemy). One day we may need to support White Force (non-combatant).

I think this could be met through a Force Types table, containing:

ForceID
Name (Blue or Red to start with)
Color (#00F or #F00 to start with)
This will allow a compulsory Force field in the SerialParticipants table.
rnllv commented 3 years ago

@robintw, below is the updated script incorporating the comments in this thread.

--drop objects

drop table if exists pepys."SerialParticipants";
drop table if exists pepys."Forces";
drop table if exists pepys."WarGameParticipants";
drop table if exists pepys."Serials";
drop table if exists pepys."WarGames";
drop table if exists pepys."Series";

--Create "Series"
create table pepys."Series"(
    series_id uuid constraint serpk primary key,
    name text,
    privacy_id uuid constraint serprvref references pepys."Privacies"(privacy_id)
);

--Create "WarGames"
create table pepys."WarGames"(
    wargame_id uuid constraint wrgmpk primary key,
    name text,
    series_id uuid constraint seridref references pepys."Series"(series_id),
    start timestamp without time zone,
    "end" timestamp without time zone,
    privacy_id uuid constraint exerprvref references pepys."Privacies"(privacy_id)
);

--Create "Serials"
create table pepys."Serials"(
    serial_id uuid constraint gampk primary key,
    wargame_id uuid constraint serwgidref references pepys."WarGames"(wargame_id),
    serial_number text,
    exercise text,
    start timestamp without time zone,
    "end" timestamp without time zone,
    environment text,
    location text,
    privacy_id uuid constraint serprivref references pepys."Privacies"(privacy_id)
);

--Create "WarGameParticipants"
create table pepys."WarGameParticipants"(
    wargame_participant_id uuid constraint partpk primary key,
    wargame_id uuid constraint exprtwgidref references pepys."WarGames"(wargame_id),
    platform_id uuid constraint pltref references pepys."Platforms"(platform_id),
    privacy_id uuid constraint exprtprvref references pepys."Privacies"(privacy_id)
);

--Create "Forces"
drop table if exists pepys."Forces";
create table pepys."Forces"(
    force_id uuid constraint foridpk primary key,
    name text constraint namenotnull not null,
    color text
);

--Create "SerialParticipants"
create table pepys."SerialParticipants"(
    serial_participant_id uuid constraint spartid primary key,
    wargame_participant_id uuid constraint partref references pepys."WarGameParticipants"(wargame_participant_id),
    serial_id uuid constraint serref references pepys."Serials"(serial_id),
    start timestamp without time zone,
    "end" timestamp without time zone,
    force uuid constraint foridrf references pepys."Forces"(force_id),
    privacy_id uuid constraint serprtprvref references pepys."Privacies"(privacy_id)
);

alter table pepys."PlatformTypes" add column default_data_interval_secs integer;

--populate data

update pepys."PlatformTypes" set default_data_interval_secs = 30;

insert into pepys."Series" values ('1e0187ca-9146-0794-62fb-c6f1c69c3dee', 'Suez Warriors', '225f3795-56c7-43c0-9846-ae6a3a6ce0fe');

insert into pepys."WarGames" values ('7d5a71e7-40ad-06e0-99f9-513b3216ed73', 'SW 20/01', '1e0187ca-9146-0794-62fb-c6f1c69c3dee', '2020-11-15 00:00:00.000000','2020-11-15 23:59:59.000000', '225f3795-56c7-43c0-9846-ae6a3a6ce0fe');

insert into pepys."Serials" values ('faa18ef7-6823-33dc-0f16-de6e4b2c02f3','7d5a71e7-40ad-06e0-99f9-513b3216ed73','SW05110', 'Exercise', '2020-11-15 00:00:00.000000','2020-11-15 23:59:59.000000', 'Environment', 'Location', '225f3795-56c7-43c0-9846-ae6a3a6ce0fe');

insert into pepys."WarGameParticipants" values ('f16e405b-5f3a-f4e2-50e2-e5a5dabdf142','7d5a71e7-40ad-06e0-99f9-513b3216ed73','50d64387-9f91-4c6f-8933-f4e7b1a7d8ab','225f3795-56c7-43c0-9846-ae6a3a6ce0fe');

insert into pepys."Forces" values ('225f3795-56c7-43c0-9846-ae6a3a6ce0ff','Blue','#F00');

insert into pepys."SerialParticipants" values ('cae20db7-ab15-8ba1-eeaa-fc7cbdc172d7','f16e405b-5f3a-f4e2-50e2-e5a5dabdf142','faa18ef7-6823-33dc-0f16-de6e4b2c02f3', '2020-11-15 00:00:00.000000','2020-11-15 23:59:59.000000', '225f3795-56c7-43c0-9846-ae6a3a6ce0ff', '225f3795-56c7-43c0-9846-ae6a3a6ce0fe');

commit;
rnllv commented 3 years ago

@robintw, just FYI. The constraint names given for the new tables do not follow the same syntax or pattern as the existing ones.

robintw commented 3 years ago

That's fine @arunlal-v - I've been translating everything into the relevant SQLAlchemy table definitions, so the database migration will pick up our standard of constraint naming from our configuration. I'll get the migration done on Monday.

robintw commented 3 years ago

Hi @arunlal-v @IanMayo.

I've now added the database migration scripts in my PR at #833. @IanMayo, if you want to test the new GUI then you should be able to checkout that branch and run it through Pepys Admin. You'll need to run a database migration first.

I see that @arunlal-v has already added these tables to the TracStor database. Are you ok if we delete these, and then run the migration on that database? There will be some subtle differences between the SQL you ran to create these tables, and the SQL that the migration will run (eg. constraint names, string field lengths etc) and the migration code is likely to get confused if it doesn't have everything exactly how it expects to see it. We can then add the test data again if necessary.

Also, my SQLAlchemy scripts and migrations were already set up to call the table Wargames rather than WarGames (as I believe it is usually treated as a single word). I'd have to change a lot of stuff to make it WarGames(and I tried to do a simple find and replace and it caused more problems than it solved), so can we keep it as Wargames? That may require some changes in your queries that look for Serials in the previous day and then get the coverage.

IanMayo commented 3 years ago

delete tables Yes - that sounds the right thing to do.

table name Yes - let's stick with Wargames

robintw commented 3 years ago

Ok, I'll wait for confirmation from @arunlal-v that he's ok for me to delete the tables, then I'll go ahead and delete them (or he can delete them if he prefers - as he knows exactly what he created)

rnllv commented 3 years ago

Hey @robintw, sure. Please go ahead.

This is my summary.

Tables created:

pepys."SerialParticipants"; pepys."Forces"; pepys."WarGameParticipants"; pepys."Serials"; pepys."WarGames"; pepys."Series";

Tables altered:

pepys."PlatformTypes";

robintw commented 3 years ago

Thanks @arunlal-v.

I've deleted the tables and run the migration on the TracStor database. Everything should be working now - I've added a few example Series/Wargames/Serials, although no participants yet (as I haven't got the GUI for that bit done yet...that's tomorrow's job).

As well as the replacement of WarGames with Wargames, I think there may also be a difference with the Forces table - in my version it is ForceTypes (to match up with all the other XXXTypes tables that we have).

Let me know if you have any difficulties.

rnllv commented 3 years ago

Sure @robintw.

I was checking with @IanMayo the other day on pepys."ForceTypes" table. Shouldn't it be pepys."Forces"? That should be in line with pepys."Platforms" and pepys."PlatformTypes" tables.

robintw commented 3 years ago

Umm, I'll let @IanMayo confirm what he wants: but in my view it counts as one of the group of tables that we refer to as a 'Reference Table' and a lot of those are XXXType.

(It would be a bit of a pain to change this now, but definitely far less of a pain to change it now than it would be to change it later - so if we're going to change it, we should change it now)

Also, @IanMayo: How should we go about pre-populating the ForceTypes table with Red/Blue entries? We can load these from CSVs, but we may run into the same problem as we discussed in relation to adding default_data_interval_secs: the users might not want to import from CSVs again as they might import a load of other stuff they don't want (although there is a command-line option to import from a specific folder, so they could use that - but that requires them running Pepys command lines).

Update: Another alternative is they just do it through the Maintenance GUI manually - it shouldn't be hard.

IanMayo commented 3 years ago

There are pros for both options:

  1. The "Reference" tables tend to be Types of data
  2. This is a list of forces, not a list of types of force.

bit of a pain to change this now.

Let's totally stick with ForceTypes then :-)

robintw commented 3 years ago

These tables are now all implemented in SQLAlchemy and used in Pepys.