ModellingWebLab / WebLab

Django-based front-end for the modelling Web Lab v2

Restructuring Django model + app hierarchy #203

Open jonc125 opened 5 years ago

jonc125 commented 5 years ago

This issue extends/subsumes #130 to reflect updated thinking since the 2018 workshop. It's also related to #133 & #134.

Rough diagram of proposed structure

As can be seen in this rough diagram we will have 6 primary kinds of 'thing' represented in the Web Lab, which follow 2 primary 'archetypes' for how they are stored.

  1. Model - versioned, backed by git repository
  2. Protocol - versioned, backed by git repository
  3. FittingSpec - versioned, backed by git repository
    1. Links to a Protocol (not specific version thereof) representing the experimental scenario which can be used to fit models
  4. Prediction - versioned, collection of files in COMBINE Archive
    1. Links to (specific versions of) a Model and Protocol, from which the prediction is derived
  5. Dataset - not (yet?) versioned, collection of files in COMBINE Archive
    1. Links to a Protocol (not specific version thereof) representing the experimental scenario which produced this dataset.
    2. At present this is fixed on dataset creation; we may wish to allow it to be updated later?
  6. FittingResult - versioned, collection of files in COMBINE Archive
    1. Links to (specific versions of) a Model, Protocol, FittingSpec and Dataset, and represents the result of fitting that model to that data under that spec + protocol.
    2. The particular protocol is implied by the choice of fitting spec & dataset, but the version of it is not so needs to be explicit. (And because FittingSpec and Dataset may change in the future, probably best to keep a link here too.)
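
The link structure in the list above can be sketched in plain Python, outside the Django ORM. This is a minimal illustration only: the class names `Repo` and `Pinned`, the example names and the version strings are all hypothetical, and the consistency check in `FittingResult` is one reading of "the particular protocol is implied by the choice of fitting spec & dataset".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    """Git-backed archetype: a Model, Protocol or FittingSpec (hypothetical class)."""
    kind: str                           # 'model', 'protocol' or 'fittingspec'
    name: str
    protocol: Optional["Repo"] = None   # FittingSpec only; no version is pinned


@dataclass(frozen=True)
class Pinned:
    """A link to one specific version of a git-backed thing (hypothetical class)."""
    repo: Repo
    version: str                        # e.g. a commit identifier


@dataclass(frozen=True)
class Dataset:
    name: str
    protocol: Repo                      # unversioned link, fixed on creation for now


@dataclass(frozen=True)
class Prediction:
    model: Pinned
    protocol: Pinned


@dataclass(frozen=True)
class FittingResult:
    model: Pinned
    protocol: Pinned
    fittingspec: Pinned
    dataset: Dataset                    # Dataset is not (yet) versioned

    def __post_init__(self):
        # The protocol itself is implied by the spec and the dataset;
        # only its version is new information, hence the explicit pin.
        implied = self.fittingspec.repo.protocol
        if not (implied == self.dataset.protocol == self.protocol.repo):
            raise ValueError("fitting spec, dataset and protocol must agree")


protocol = Repo("protocol", "example-protocol")
spec = Repo("fittingspec", "example-spec", protocol=protocol)
model = Repo("model", "example-model")
data = Dataset("example-data", protocol=protocol)

result = FittingResult(
    model=Pinned(model, "v1"),
    protocol=Pinned(protocol, "v3"),
    fittingspec=Pinned(spec, "v2"),
    dataset=data,
)
```

Building a `FittingResult` whose dataset was produced under a different protocol raises `ValueError` in this sketch.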

At present in the code, Model and Protocol inherit from a common base class Entity, and Prediction is called Experiment. These features largely derive from the representation in WL1, when all three were subclasses of Entity and versioning was done by the DB with collections of files on disk, with no git repos.

The Entity base class may no longer make sense; with the exception of repocache there's little use of Entity.objects in the code. So possibly we want to get rid of it while adding FittingSpec, in favour of a more mixin-based approach? It depends on whether we can design repocache sensibly to cache all kinds of git repos if there isn't a common DB table for it to link to. We do have several crucial CachedEntityVersion.objects uses, but these always then filter for either models or protocols; we only select generic 'entities' when filling the cache AFAICS. Or we could have 3 sets of cache tables all sharing the same code, so we start talking about CachedModel, CachedModelVersion, CachedModelTag, etc?
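
As a rough illustration of the "3 sets of cache tables all sharing the same code" option, here is an ORM-free sketch in plain Python. The base classes and the `make_cache_classes` factory are hypothetical; in Django the shared code would more likely live in abstract base models, with one concrete subclass (and hence one table) per kind.

```python
class CachedRepoBase:
    """Code shared by every cached repo, whatever kind it holds."""
    version_class = None  # filled in per kind by make_cache_classes

    def __init__(self, name):
        self.name = name
        self.versions = []

    def add_version(self, sha):
        version = self.version_class(self, sha)
        self.versions.append(version)
        return version


class CachedVersionBase:
    tag_class = None  # filled in per kind by make_cache_classes

    def __init__(self, repo, sha):
        self.repo = repo
        self.sha = sha
        self.tags = []

    def add_tag(self, label):
        tag = self.tag_class(self, label)
        self.tags.append(tag)
        return tag


class CachedTagBase:
    def __init__(self, version, label):
        self.version = version
        self.label = label


def make_cache_classes(kind):
    """Build the repo/version/tag trio for one kind from the shared bases."""
    tag_cls = type(f"Cached{kind}Tag", (CachedTagBase,), {})
    version_cls = type(f"Cached{kind}Version", (CachedVersionBase,),
                       {"tag_class": tag_cls})
    repo_cls = type(f"Cached{kind}", (CachedRepoBase,),
                    {"version_class": version_cls})
    return repo_cls, version_cls, tag_cls


CachedModel, CachedModelVersion, CachedModelTag = make_cache_classes("Model")
CachedProtocol, CachedProtocolVersion, CachedProtocolTag = make_cache_classes("Protocol")
CachedFittingSpec, CachedFittingSpecVersion, CachedFittingSpecTag = (
    make_cache_classes("FittingSpec"))
```

Each kind gets its own classes, so a query can never mix models with protocols, while every method is written once.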

Where we want to end up is for the templates, JavaScript & Python code supporting each archetype to be reused as much as possible, never just copied & renamed. In doing so we need to consider how things are split between Django apps.

Creating/editing a fitting spec should look basically the same as for models & protocols.

jonc125 commented 4 years ago

One challenge with splitting the Entity table up will be to keep existing URLs the same - if every entity gets a new primary key through a migration then this could complicate matters! We might need to specify the key manually in the migration, which does seem to be possible thankfully.
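
To show what "specify the key manually" amounts to at the SQL level, here is a small sketch using the standard-library `sqlite3` module rather than a real Django migration. The target tables `models_model` and `protocols_protocol` and the sample rows are hypothetical; only `entities_entity` and its columns come from the existing schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities_entity (
        id INTEGER PRIMARY KEY, entity_type TEXT NOT NULL, name TEXT NOT NULL);
    INSERT INTO entities_entity VALUES
        (1, 'model', 'model-a'), (2, 'protocol', 'proto-a'), (3, 'model', 'model-b');
    -- Hypothetical per-type tables that would replace the shared one.
    CREATE TABLE models_model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE protocols_protocol (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
""")


def copy_preserving_ids(conn, entity_type, target_table):
    """Copy rows of one type into their new table, keeping the original ids."""
    # target_table is a fixed name chosen in this script, never user input.
    conn.execute(
        f"INSERT INTO {target_table} (id, name) "
        "SELECT id, name FROM entities_entity WHERE entity_type = ?",
        (entity_type,),
    )


copy_preserving_ids(conn, "model", "models_model")
copy_preserving_ids(conn, "protocol", "protocols_protocol")

model_ids = [row[0] for row in conn.execute("SELECT id FROM models_model ORDER BY id")]
protocol_ids = [row[0] for row in conn.execute("SELECT id FROM protocols_protocol ORDER BY id")]
```

One thing to watch on PostgreSQL: inserting explicit ids does not advance the new table's sequence, so it would need resetting afterwards (Django's `sqlsequencereset` management command prints the SQL for that).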

Also, EntityComparisonJsonView.get needs to be able to get any kind of entity given just an id. Not sure yet how to resolve that one! Similarly in EntityDiffView.get.
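
One possible way round this, assuming the original ids are preserved so that they stay unique across the new tables, is to try each kind in turn. The sketch below uses plain dictionaries as stand-ins for the per-type tables; `find_entity` and the sample data are hypothetical.

```python
# Stand-ins for per-type tables, keyed by primary key (hypothetical data).
MODELS = {1: "model-a", 3: "model-b"}
PROTOCOLS = {2: "proto-a"}
FITTING_SPECS = {4: "spec-a"}

REGISTRIES = (
    ("model", MODELS),
    ("protocol", PROTOCOLS),
    ("fittingspec", FITTING_SPECS),
)


def find_entity(entity_id):
    """Return (kind, entity) for an id, searching every kind in turn."""
    for kind, table in REGISTRIES:
        if entity_id in table:
            return kind, table[entity_id]
    raise LookupError(f"no entity with id {entity_id}")
```

This costs up to one lookup per kind and breaks as soon as two tables share an id, so putting the kind in the URL may turn out to be the cleaner fix.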

jonc125 commented 4 years ago

Or, do we need to abstract stuff that will be useful for FittingSpec too into an AbstractEntity model, and keep the single Entity table just for models & protocols? Except we'll want to be able to do things like unix diff on FittingSpecs too presumably, which is what EntityComparisonJsonView and EntityDiffView support.

So perhaps we just need to use multi-table inheritance rather than the current proxy inheritance? Do we need to store some fields only for specific sub-types? (Currently just is_fitting_spec, the hack on protocols that are actually for fitting.) So maybe only FittingSpec needs its own table, to support the Protocol link? Although giving each its own table, even with no extra fields, may improve query speed enough to make it worthwhile in cases where we know which type we're looking for.

Multi-table inheritance could also save having to duplicate things like EntityFile and AnalysisTask, since they can still link to Entity.
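
For reference, this is roughly the table layout multi-table inheritance produces, sketched with the standard-library `sqlite3` module rather than Django itself. With multi-table inheritance Django gives the child table a primary key column named after the parent (`entity_ptr_id` here) that points at the parent row. The table names `fitting_fittingspec` and `entities_entityfile`, the `upload` column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities_entity (
        id INTEGER PRIMARY KEY, entity_type TEXT NOT NULL, name TEXT NOT NULL);

    -- Child table: its primary key doubles as the link to the parent row,
    -- and it only stores the extra field that FittingSpec needs.
    CREATE TABLE fitting_fittingspec (
        entity_ptr_id INTEGER PRIMARY KEY REFERENCES entities_entity(id),
        protocol_id INTEGER NOT NULL REFERENCES entities_entity(id));

    -- Helper table keeps pointing at the parent, so it is not duplicated.
    CREATE TABLE entities_entityfile (
        id INTEGER PRIMARY KEY,
        entity_id INTEGER NOT NULL REFERENCES entities_entity(id),
        upload TEXT NOT NULL);

    INSERT INTO entities_entity VALUES
        (1, 'protocol', 'proto-a'), (2, 'fittingspec', 'spec-a');
    INSERT INTO fitting_fittingspec VALUES (2, 1);
    INSERT INTO entities_entityfile VALUES (1, 1, 'proto.txt'), (2, 2, 'spec.txt');
""")

# Fetching a fitting spec with its protocol needs a join back to the parent table.
spec = conn.execute("""
    SELECT e.name, p.name
    FROM fitting_fittingspec AS f
    JOIN entities_entity AS e ON e.id = f.entity_ptr_id
    JOIN entities_entity AS p ON p.id = f.protocol_id
""").fetchone()

# Files are found by the shared id, whatever the sub-type.
files = [row[0] for row in conn.execute(
    "SELECT upload FROM entities_entityfile WHERE entity_id = ?", (2,))]
```

The trade-off is the extra join on every sub-type query, in exchange for keeping ids unique across all entities and leaving EntityFile-style tables untouched.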

jonc125 commented 4 years ago

Tables of interest:

                                         Table "public.entities_entity"
     Column      |           Type           | Collation | Nullable |                   Default
-----------------+--------------------------+-----------+----------+---------------------------------------------
 id              | integer                  |           | not null | nextval('entities_entity_id_seq'::regclass)
 entity_type     | character varying(16)    |           | not null |
 name            | character varying(255)   |           | not null |
 created_at      | timestamp with time zone |           | not null |
 author_id       | integer                  |           | not null |
 is_fitting_spec | boolean                  |           | not null |

                                          Table "public.experiments_experiment"
      Column      |           Type           | Collation | Nullable |                      Default
------------------+--------------------------+-----------+----------+----------------------------------------------------
 id               | integer                  |           | not null | nextval('experiments_experiment_id_seq'::regclass)
 created_at       | timestamp with time zone |           | not null |
 author_id        | integer                  |           | not null |
 model_id         | integer                  |           | not null |
 protocol_id      | integer                  |           | not null |
 model_version    | character varying(50)    |           | not null |
 protocol_version | character varying(50)    |           | not null |

                                             Table "public.datasets_dataset"
   Column    |           Type           | Collation | Nullable |                         Default
-------------+--------------------------+-----------+----------+----------------------------------------------------------
 id          | integer                  |           | not null | nextval('datasets_experimentaldataset_id_seq'::regclass)
 visibility  | character varying(16)    |           | not null |
 created_at  | timestamp with time zone |           | not null |
 author_id   | integer                  |           | not null |
 protocol_id | integer                  |           | not null |
 name        | character varying(255)   |           | not null |
 description | text                     |           | not null |

So there isn't really commonality at the DB level beyond the already available mixin classes, and particularly Dataset and Prediction look very different (since one is uploaded, and one generated; one has no versions and one does).

The commonality for the 'collection of files' models is more at the Python method level. (And the interface they present to templates/JS code.) At least the following are shared between Dataset and PredictionVersion:

Prediction and PredictionVersion don't inherit from VisibilityModelMixin but derive this info from Model & Protocol, so have things like name, visibility, viewers defined locally.
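
A sketch of what "derive this info from Model & Protocol" could look like, in plain Python. The derivation rules here are assumptions, not taken from the code: the more restrictive visibility wins, viewers are the users allowed to see both parents, and the name is built from both names. The `Versioned` class and the two visibility levels are hypothetical stand-ins.

```python
# Hypothetical visibility levels, ranked from least to most restrictive.
RESTRICTIVENESS = {"public": 0, "private": 1}


class Versioned:
    """Stand-in for a Model or Protocol with its own visibility settings."""

    def __init__(self, name, visibility, viewers=()):
        self.name = name
        self.visibility = visibility
        self.viewers = frozenset(viewers)


class Prediction:
    """Stores no visibility of its own; everything is derived from its parents."""

    def __init__(self, model, protocol):
        self.model = model
        self.protocol = protocol

    @property
    def name(self):
        return f"{self.model.name} / {self.protocol.name}"

    @property
    def visibility(self):
        # Assumption: the more restrictive of the two parents wins.
        return max(self.model.visibility, self.protocol.visibility,
                   key=RESTRICTIVENESS.__getitem__)

    @property
    def viewers(self):
        # Assumption: only users who may view both parents may view the prediction.
        return self.model.viewers & self.protocol.viewers


prediction = Prediction(
    Versioned("model-a", "public", {"alice", "bob"}),
    Versioned("proto-a", "private", {"bob"}),
)
```

Because these are properties rather than stored fields, they present the same interface to templates as a class using VisibilityModelMixin would, without a DB column behind them.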

We should possibly make a common base for DatasetFile and EntityFile with a name change to emphasise they track uploaded files during [version] creation?