Open jonc125 opened 5 years ago
One challenge with splitting the `Entity` table up will be to keep existing URLs the same - if every entity gets a new primary key through a migration then this could complicate matters! We might need to specify the key manually in the migration, which does seem to be possible thankfully.
Also, `EntityComparisonJsonView.get` needs to be able to get any kind of entity given just an id. Not sure yet how to resolve that one! Similarly in `EntityDiffView.get`.
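
A minimal sketch of the key-preservation idea, using the standard-library `sqlite3` module rather than a real Django migration. The `split_model` and `split_protocol` tables and the `find_entity` helper are hypothetical names for illustration; only `entities_entity` and its columns come from the schema in this thread. It assumes rows are copied with `id` listed explicitly, so existing URLs keep resolving, and shows one possible way a view given only an id could probe each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities_entity (
        id INTEGER PRIMARY KEY,
        entity_type TEXT NOT NULL,
        name TEXT NOT NULL
    );
    INSERT INTO entities_entity (id, entity_type, name) VALUES
        (3, 'model', 'example-model'),
        (7, 'protocol', 'example-protocol'),
        (8, 'model', 'another-model');

    -- Hypothetical per-type tables replacing the single Entity table.
    CREATE TABLE split_model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE split_protocol (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Copy rows with the id column given explicitly, so keys are preserved.
    INSERT INTO split_model (id, name)
        SELECT id, name FROM entities_entity WHERE entity_type = 'model';
    INSERT INTO split_protocol (id, name)
        SELECT id, name FROM entities_entity WHERE entity_type = 'protocol';
""")


def find_entity(conn, entity_id):
    """Return (kind, name) for an id by probing each split table, or None."""
    for kind, table in (("model", "split_model"), ("protocol", "split_protocol")):
        row = conn.execute(
            f"SELECT name FROM {table} WHERE id = ?", (entity_id,)
        ).fetchone()
        if row is not None:
            return kind, row[0]
    return None


model_ids = [row[0] for row in conn.execute("SELECT id FROM split_model ORDER BY id")]
```

This only covers existing rows: each new table's id sequence would also have to be arranged so that freshly created rows do not reuse an id held by another table, which the sketch does not address.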
Or, do we need to abstract stuff that will be useful for `FittingSpec` too into an `AbstractEntity` model, and keep the single `Entity` table just for models & protocols? Except we'll want to be able to do things like unix diff on `FittingSpec`s too presumably, which is what `EntityComparisonJsonView` and `EntityDiffView` support.
So perhaps we just need to use multi-table inheritance rather than the current proxy inheritance? Do we need to store some fields only for specific sub-types? (Currently just `is_fitting_spec`, the hack on protocols that are actually for fitting.) So maybe only `FittingSpec` needs its own table, to support the `Protocol` link? Although giving each its own table, even with no extra fields, may improve query speed, when we know what we're looking for, by enough to make it worthwhile.
Multi-table inheritance could also save having to duplicate things like `EntityFile` and `AnalysisTask`, since they can still link to `Entity`.
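
To make the multi-table option concrete, here is a rough sketch of the table layout it implies, again using `sqlite3` instead of Django itself. With Django's multi-table inheritance the child table's primary key is a one-to-one pointer to the parent row (`entity_ptr_id` for a parent called `Entity`). The table name `fitting_fittingspec`, the `'fittingspec'` type value and the simplified `entities_entityfile` columns are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities_entity (
        id INTEGER PRIMARY KEY,
        entity_type TEXT NOT NULL,
        name TEXT NOT NULL
    );
    -- Child table: no id of its own, its key is the parent row's id.
    CREATE TABLE fitting_fittingspec (
        entity_ptr_id INTEGER PRIMARY KEY REFERENCES entities_entity (id),
        protocol_id INTEGER NOT NULL REFERENCES entities_entity (id)
    );
    -- A table that links to Entity (simplified, hypothetical columns) is unchanged.
    CREATE TABLE entities_entityfile (
        id INTEGER PRIMARY KEY,
        entity_id INTEGER NOT NULL REFERENCES entities_entity (id),
        upload TEXT NOT NULL
    );
    INSERT INTO entities_entity VALUES (1, 'protocol', 'example-protocol');
    INSERT INTO entities_entity VALUES (2, 'fittingspec', 'example-spec');
    INSERT INTO fitting_fittingspec VALUES (2, 1);
    INSERT INTO entities_entityfile VALUES (1, 2, 'spec.txt');
""")

# A generic lookup by id alone still goes through the parent table.
generic = conn.execute(
    "SELECT entity_type, name FROM entities_entity WHERE id = ?", (2,)
).fetchone()

# Sub-type fields (here the Protocol link) come from a join on the pointer column.
spec = conn.execute(
    """
    SELECT e.name, p.name
    FROM fitting_fittingspec AS f
    JOIN entities_entity AS e ON e.id = f.entity_ptr_id
    JOIN entities_entity AS p ON p.id = f.protocol_id
    WHERE f.entity_ptr_id = ?
    """,
    (2,),
).fetchone()

# Files keep pointing at the shared parent row, whatever the sub-type.
files = [
    row[0]
    for row in conn.execute(
        "SELECT upload FROM entities_entityfile WHERE entity_id = ?", (2,)
    )
]
```

Under this layout existing ids stay as they are and views that receive only an id can keep querying the parent table; the trade-off is that reading sub-type fields involves a join.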
Tables of interest:
                                     Table "public.entities_entity"
     Column      |           Type           | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+---------------------------------------------
 id              | integer                  |           | not null | nextval('entities_entity_id_seq'::regclass)
 entity_type     | character varying(16)    |           | not null |
 name            | character varying(255)   |           | not null |
 created_at      | timestamp with time zone |           | not null |
 author_id       | integer                  |           | not null |
 is_fitting_spec | boolean                  |           | not null |
                                   Table "public.experiments_experiment"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+----------------------------------------------------
 id               | integer                  |           | not null | nextval('experiments_experiment_id_seq'::regclass)
 created_at       | timestamp with time zone |           | not null |
 author_id        | integer                  |           | not null |
 model_id         | integer                  |           | not null |
 protocol_id      | integer                  |           | not null |
 model_version    | character varying(50)    |           | not null |
 protocol_version | character varying(50)    |           | not null |
                                      Table "public.datasets_dataset"
   Column    |           Type           | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+----------------------------------------------------------
 id          | integer                  |           | not null | nextval('datasets_experimentaldataset_id_seq'::regclass)
 visibility  | character varying(16)    |           | not null |
 created_at  | timestamp with time zone |           | not null |
 author_id   | integer                  |           | not null |
 protocol_id | integer                  |           | not null |
 name        | character varying(255)   |           | not null |
 description | text                     |           | not null |
So there isn't really commonality at the DB level beyond the already available mixin classes, and particularly `Dataset` and `Prediction` look very different (since one is uploaded, and one generated; one has no versions and one does).
The commonality for the 'collection of files' models is more at the Python method level. (And the interface they present to templates/JS code.) At least the following are shared between `Dataset` and `PredictionVersion`:

- `abs_path`, `archive_path`, `files`
- `open_file`, `is_visible_to_user`

Only in `Dataset` but could be defined (differently) by both: `archive_name`.
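
As a rough illustration of sharing at the Python method level, the sketch below puts the archive-related members named above into a plain mixin, with `archive_name` supplied differently by each host class. `FileCollectionMixin`, `DatasetLike`, `PredictionVersionLike` and the archive file names are hypothetical stand-ins, not the Web Lab's actual classes, and `is_visible_to_user` is left out because it depends on the visibility rules. A COMBINE Archive is a zip file, so the standard `zipfile` module is enough here.

```python
import io
import tempfile
import zipfile
from pathlib import Path


class FileCollectionMixin:
    """Hypothetical mixin: the host class must provide `abs_path` and `archive_name`."""

    @property
    def archive_path(self):
        return self.abs_path / self.archive_name

    @property
    def files(self):
        """Names of the files in the archive, or an empty list if it is missing."""
        if not self.archive_path.exists():
            return []
        with zipfile.ZipFile(self.archive_path) as archive:
            return archive.namelist()

    def open_file(self, name):
        """Return an in-memory binary stream with the contents of one archived file."""
        with zipfile.ZipFile(self.archive_path) as archive:
            return io.BytesIO(archive.read(name))


class DatasetLike(FileCollectionMixin):
    def __init__(self, root, name):
        self.abs_path = Path(root)
        self.name = name

    @property
    def archive_name(self):
        # Assumed naming scheme, derived from the dataset's own name.
        return f"{self.name}.zip"


class PredictionVersionLike(FileCollectionMixin):
    archive_name = "results.omex"  # assumed fixed name for generated results

    def __init__(self, root):
        self.abs_path = Path(root)


with tempfile.TemporaryDirectory() as root:
    dataset = DatasetLike(root, "my-data")
    with zipfile.ZipFile(dataset.archive_path, "w") as archive:
        archive.writestr("data.csv", "t,v\n0,1\n")
    listed = dataset.files
    with dataset.open_file("data.csv") as handle:
        first_line = handle.readline()
```

Templates and JS-facing views could then rely on `files` and `open_file` without caring which kind of object they were given.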
`Prediction` and `PredictionVersion` don't inherit from `VisibilityModelMixin` but derive this info from `Model` & `Protocol`, so have things like `name`, `visibility`, `viewers` defined locally.
We should possibly make a common base for `DatasetFile` and `EntityFile` with a name change to emphasise they track uploaded files during [version] creation?
This issue extends/subsumes #130 to reflect updated thinking since the 2018 workshop. It's also related to #133 & #134.
As can be seen in this rough diagram we will have 6 primary kinds of 'thing' represented in the Web Lab, which follow 2 primary 'archetypes' for how they are stored.
- `Model` - versioned, backed by git repository
- `Protocol` - versioned, backed by git repository
- `FittingSpec` - versioned, backed by git repository
  - `Protocol` (not specific version thereof) representing the experimental scenario which can be used to fit models
- `Prediction` - versioned, collection of files in COMBINE Archive
  - `Model` and `Protocol`, from which the prediction is derived
- `Dataset` - not (yet?) versioned, collection of files in COMBINE Archive
  - `Protocol` (not specific version thereof) representing the experimental scenario which produced this dataset.
- `FittingResult` - versioned, collection of files in COMBINE Archive
  - `Model`, `Protocol`, `FittingSpec` and `Dataset`, and represents the result of fitting that model to that data under that spec + protocol.
  - (… `FittingSpec` and `Dataset` may change in the future, probably best to keep a link here too.)

At present in the code, `Model` and `Protocol` inherit from a common base class `Entity`, and `Prediction` is called `Experiment`. These features largely derive from the representation in WL1, when all three were subclasses of Entity and versioning was done by the DB with collections of files on disk, no git repos.

The `Entity` base class may no longer make sense; with the exception of `repocache` there's little use of `Entity.objects` in the code. So possibly we want to get rid of it while adding `FittingSpec`, in preference to a more mixin-based approach? It depends on whether we can design `repocache` sensibly to cache all kinds of git repos if there isn't a common DB table for it to link to - we do have several crucial `CachedEntityVersion.objects` uses, but these always then filter either models or protocols. We only select generic 'entities' when filling the cache AFAICS. Or have 3 sets of cache tables all sharing the same code, so we start talking about `CachedModel`, `CachedModelVersion`, `CachedModelTag`, etc?

Where we want to end up is for templates, Javascript & Python code supporting each archetype to be reused as much as possible, never just copied & renamed. In doing so we need to consider how things are split between Django apps.
Creating/editing a fitting spec should look basically the same as for models & protocols.
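
For reference, the six kinds of 'thing' and their two archetypes listed earlier in this issue can be written down as plain data, which makes it easy to see which kinds should share code. This is only a summary sketch: the `Kind` dataclass and helper functions are hypothetical, and the cache names for `Protocol` and `FittingSpec` are extrapolated from the `CachedModel`, `CachedModelVersion`, `CachedModelTag` pattern mentioned above.

```python
from dataclasses import dataclass
from typing import Tuple

GIT_REPO = "git repository"
COMBINE_ARCHIVE = "collection of files in COMBINE Archive"


@dataclass(frozen=True)
class Kind:
    name: str
    archetype: str
    versioned: bool
    links: Tuple[str, ...] = ()  # other kinds this one refers to


KINDS = (
    Kind("Model", GIT_REPO, True),
    Kind("Protocol", GIT_REPO, True),
    Kind("FittingSpec", GIT_REPO, True, ("Protocol",)),
    Kind("Prediction", COMBINE_ARCHIVE, True, ("Model", "Protocol")),
    Kind("Dataset", COMBINE_ARCHIVE, False, ("Protocol",)),
    Kind(
        "FittingResult",
        COMBINE_ARCHIVE,
        True,
        ("Model", "Protocol", "FittingSpec", "Dataset"),
    ),
)


def by_archetype(archetype):
    """Names of the kinds stored according to the given archetype."""
    return [kind.name for kind in KINDS if kind.archetype == archetype]


def cache_table_names(kind_name):
    """Cache class names for one git-backed kind, following the CachedModel* pattern."""
    return [f"Cached{kind_name}{suffix}" for suffix in ("", "Version", "Tag")]


git_backed = by_archetype(GIT_REPO)
archive_backed = by_archetype(COMBINE_ARCHIVE)
```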