Open satra opened 7 months ago
I just filed #205 at the same time as this. Which do you think is better?
@jwodder - i tried to highlight use-cases here: type vs requirement validation, asynchronicity of information filling a model, and missingness of information for various reasons (including asynchronicity). perhaps both are complementary.
if we were to make every attribute `Optional`, added a property/function that checks requirements against a versioned state, and added enumerated qualifiers for missingness, would that help even compress the classes? for example, could the publishable mixins already be added? i.e. do we need separate classes (typed checks) or properties/functions `is_published`, `is_valid_for`, etc.?
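To make the proposal above concrete, here is a minimal sketch of an all-`Optional` model with an `is_valid_for` check against versioned requirements. It uses only the standard library rather than dandischema; `AssetRecord`, `REQUIRED_BY_VERSION`, `missing_for`, the field names, and the version strings are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical requirement sets keyed by a schema version string.
# These field names and versions are illustrative, not dandischema's.
REQUIRED_BY_VERSION = {
    "1.0": {"path", "size"},
    "2.0": {"path", "size", "digest"},
}


@dataclass
class AssetRecord:
    # Every attribute is Optional, so a partially filled record is type-valid.
    path: Optional[str] = None
    size: Optional[int] = None
    digest: Optional[str] = None

    def missing_for(self, version: str) -> set:
        """Return the required fields that are still unset for `version`."""
        required = REQUIRED_BY_VERSION[version]
        return {name for name in required if getattr(self, name) is None}

    def is_valid_for(self, version: str) -> bool:
        """True when every field required by `version` has a value."""
        return not self.missing_for(version)


# A record whose checksum has not been filled in yet.
record = AssetRecord(path="sub-01/data.nwb", size=1024)
```

In this sketch the same record passes the hypothetical "1.0" requirements and fails "2.0" until `digest` is set, so requirements can change without touching the types.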
> type vs requirement validation

Type validation is part of requirement validation.

> asynchronicity of information filling a model, and missingness of information for various reasons

I think both of those can be addressed without an overhaul to what the models accept.
> do we need separate classes (typed checks) or properties/functions `is_published`, `is_valid_for`, etc.?

I would say that separate classes are a good idea, as it would allow consumers of the library to construct a "published metadata" instance once, after which the type system (such as it is in Python) would ensure that any further code that needed valid published data was receiving it. In contrast, if we just had `is_published()` methods, any function that received a metadata instance that needed it to be publish-valid would have to call the method even if the calling code had previously called it. Compare "Parse, don't validate".
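A rough sketch of the "Parse, don't validate" idea referred to above, using plain dataclasses rather than the real dandischema models. `DraftAsset`, `PublishableAsset`, `parse_publishable`, and `publish` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DraftAsset:
    path: str
    digest: Optional[str] = None  # may not have been computed yet


@dataclass(frozen=True)
class PublishableAsset:
    # No Optional here: holding an instance is the evidence of validity.
    path: str
    digest: str


def parse_publishable(asset: DraftAsset) -> PublishableAsset:
    """Check once, and return a type that records that the check passed."""
    if asset.digest is None:
        raise ValueError(f"{asset.path}: digest is required to publish")
    return PublishableAsset(path=asset.path, digest=asset.digest)


def publish(asset: PublishableAsset) -> str:
    # No re-validation needed; the signature already demands a parsed value.
    return f"published {asset.path} ({asset.digest})"


ready = parse_publishable(DraftAsset(path="a.nwb", digest="abc123"))
message = publish(ready)
```

With an `is_published()`-style method instead, `publish` would have to repeat the check on every call, since nothing in its signature says the caller already did.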
> Type validation is part of requirement validation.

yes, but not the other way around. `requirement` is a semantic construct dandi imposes on the structure. we require this field for the good of neuroscience when one creates a dandiset. `type validation` ensures that the variable has the right type so computers can do the right thing with it, whether it's critical for neuroscience or not. and the requirements for good of neuroscience can change between now and later (higher frequency). types can also change but typically at a lower frequency.
> publish-valid would have to call the method even if the calling code had previously called it.

i agree this makes it messy. we could write one time properties, deal with dirty states, etc., to address the mess. however, i don't fully see how the end user/scripter gets around this through classes. they still have to do `if isinstance(asset, SomeAssetStateClass)` to know what state the `Asset` is in when they do a `GET` on the archive. isn't that the issue that the requirements are different at different stages of the evolution of the `Asset`?
> would say that separate classes are a good idea,

Currently we have `BareAsset`, `Asset`, `PublishedAsset` in dandischema, but from a retrieval from server perspective it's always an `Asset` currently even after publishing. if each stage was a different Asset Class, would one expect the API to return `AssetInvalid` or `AssetWithoutChecksum` or `PostedAsset` between the `Asset` `POST` and the point when the checksum or other task that injects info into the Asset record finishes? would this then change to `PublishableAsset` after it's made valid, and `PublishedAsset` after it's published? and what if different tasks (checksum, NWB re-extract, other scientific computation tasks, etc.) enter info at different stages into the asset record?
the question is whether we categorize state/stage as classes or properties or functions. states of assets will not go away as asset modification is an inherent feature of the system till the asset is published.
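For comparison with the class-per-stage approach, here is a minimal sketch of the "state as a property" option, where a single record type reports its stage from its current contents. `AssetState`, `AssetRecord`, and the three stages are hypothetical simplifications, not the archive's actual states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssetState(Enum):
    POSTED = "posted"            # record exists, checksum not yet computed
    PUBLISHABLE = "publishable"  # all injected info is present
    PUBLISHED = "published"


@dataclass
class AssetRecord:
    path: str
    digest: Optional[str] = None
    published: bool = False

    @property
    def state(self) -> AssetState:
        """Derive the stage from the record's contents on each access."""
        if self.published:
            return AssetState.PUBLISHED
        if self.digest is not None:
            return AssetState.PUBLISHABLE
        return AssetState.POSTED


asset = AssetRecord(path="a.nwb")
before = asset.state
asset.digest = "abc123"  # e.g. a checksum task finishing later
after = asset.state
```

Because the state is derived rather than stored, asynchronous tasks can fill in fields in any order and the reported stage follows, at the cost of callers having to inspect `state` instead of relying on the type.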
there has been a lot of discussion in relation to how to accept and serve valid models. let's use this issue to discuss details:

`None` as a catch all. and different systems encode missing versus explicit use of `None` in different ways. we should become consistent and i suggest we come up with a plan that distinguishes `None` from `MISSING` and further subcategorize `MISSING` with more specific elements from some ontology. this will also allow for valid dandisets where we impose new requirements, but someone could say the information was `MISSING`, `NOT ACQUIRED`, `RESTRICTED`. allowing us to encode sensitive fields as well.

@jwodder @yarikoptic @AlmightyYakob @CodyCBakerPhD @candleindark
related issues: #127 #182
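One possible way to distinguish `None` from `MISSING` and its subcategories, sketched with a standard-library `Enum`. The `Missing` members mirror the qualifiers named in this issue; the `Subject` class, its `age` field, and `describe` are hypothetical and only there to exercise the idea.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Missing(Enum):
    # Qualifiers taken from the issue text; a real list might come from an ontology.
    MISSING = "missing"
    NOT_ACQUIRED = "not acquired"
    RESTRICTED = "restricted"


@dataclass
class Subject:
    # None means "explicitly no value"; a Missing member says why it is absent.
    age: Union[int, None, Missing] = Missing.MISSING


def describe(value: Union[int, None, Missing]) -> str:
    """Report which of the three cases a field value falls into."""
    if isinstance(value, Missing):
        return f"absent ({value.value})"
    if value is None:
        return "explicitly None"
    return f"value {value}"


unfilled = describe(Subject().age)
restricted = describe(Subject(age=Missing.RESTRICTED).age)
explicit_none = describe(Subject(age=None).age)
```

Here a newly required field can be answered with a qualifier such as `Missing.RESTRICTED` instead of being left as an ambiguous `None`.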