CasperWA opened 2 years ago
Why is it needed? Isn't it a big change for the framework? Will it break compatibility with existing plugins?
We're trying not to break compatibility with existing plugins, but only for a transition period. Our intention is to move to a more minimal session and keep the data in a semantic container, by default a DLite collection. However, any framework could be used for this internally. This depends mainly on the strategies and on the overall service that installs this package together with a select group of plugin packages.
See also the description of session_type/session_id suggested in issue #177.
In addition to these, the download strategy needs a standard way to communicate how to retrieve the downloaded content, e.g. the key under which the content is stored in the data cache. Note that the download strategy has no idea about the meaning of the downloaded content, so it makes no sense for it to try to use session_id to store (the reference to) the content within the underlying interoperability platform.
Would introducing standard download_type/download_key fields in the session be sufficient? For example:
download_type="datacache"
download_key="7e9a7074-a72b-4ccd-9580-7cfd7be516c0" # a hash of the downloaded content used as key in the datacache
To avoid burdening every parse strategy with these details, OTEAPI could provide a utility function get_downloaded_content(session) that returns the downloaded content. That would also make it much easier to change things later.
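A minimal sketch of what such a helper could look like, assuming a plain in-memory dict as a stand-in for the data cache (the real OTEAPI data cache has its own interface, which is not reproduced here). DATA_CACHE and download_strategy are hypothetical names; the download_key is a SHA-256 hash of the content, in line with the comment in the example above.

```python
import hashlib
from typing import Any

# Hypothetical in-memory stand-in for the data cache (not the real OTEAPI API).
DATA_CACHE: dict[str, bytes] = {}


def download_strategy(content: bytes) -> dict[str, Any]:
    """Simulate a download strategy: cache the content and report where it is."""
    key = hashlib.sha256(content).hexdigest()
    DATA_CACHE[key] = content
    return {"download_type": "datacache", "download_key": key}


def get_downloaded_content(session: dict[str, Any]) -> bytes:
    """Return the downloaded content referenced by download_type/download_key."""
    download_type = session.get("download_type")
    if download_type == "datacache":
        return DATA_CACHE[session["download_key"]]
    # Other download types (files, remote stores, ...) would be dispatched here.
    raise ValueError(f"Unsupported download_type: {download_type!r}")
```

A parse strategy would then only call get_downloaded_content(session), so changing how or where content is cached later touches one function instead of every parser.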
Note that this wouldn't work if a pipeline contained several download strategies in sequence, since each would overwrite the download_type/download_key set by the previous one. But that is not how pipelines are supposed to be used. However, the following should actually work:
pipe1 = download1 >> parse1 >> mapping1
pipe2 = download2 >> parse2 >> mapping2
pipe3 = pipe1 + pipe2 >> mapping3 >> transformation
pipe3.get()
since the get() method of parse1 will see the values of download_type/download_key assigned by download1, while parse2 will see the values of download_type/download_key assigned by download2.
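The argument above relies on each branch of pipe1 + pipe2 seeing only the session values set within that branch. The following toy model illustrates this under stated assumptions: steps are plain functions that return a new session dict, ">>" is modelled by run_chain and "+" by run_parallel, which gives each branch its own copy of the incoming session. This is not how otelib actually evaluates pipelines; all names here are hypothetical.

```python
from typing import Any, Callable

Session = dict[str, Any]
Step = Callable[[Session], Session]

# Hypothetical stand-in for the data cache.
CACHE: dict[str, Any] = {}


def make_download(name: str, content: Any) -> Step:
    def download(session: Session) -> Session:
        key = f"{name}-key"
        CACHE[key] = content
        # Return a new session instead of mutating the incoming one.
        return {**session, "download_type": "datacache", "download_key": key}
    return download


def make_parse(name: str) -> Step:
    def parse(session: Session) -> Session:
        # Reads the download_key left by the download step of its own branch.
        return {**session, f"{name}_parsed": CACHE[session["download_key"]]}
    return parse


def run_chain(steps: list[Step], session: Session) -> Session:
    """Model `a >> b >> c`: each step gets the session produced by the previous one."""
    for step in steps:
        session = step(session)
    return session


def run_parallel(chains: list[list[Step]], session: Session) -> Session:
    """Model `pipe1 + pipe2`: each branch starts from its own copy of the session."""
    merged = dict(session)
    for steps in chains:
        merged.update(run_chain(steps, dict(session)))
    return merged


pipe1 = [make_download("download1", "content 1"), make_parse("parse1")]
pipe2 = [make_download("download2", "content 2"), make_parse("parse2")]
result = run_parallel([pipe1, pipe2], {})
# result["parse1_parsed"] == "content 1" and result["parse2_parsed"] == "content 2"
```

In this model, download_key in the merged result is whichever branch was merged last ("download2-key"), so a step after the merge, such as mapping3, should not rely on it.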
When splitting the dataresource datamodel into download (which may be called dataresource) and parse, it was suggested that mediaType should be in the download datamodel. But its value is needed by the parse strategy, so the download strategy somehow also needs to communicate it to the parse strategy. Maybe yet another standard download_mediaType field in the session is required for this?
It may be generalised to download_configuration if we expect that the parse strategy may utilise more fields from the download configuration.
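A sketch of how a parse strategy could use the generalised download_configuration field, assuming the download strategy copies its whole configuration (including mediaType) into the session. The PARSERS registry and parse_downloaded are hypothetical; only JSON and CSV are handled, using the standard library.

```python
import csv
import io
import json
from typing import Any, Callable


def _parse_json(content: bytes) -> Any:
    return json.loads(content.decode("utf-8"))


def _parse_csv(content: bytes) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(content.decode("utf-8"))))


# Hypothetical registry mapping media types to parser functions.
PARSERS: dict[str, Callable[[bytes], Any]] = {
    "application/json": _parse_json,
    "text/csv": _parse_csv,
}


def parse_downloaded(session: dict[str, Any], content: bytes) -> Any:
    """Pick a parser from the mediaType the download strategy put in the session."""
    download_configuration = session.get("download_configuration", {})
    media_type = download_configuration.get("mediaType")
    try:
        parser = PARSERS[media_type]
    except KeyError:
        raise ValueError(f"No parser for mediaType {media_type!r}") from None
    return parser(content)
```

Passing the whole download configuration rather than a single download_mediaType field means a parser can later read other download options without another new session field.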
In addition to session_type and session_id suggested in issue #177, we may also need a session_configuration field of type dict. For instance, for a "dlite" session on a distributed system, we need to specify the storage in which the collection used for communication between strategies is stored. Such information could go into session_configuration.
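A sketch of how session_configuration could let strategies on different nodes find the same collection. It assumes a hypothetical "storage" key naming a storage reachable from all nodes; SHARED_STORAGES is a plain dict standing in for that storage, and the "collection" is a plain dict rather than a real DLite collection. The JSON round-trip in the test mimics sending the session to another worker.

```python
import json
import uuid
from typing import Any

# Stand-in for storages reachable from every node (e.g. a shared database).
# The storage name "shared-db" and the "storage" key are hypothetical.
SHARED_STORAGES: dict[str, dict[str, Any]] = {"shared-db": {}}


def _storage_for(session: dict[str, Any]) -> dict[str, Any]:
    """Look up the storage named in session_configuration."""
    if session["session_type"] != "dlite":
        raise ValueError(f"Unsupported session_type: {session['session_type']!r}")
    return SHARED_STORAGES[session["session_configuration"]["storage"]]


def store_collection(session: dict[str, Any], collection: Any) -> None:
    _storage_for(session)[session["session_id"]] = collection


def load_collection(session: dict[str, Any]) -> Any:
    return _storage_for(session)[session["session_id"]]


# The session only carries references: which framework, which object, and where.
session = {
    "session_type": "dlite",
    "session_id": str(uuid.uuid4()),
    "session_configuration": {"storage": "shared-db"},
}
```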
An important question that has not been answered here is where session_type, session_id and session_configuration should be provided. These fields are unrelated to the data documentation and therefore don't belong in the configuration of the individual strategies.
This is the same question as addressed in issue #211.
In a recent discussion between @quaat, @jesper-friis and myself, we decided to move in a direction where the session object is not used to transfer data in any way, but rather to reference semantic objects (like DLite collections or OSP-Core entities) in which the data is stored, and which can then be referenced and used in the individual strategies as needed.
A first step in this direction is to expand the SessionUpdate pydantic model with a minimal set of fields that may not be overwritten in sub-classes and that are information-complete with respect to retrieving the semantic object and determining which framework to use (DLite, OSP-Core, etc.).