elixir-europe / human-data-maturity-model

Code for the ELIXIR Human Data Maturity Model website.
https://elixir-europe.github.io/human-data-maturity-model/

Core: Storage and Interfaces: Technical: APIs #2

MKonopkoELIXIR closed 2 years ago

MKonopkoELIXIR commented 2 years ago

When "interfaces" is referred to in Storage and Interfaces, is that submission and retrieval? @GiselleMarie

GiselleMarie commented 2 years ago

In your mind does retrieval also include access?

MKonopkoELIXIR commented 2 years ago

Not access in the "data access" sense, @GiselleMarie

This could also be the wrong word. https://www.techopedia.com/definition/30140/data-retrieval Maybe Data Extraction is better? https://www.techopedia.com/definition/25328/data-extraction

MKonopkoELIXIR commented 2 years ago

Emailed Dylan Spalding

MKonopkoELIXIR commented 2 years ago

Per @malloryfreeberg

Yes, the main category of interfaces is user <-> resource (e.g. to support data submission, data discovery, and data access and/or retrieval). For FEGA, we also have the notion of interfaces between nodes, or resource <-> resource. This is to exchange non-personal metadata to support queries across the FEGA network. Depending on how the resource is established, there also might be within-resource interfaces between institutions in the same node (e.g. between the institutions that make up the German Human Genome-phenome Archive, GHGA). I'm not sure how important it is to capture these between/within-resource interfaces in your model.

Need to determine whether these FEGA-related needs align 1:1 with ELIXIR Node needs. Also, if the needs are varied, a single indicator may be inappropriate.

MKonopkoELIXIR commented 2 years ago

Per Peter Maccallum: Distinction between read-only interfaces and data interaction/upload interfaces:

  1. Read-only interface: query
  2. Admin read/write interface: submission

Suggests splitting this into two as above.
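
The read-only versus read/write split can be illustrated with a minimal Python sketch. Everything named here (`ReadOnlyInterface`, `ReadWriteInterface`, `InMemoryResource`, `ReadOnlyView`) is hypothetical and only shows the shape of the distinction; it is not taken from any ELIXIR or FEGA codebase.

```python
from typing import Protocol


class ReadOnlyInterface(Protocol):
    """Query side: look records up, never change them (hypothetical name)."""

    def query(self, term: str) -> list[str]: ...


class ReadWriteInterface(ReadOnlyInterface, Protocol):
    """Admin/submission side: the query operation plus writes (hypothetical name)."""

    def submit(self, record_id: str, description: str) -> None: ...


class InMemoryResource:
    """Toy resource holding record descriptions keyed by record ID."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}

    def query(self, term: str) -> list[str]:
        # Return the sorted IDs of records whose description mentions the term.
        needle = term.lower()
        return sorted(
            rid for rid, text in self._records.items() if needle in text.lower()
        )

    def submit(self, record_id: str, description: str) -> None:
        # Store (or overwrite) the description for this record ID.
        self._records[record_id] = description


class ReadOnlyView:
    """Wraps a resource and exposes only the query operation."""

    def __init__(self, resource: ReadOnlyInterface) -> None:
        self._resource = resource

    def query(self, term: str) -> list[str]:
        return self._resource.query(term)
```

A submitter would be handed the `InMemoryResource` itself, while a discovery client would only ever receive a `ReadOnlyView`, which has no `submit` method. Whether one indicator or two is appropriate then roughly corresponds to whether these two surfaces are assessed together or separately.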

MKonopkoELIXIR commented 2 years ago

Started to make the change and realised there are three options.

  1. Under Storage and Interfaces, make two indicators: read-only and read/write. Remove all API related indicators from other sections.
  2. Allow all relevant sections to have their own API indicators, such as data discovery, data reception, data access, etc. If that is the case, what are the relevant interfaces to keep here?
  3. I assume that the interfaces aspect of "storage and interfaces" section is meant to deal with all interfaces for the data once it is in storage. If this is the case, maybe the data reception (aka write) APIs are covered in data reception and the read only APIs are covered here?

Sent a message to Peter and sent the following email to Dylan (CC Tommi):

Hi Dylan,

I’ve made a bit of progress here by speaking to a variety of people, but I have again hit a bit of a wall and everyone says you’re the guy with the knowledge.

Peter Maccallum did explain what “interfaces” would cover as a concept and pointed out that I need to recognise the difference between read-only and read/write interfaces. My questions for you are thus:

  1. Does “Storage and Interfaces” cover all APIs?
  2. If not, do the APIs for each of the core functionalities fall under their own functionalities (i.e. data reception APIs, data discovery APIs, data access APIs)? If they do, what APIs/interfaces are specifically tied to storage?
  3. Or is it that this section deals only with interfaces for the data once it is in storage, so data reception APIs (aka read/write) go with data reception, but discovery and access APIs (aka read-only) fit under storage and interfaces?

I’m copying in Tommi since he wrote the scoping paper that this all hinges on. I know he’s away for a bit, but guidance whenever someone can point me in the right direction would be very helpful.

MKonopkoELIXIR commented 2 years ago

Email from Dylan:

This slide: https://docs.google.com/presentation/d/1rlwu5wRjZqkvkEGAxEDTjT3mvV-0uzRHCiun1qOkeQo/edit may help.

At a high level, storage and interfaces deals with the 'FEGA'-like functionality, i.e. putting the files onto secure storage, tracking their location, versioning them, deleting them if required, and maintaining the metadata about the files (how they were produced, which individual(s)' information is included in the files, data access requirements for the files, etc.). In effect this functionality underpins all other functionalities. For example, data discovery uses Beacon or the data portal (for example) to determine which data is stored for what data use; REMS provides data access and management tools; and data reception can be both the movement of data if required (e.g. htsget) and the curation processes (for example, checking that data conforms to the data model), both to (and, in terms of htsget, from) the 'FEGA'-like instance providing the storage and interfaces.

So in answer to your questions:

a) No.

b) Yes. Storage and interfaces has many internal APIs, but also PUT a file or metadata about a file, UPDATE a file or metadata about a file, GET a file or metadata about a file, and DELETE a file and associated metadata about a file.

c) The other functionalities have APIs that sit on top of these to provide additional functionality. For example, in the case of data access and management, storage and interfaces knows the file, access restrictions, etc., but not who has access to the file; this is held by the REMS instance(s). In the case of Beacon, Beacon can give some metadata about the file itself, but (possibly) the phenotypic data may be held within the Beacon instance in a format for querying, while the file and associated metadata (including phenotypic data if necessary) is held via the storage and interfaces functionality.
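
To make the layering in Dylan's answer concrete, here is a minimal Python sketch, assuming an in-memory store. `StorageInterface`, `StoredFile`, and `AccessLayer` are hypothetical names invented for illustration; this is not the FEGA or REMS API. The storage layer offers the four basic operations on a file and its metadata, and a separate layer on top holds who has been granted access.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredFile:
    """A file plus its metadata (e.g. access restrictions)."""

    content: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class StorageInterface:
    """Hypothetical storage layer: PUT, UPDATE, GET, DELETE by file ID."""

    def __init__(self) -> None:
        self._files: dict[str, StoredFile] = {}

    def put(
        self, file_id: str, content: bytes, metadata: Optional[dict[str, str]] = None
    ) -> None:
        if file_id in self._files:
            raise ValueError(f"{file_id} already exists; use update()")
        self._files[file_id] = StoredFile(content, dict(metadata or {}))

    def update(
        self,
        file_id: str,
        content: Optional[bytes] = None,
        metadata: Optional[dict[str, str]] = None,
    ) -> None:
        stored = self._files[file_id]  # raises KeyError if the file is unknown
        if content is not None:
            stored.content = content
        if metadata:
            stored.metadata.update(metadata)

    def get(self, file_id: str) -> StoredFile:
        return self._files[file_id]  # raises KeyError if the file is unknown

    def delete(self, file_id: str) -> None:
        # Removes the file together with its associated metadata.
        del self._files[file_id]


class AccessLayer:
    """Hypothetical layer on top of storage; stands in for a REMS-like service.

    Storage knows the file and its restrictions; only this layer knows
    which user has been granted access to which file.
    """

    def __init__(self, storage: StorageInterface) -> None:
        self._storage = storage
        self._grants: set[tuple[str, str]] = set()

    def grant(self, user: str, file_id: str) -> None:
        self._storage.get(file_id)  # fail early if the file does not exist
        self._grants.add((user, file_id))

    def fetch(self, user: str, file_id: str) -> bytes:
        if (user, file_id) not in self._grants:
            raise PermissionError(f"{user} has no grant for {file_id}")
        return self._storage.get(file_id).content
```

In this sketch a discovery or access service never touches `_files` directly; it only calls the storage operations, which is one way to read "APIs that sit on top of these".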

Need a slightly better understanding of this before I make any changes. Reached out to Peter for assistance.

MKonopkoELIXIR commented 2 years ago

Had a conversation with Dylan and he outlined the topics for the APIs as follows (see below for required changes):

MKonopkoELIXIR commented 2 years ago

Made updates per above.