Description copied from the altinn-studio issue. See the comments and discussions on that issue.
Sometimes the same file attachments are relevant across multiple instances and users. Currently an attachment is tightly bound to a single instance, so identical files have to be uploaded every time the attachment is relevant.
The service owner should be able to upload a file that can be referenced across multiple instances and users. A specific use case:
DiBK uploads a nabovarsel (neighbour notification) consisting of multiple PDFs, potentially large files
They create instances for the recipients of the nabovarsel
They link the nabovarsel files to each instance (without having to upload identical files multiple times)
Shared files must not be deleted during cleanup
In scope
Where do we store the files, and how should they be structured?
Where do we store the metadata about the files?
What would separating this logic into a new platform component look like?
What process is required for an app owner to store a file? (authorization)
Which model modifications are required?
What is the process for linking a shared file to an instance?
What is the process for unlinking a shared file from an instance?
How does an end user retrieve a shared file? (+ authorization)
How do we ensure that shared files are not deleted during cleanup?
How do we handle shared files in localtest?
How can we ensure that a file is not referenced by any instances? Should it be possible for an application owner to delete these files, and should we be responsible for ensuring that this doesn't affect any active or archived instances?
Considerations
[ ] Is it acceptable for app owners that there is no authorization on reading the data, i.e. that anyone with a link can access the file?
Authorization and restricted access to the resources are desired.
[ ] What would a solution look like if we use a separate platform component for storage?
Out of scope
What's out of scope for this analysis?
Constraints
Constraints or requirements (technical or functional) that affect this analysis.
Analysis
Where to store the files
Alternative A
Within the application owner's storage account [org]altinn[env]strg01, in the container used for app data ([org]-[env]-appsdata-blob-db), a new section is created for shared files (fileShare, or perhaps a more suitable name without other connotations). Adding this dedicated section before the category folders makes it easier to tell instance data apart from shared data. It also avoids a conflict if a category name matches an appId.
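A minimal sketch of the Alternative A layout shows why the dedicated section matters. It assumes two things: instance data blobs are named `{org}/{app}/{instanceGuid}/data/{dataGuid}`, and `fileShare` is the placeholder section name. All function names are hypothetical.

```python
SHARED_SECTION = "fileShare"  # placeholder name for the shared-files section


def instance_blob_path(org: str, app: str, instance_guid: str, data_guid: str) -> str:
    # Assumed naming scheme for instance data in [org]-[env]-appsdata-blob-db
    return f"{org}/{app}/{instance_guid}/data/{data_guid}"


def shared_blob_path(org: str, category: str, data_guid: str) -> str:
    # Alternative A: shared files get their own section, then one folder per category
    return f"{org}/{SHARED_SECTION}/{category}/{data_guid}"


def flat_shared_blob_path(org: str, category: str, data_guid: str) -> str:
    # Counter-example without the section: categories share the app-folder namespace
    return f"{org}/{category}/{data_guid}"


def top_folder(path: str) -> str:
    # Second path segment: the app for instance data, the section for shared data
    return path.split("/")[1]
```

Take a hypothetical app `nabovarsel` and a category with the same name. In the flat layout, `top_folder` gives the same result for the shared file as for the app's instance data. The sectioned path always yields `fileShare`. The trade-off is that the section name itself must never be used as an app name.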
Alternative B
A new storage account, dedicated solely to shared files, is created for each application owner.
We also need some metadata about each blob, such as:
Id / dataGuid
fileName
contentType
created
lastChanged
lastChangedBy (how do app owners feel about tracking who made the last change?)
blobStoragePath or category
(Optionally, a direct link that only the app owner can use to access the element)
This is represented as the fileInfo.json object.
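Here is a minimal sketch of fileInfo.json as a Python dataclass, using the fields listed above. Three things are assumptions: the `{org}/fileShare/{category}/{dataGuid}` path layout, ISO 8601 UTC strings for the timestamps, and the helper name `new_file_info`. The optional direct link is left out.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileInfo:
    # Field names are camelCase so they serialize directly to the fileInfo.json keys
    id: str  # dataGuid
    fileName: str
    contentType: str
    created: str  # ISO 8601, UTC
    lastChanged: str
    lastChangedBy: Optional[str]  # open question whether app owners want this tracked
    blobStoragePath: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "FileInfo":
        return FileInfo(**json.loads(text))


def new_file_info(org: str, category: str, file_name: str, content_type: str,
                  changed_by: Optional[str] = None) -> FileInfo:
    data_guid = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    return FileInfo(
        id=data_guid,
        fileName=file_name,
        contentType=content_type,
        created=now,
        lastChanged=now,
        lastChangedBy=changed_by,
        # Assumed path layout: {org}/fileShare/{category}/{dataGuid}
        blobStoragePath=f"{org}/fileShare/{category}/{data_guid}",
    )
```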
Where to store metadata about the files
Possible options here:
blob storage, in the same folder as the blob itself
a new collection in CosmosDB, partitioned on applicationOwner
a table in PostgreSQL
Storing in the storage account
(+) We can ensure that the data exists when linking it to an instance, since we are already in the container to retrieve the metadata.
(+) could experiment with blob index tags
(-) cannot easily query the blobs based on metadata
(-) blob index tags feature is in preview and not available in Norway yet
Storing in Cosmos
(+) possible to query files in the file share
(-) will require another collection
(-) We should probably verify that the blob exists before linking the metadata to an instance, which would require an additional operation.
PostgreSQL
If we set up a new platform component for the fileshare, we would have no existing ties to Platform Storage influencing the decision, so the practicality of using PostgreSQL should be considered.
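To illustrate the query advantage a database has over blob-only metadata, here is a sketch that uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The table name, column names and the (org, category) index are hypothetical. In PostgreSQL the same schema would likely use `uuid` and `timestamptz` column types.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL; table and column names are hypothetical
SCHEMA = """
CREATE TABLE file_info (
    id                TEXT PRIMARY KEY,
    org               TEXT NOT NULL,
    category          TEXT NOT NULL,
    file_name         TEXT NOT NULL,
    content_type      TEXT NOT NULL,
    created           TEXT NOT NULL,
    last_changed      TEXT NOT NULL,
    last_changed_by   TEXT,
    blob_storage_path TEXT NOT NULL UNIQUE
);
CREATE INDEX ix_file_info_org_category ON file_info (org, category);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_file(conn: sqlite3.Connection, row: dict) -> None:
    conn.execute(
        "INSERT INTO file_info VALUES (:id, :org, :category, :file_name, :content_type,"
        " :created, :last_changed, :last_changed_by, :blob_storage_path)",
        row,
    )


def files_in_category(conn: sqlite3.Connection, org: str, category: str) -> list:
    # A metadata query that is awkward when metadata only lives next to the blobs
    cur = conn.execute(
        "SELECT file_name FROM file_info WHERE org = ? AND category = ? ORDER BY file_name",
        (org, category),
    )
    return [r[0] for r in cur.fetchall()]
```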
What would separating this logic into a new platform component look like?
With regard to performance and maintainability, introducing a new platform component instead of using Platform Storage would not have any significant effect, and the end user will not notice the difference.
A new platform component is introduced: Platform Data / Platform Fileshare / Platform [insert descriptive component name].
The purpose of this component would be to expose endpoints for storing and managing data not directly related to an instance (i.e. not form data or attachments for a single instance).
The platform component would require a link to authentication (well-known endpoint + redirect for missing auth) and authorization (PDP).
To keep this platform component open for future extension, we should spend some time figuring out how to create the link to the storage account generically, so that any storage account can be used in the future.
For retrieving data, the blob storage path should be helpful. When storing data, we would need to determine which storage account to use based on something else, e.g.:
Should each controller manage data in a specific type of storage account?
Should information about the storage account be included in the request?
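One way to make the storage-account link generic is sketched below. It assumes that an org's shared files always live in the account and container named in Alternative A, and that the org is the first segment of blobStoragePath. The controller-to-account table and the request-supplied variant correspond to the two options above. All function names are hypothetical, and `tt02` is only a placeholder env value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class StorageTarget:
    account: str
    container: str


def app_owner_target(org: str, env: str) -> StorageTarget:
    # Naming patterns from Alternative A
    return StorageTarget(account=f"{org}altinn{env}strg01",
                         container=f"{org}-{env}-appsdata-blob-db")


def target_from_blob_path(blob_storage_path: str, env: str) -> StorageTarget:
    # Retrieval: the org is assumed to be the first segment of the blob path
    org = blob_storage_path.split("/", 1)[0]
    return app_owner_target(org, env)


# Storing, option 1: each controller is bound to one kind of storage account
CONTROLLER_TARGETS: Dict[str, Callable[[str, str], StorageTarget]] = {
    "fileshare": app_owner_target,
}


def target_for_controller(controller: str, org: str, env: str) -> StorageTarget:
    return CONTROLLER_TARGETS[controller](org, env)


def target_from_request(params: Mapping[str, str]) -> StorageTarget:
    # Storing, option 2: the request names the account and container explicitly
    return StorageTarget(account=params["account"], container=params["container"])
```

Option 1 keeps the decision on the server side, but every new kind of storage account needs a new controller. Option 2 is more flexible, but it requires validating that the caller is allowed to target the named account.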
My main concern about using a new platform component is that we might design it in a way that limits which future use cases it can support.
What process is required for an app owner to store the file?
A new endpoint must be exposed in the platform component:
POST: %/api/v1/data/{org}/{category}
Authorization could entail matching orgClaim in claims principal to org in route, or introducing a new scope in maskinporten.
If the categories should be possible to nest, I think the category parameter must be a query param in order to allow "/".
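The two authorization options and the category-as-query-param idea can be sketched as follows. The claim type `urn:altinn:org` is an assumption, and the scope `altinn:fileshare.write` is purely hypothetical. The sketch also still requires a matching org in the scope variant, which is a design choice and not something stated above.

```python
from typing import List, Mapping, Sequence

ORG_CLAIM = "urn:altinn:org"  # assumed claim type carrying the org code
WRITE_SCOPE = "altinn:fileshare.write"  # hypothetical new Maskinporten scope


def may_store(claims: Mapping[str, str], scopes: Sequence[str], route_org: str,
              mode: str = "org_claim") -> bool:
    org_matches = claims.get(ORG_CLAIM) == route_org
    if mode == "org_claim":
        # Option 1: the org claim must match {org} in the route
        return org_matches
    if mode == "scope":
        # Option 2: a dedicated scope is required (the org must still match)
        return org_matches and WRITE_SCOPE in scopes
    raise ValueError(f"unknown mode: {mode}")


def parse_category(query: Mapping[str, str]) -> List[str]:
    # category as a query param, so nested values like "nabovarsel/2024" survive routing
    parts = [p for p in query.get("category", "").split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError("missing or invalid category")
    return parts
```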
FileInfo is created based on metadata in the request and the blob is stored in the fileshare section of the app owner's storage account.
This is a good time to implement a blobService that doesn't hold any logic. The job of composing the storage path should be extracted from the blobClient service.
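The split between a logic-free blob service and separate path composition might look like the sketch below. `BlobService` is a hypothetical interface, and the in-memory class stands in for a real blob client. Storing fileInfo.json next to the blob reflects one of the metadata options discussed above, not a decision.

```python
from typing import Dict, Optional, Protocol


class BlobService(Protocol):
    # Logic-free: stores and returns bytes at whatever path it is given
    def write(self, path: str, data: bytes) -> None: ...
    def read(self, path: str) -> Optional[bytes]: ...


class InMemoryBlobService:
    # Stand-in for a real blob client, for illustration only
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> Optional[bytes]:
        return self._blobs.get(path)


def compose_shared_path(org: str, category: str, data_guid: str) -> str:
    # Path composition lives here, outside the blob service (assumed layout)
    return f"{org}/fileShare/{category}/{data_guid}"


def store_shared_file(blobs: BlobService, org: str, category: str, data_guid: str,
                      content: bytes, file_info_json: str) -> str:
    path = compose_shared_path(org, category, data_guid)
    blobs.write(path, content)
    # Metadata next to the blob, in the same folder (one of the metadata options)
    blobs.write(path + ".fileInfo.json", file_info_json.encode("utf-8"))
    return path
```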
The response contains the fileInfo JSON structure with:
Id / dataGuid
fileName
contentType
created
lastChanged
lastChangedBy
blobStoragePath or category
(Optionally, a direct link that only the app owner can use to access the element)
Managing and querying files in the file share
To delete a file in the fileShare: a DELETE request specifying either category and dataGuid, or blobStoragePath
Get all categories: returns a list of strings (loops through all folders in the container)
Get metadata about all files (loops through and reads fileInfo.json for each blob)
Get metadata about a single file
All operations would have to be available to the whole organization, or we could introduce some sort of new scope.
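The management operations above can be sketched against an in-memory stand-in for one org's container. This assumes metadata is stored as `{dataGuid}.fileInfo.json` in the same folder as the blob. It also shows the cost of that choice: listing metadata means scanning every fileInfo.json. Note that `delete` does not check whether any instance still links to the file, which is the open question raised in scope.

```python
import json
from typing import Dict, List, Optional

SECTION = "fileShare"  # placeholder section name
META_SUFFIX = ".fileInfo.json"  # assumed: metadata stored next to each blob


class FileShare:
    """In-memory stand-in for the shared-files section of one org's container."""

    def __init__(self, org: str) -> None:
        self.prefix = f"{org}/{SECTION}/"
        self.blobs: Dict[str, bytes] = {}

    def put(self, category: str, data_guid: str, content: bytes, file_info: dict) -> None:
        path = f"{self.prefix}{category}/{data_guid}"
        self.blobs[path] = content
        self.blobs[path + META_SUFFIX] = json.dumps(file_info).encode("utf-8")

    def categories(self) -> List[str]:
        # Get all categories: the folder part of every blob name under the section
        return sorted({p[len(self.prefix):].rsplit("/", 1)[0]
                       for p in self.blobs if p.startswith(self.prefix)})

    def all_file_info(self) -> List[dict]:
        # Get metadata about all files: one read per fileInfo.json, i.e. a full scan
        return [json.loads(v) for p, v in self.blobs.items() if p.endswith(META_SUFFIX)]

    def file_info(self, category: str, data_guid: str) -> Optional[dict]:
        raw = self.blobs.get(f"{self.prefix}{category}/{data_guid}{META_SUFFIX}")
        return json.loads(raw) if raw is not None else None

    def delete(self, category: str, data_guid: str) -> bool:
        # Removes blob and metadata; does NOT check whether instances still link to it
        path = f"{self.prefix}{category}/{data_guid}"
        found = self.blobs.pop(path, None) is not None
        self.blobs.pop(path + META_SUFFIX, None)
        return found
```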
How to link a file to an instance
Authorization on org.
Endpoint exposed through the application.
HTTP POST / HTTP PUT org/app/instances/{instanceId}/data/link?
Query params (required: a + b, or c)
a) category
b) dataGuid
c) blobStoragePath
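Resolving the "a + b, or c" query parameters could look like the sketch below. It assumes the `{org}/fileShare/{category}/{dataGuid}` layout. Rejecting a blobStoragePath outside the caller's own file share is one way to apply the authorization on org, added here as an assumption.

```python
from typing import Mapping


def resolve_link_target(query: Mapping[str, str], org: str) -> str:
    """Return the blob storage path of the shared file named by the query params."""
    category = query.get("category")
    data_guid = query.get("dataGuid")
    path = query.get("blobStoragePath")
    if path and not (category or data_guid):
        # c) blobStoragePath: only accept paths inside the org's own file share
        if not path.startswith(f"{org}/fileShare/"):
            raise PermissionError("blobStoragePath is outside the org's file share")
        return path
    if category and data_guid and not path:
        # a) + b): compose the path (assumed layout)
        return f"{org}/fileShare/{category}/{data_guid}"
    raise ValueError("provide either category and dataGuid, or blobStoragePath")
```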
The suggested flow is as follows:
Retrieve fileInfo and ensure that a valid dataType is being linked to the instance.
Check that the upload doesn't break any constraints, e.g. the number of elements of the dataType at a given task.
Generate a dataElement based on known info about the data, with a link to the instance, and store it in Platform Storage.
Return info to the client.
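The four steps can be sketched as one function, assuming the fileInfo has already been retrieved. `DataType` is loosely modeled on a data type in application metadata, and its fields, like `DataElement`'s, are assumptions. `max_count == 0` is treated as unlimited, and `store` stubs the call to Platform Storage. The key point is that the new data element points at the shared blob instead of a copy.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataType:
    # Loosely modeled on a data type in application metadata (assumed fields)
    id: str
    allowed_content_types: List[str]
    max_count: int = 0  # 0 = unlimited in this sketch


@dataclass
class DataElement:
    id: str
    instance_guid: str
    data_type: str
    content_type: str
    filename: str
    blob_storage_path: str  # points at the shared blob, no copy is made


def link_shared_file(file_info: dict, data_type: DataType, instance_guid: str,
                     existing: List[DataElement],
                     store: Callable[[DataElement], None]) -> DataElement:
    # STEP 1: the shared file must be valid for the data type it is linked as
    if file_info["contentType"] not in data_type.allowed_content_types:
        raise ValueError("content type not allowed for this data type")
    # STEP 2: linking must not exceed the maximum number of elements of this data type
    count = sum(1 for e in existing if e.data_type == data_type.id)
    if data_type.max_count and count >= data_type.max_count:
        raise ValueError("maximum number of elements reached for this data type")
    # STEP 3: generate a data element referencing the shared blob and store it
    element = DataElement(
        id=str(uuid.uuid4()),
        instance_guid=instance_guid,
        data_type=data_type.id,
        content_type=file_info["contentType"],
        filename=file_info["fileName"],
        blob_storage_path=file_info["blobStoragePath"],
    )
    store(element)  # e.g. a call to Platform Storage; stubbed by the caller
    # STEP 4: return the new data element (the alternative is the full instance)
    return element
```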
STEP 1 - Ensure valid data type
Should be handled by the application.
STEP 2 - Check that the upload doesn't break constraints
Could be handled at this point, before the upload is attempted, or during validation.
As a user I would prefer to be notified during upload, but if there are arguments for not stopping the upload, checking during validation should also be considered.
STEP 3 - Generate & store new dataElement
This responsibility lies with the app.
If in the app: the storage endpoint for linking takes a dataElement as input.
If in storage: the storage endpoint for linking takes fileInfo / metadata parameters as input.
STEP 4 - Return info to the client
What should be returned? The full instance or the newly created dataElement?
What is the process for unlinking a shared file from an instance?
HTTP DELETE org/app/instances/{instanceId}/data/link?
Query params (required: a + b, or c)
a) category
b) dataGuid
c) blobStoragePath
Deletes the dataElement from Cosmos, but nothing else; the shared blob itself is kept.
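Unlinking is a minimal sketch. A dict keyed by instance id stands in for the data element documents in Cosmos DB, which is an assumption for illustration. Only the instance's own data elements pointing at the shared blob are removed, so other instances linking the same file are unaffected.

```python
from typing import Dict, List


def unlink_shared_file(data_elements: Dict[str, List[dict]], instance_id: str,
                       blob_storage_path: str) -> int:
    """Remove the instance's data elements that point at the shared blob.

    data_elements stands in for the data element documents in Cosmos DB, keyed by
    instance id. The shared blob is deliberately not touched, since other instances
    may still reference it. Returns the number of data elements removed.
    """
    before = data_elements.get(instance_id, [])
    kept = [e for e in before if e["blobStoragePath"] != blob_storage_path]
    data_elements[instance_id] = kept
    return len(before) - len(kept)
```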
How to retrieve a file as an end user
The existing GET method in the platform component is used, with org, app, instance and dataGuid as input.
Authorization: if the user has read access to the instance and the shared blob is linked to the instance, the user is allowed to read the shared data.
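The read rule can be sketched as follows. The PDP decision is stubbed as a callable, and the instance and blob stores are plain dicts; all names are hypothetical. Because the linked data element carries the shared blob's path, the existing GET logic needs no special case for shared files.

```python
from typing import Callable, Dict, Optional


def get_data(instance: dict, data_guid: str, user: str,
             pdp_can_read: Callable[[str, dict], bool],
             blobs: Dict[str, bytes]) -> Optional[bytes]:
    # Authorization: read access to the instance (the PDP call is stubbed here)
    if not pdp_can_read(user, instance):
        raise PermissionError("no read access to the instance")
    # The shared blob is only reachable through a data element linked to this instance
    element = next((e for e in instance["data"] if e["id"] == data_guid), None)
    if element is None:
        return None
    # Same path for regular and shared data: follow the element's blobStoragePath
    return blobs.get(element["blobStoragePath"])
```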
How to ensure that a shared file is not deleted during cleanup
Check whether the file path contains a keyword; if so, do not delete the blob, only delete the dataElement from Cosmos DB.
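The keyword check during cleanup can be sketched against in-memory stand-ins for the container and the data element collection. `/fileShare/` is a placeholder keyword matching the section name used for shared files.

```python
from typing import Dict, List

SHARED_KEYWORD = "/fileShare/"  # placeholder keyword marking shared blobs


def cleanup_instance(elements: List[dict], blobs: Dict[str, bytes],
                     cosmos: Dict[str, dict]) -> List[str]:
    """Delete an instance's data; shared blobs are kept, only their data elements go.

    blobs and cosmos are in-memory stand-ins for the storage container and the
    data element collection. Returns the blob paths that were deleted.
    """
    deleted = []
    for element in elements:
        path = element["blobStoragePath"]
        if SHARED_KEYWORD not in path:
            blobs.pop(path, None)
            deleted.append(path)
        cosmos.pop(element["id"], None)  # the data element is removed in both cases
    return deleted
```

A plain substring match would also match an app that happens to be named after the keyword. A stricter variant checks that the second path segment equals the section name.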
How to handle shared files in localtest
All of the suggested solutions can be supported in localtest. This won't be specified at the moment.
Conclusion
Short summary of the proposed solution.
Tasks
[ ] Is this issue labeled with a correct area label?