philippconzett opened this issue 3 years ago
MS Azure offers three access tiers: Hot, Cool, and Archive.
I guess for the purpose of DataverseNO, where files may be accessed quite often, only the first option (Hot) is appropriate?
Currently DataverseNO only has hot storage. Cool and Archive have significant data retrieval costs, so it would be unwise to use them for publicly accessible data. Also, Archive involves a delay of up to 15 hours to retrieve the data.
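For reference, the tier of an existing block blob can be changed through the Azure SDK. A minimal Python sketch, assuming the `azure-storage-blob` package; the connection string, container, and blob names are hypothetical:

```python
from azure.storage.blob import BlobClient

# Hypothetical connection string and names, for illustration only.
blob = BlobClient.from_connection_string(
    conn_str="<storage-account-connection-string>",
    container_name="dataverseno",
    blob_name="doi-10-18710/example.csv",
)

# Keep frequently accessed files on the Hot tier; Cool and Archive
# charge per-GB retrieval fees, and Archive blobs must be rehydrated
# (up to ~15 hours) before they can be read at all.
blob.set_standard_blob_tier("Hot")
```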
@Louis-wr @rolfa015 What is the "thing" we are going to use in Azure to store the DataverseNO data files: a blob container, a block blob, a page blob, or something different? And what will be the size limit we'll be able to store in that "thing"? Cf. https://docs.microsoft.com/en-us/azure/storage/blobs/scalability-targets.
@philippconzett We will be using Azure Blob Storage, and data are stored as block blobs. An Azure storage account (think of it as the storage server) can contain several blob containers (think of them as volumes), and a blob container can contain several blobs (think of them as files and directories). The limitations are:

- A block blob can be at most 50,000 × 4,000 MiB (approximately 190.7 TiB).
- A blob container can be at most the same size as its storage account.
- A storage account can be at most 5 PB.

I understand that we will enable NFS on the storage service, so these issues are also relevant: https://docs.microsoft.com/en-us/azure/storage/blobs/network-file-system-protocol-known-issues
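To make the account → container → blob hierarchy concrete, here is a minimal sketch using the `azure-storage-blob` Python package; the connection string, container name, and blob name are hypothetical:

```python
from azure.storage.blob import BlobServiceClient

# Storage account ("server") -> blob container ("volume") -> blob ("file").
service = BlobServiceClient.from_connection_string(
    "<storage-account-connection-string>"
)
container = service.get_container_client("dataverseno")  # hypothetical name

# Upload a file as a block blob; the SDK splits large files into blocks.
with open("example.csv", "rb") as fh:
    container.upload_blob(name="doi-10-18710/example.csv", data=fh, overwrite=True)

# Sanity check of the block-blob size limit quoted above:
# 50,000 blocks x 4,000 MiB per block.
max_mib = 50_000 * 4_000
print(max_mib / 1024 / 1024)  # ~190.73 TiB
```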
@rolfa015 Thanks! Do we need the NFS functionality given that MinIO makes our storage S3-compliant?
@philippconzett If we stick to S3 also for "normal" storage, then we don't need NFS. But maybe we need it for the storage connected to the Postgres DB, if we don't go for Postgres as a service (outside the Docker VM).
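For context, MinIO exposes a plain S3 API, so standard S3 clients only need the endpoint overridden. A minimal boto3 sketch, with a hypothetical endpoint and placeholder credentials:

```python
import boto3

# Point a standard S3 client at the MinIO endpoint instead of AWS.
# Endpoint URL and credentials below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.org:9000",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Ordinary S3 calls then work against the MinIO-backed storage.
s3.upload_file("example.csv", "dataverseno", "doi-10-18710/example.csv")
```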
Update: We will be using UiO's Cloudian storage for the entire DataverseNO. Question: Should we create separate storage buckets for each collection?
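If we do go for one bucket per collection, Cloudian's S3-compatible API means the buckets could be provisioned with any S3 client; a sketch, assuming hypothetical collection aliases, endpoint, and credentials (note that Dataverse would then presumably also need one configured store per bucket, e.g. via its `dataverse.files.<id>.bucket-name` JVM options):

```python
import boto3

# Hypothetical Cloudian endpoint and placeholder credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://cloudian.uio.no",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Hypothetical collection aliases; one bucket per DataverseNO collection.
for alias in ["uit", "uio", "ntnu"]:
    s3.create_bucket(Bucket=f"dataverseno-{alias}")
```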
Update 2: DataverseNO will hire GDCC to develop support for keeping folder structure also with direct upload to S3. Estimated to be in place by the end of September.
We want DataverseNO to support direct upload and download of (larger) files.
Preferably by using Dataverse's support for S3 Direct Upload and Download, as described in the Dataverse Developer Guide (sketched below).
If S3-compliant support is not possible or does not work as efficiently as it does with AWS or Google, we might want to consider developing similar support for MS Azure Blob Storage.
See also notes from 2021-07-27 Dataverse Project Community Call about Large File Support through S3.
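As a rough illustration of the mechanism the Developer Guide describes: with direct upload, the client receives a presigned S3 URL and sends the file bytes straight to the store, bypassing the Dataverse application server. A minimal boto3 sketch of generating and using such a URL (bucket, key, endpoint, and credentials are hypothetical; in practice Dataverse itself issues the URL through its API):

```python
import boto3
import requests

# Hypothetical S3-compatible endpoint and placeholder credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="https://cloudian.uio.no",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# A presigned PUT URL lets the client upload directly to the store;
# the large file never passes through the Dataverse web application.
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "dataverseno-uit", "Key": "doi-10-18710/bigfile.zip"},
    ExpiresIn=3600,  # URL valid for one hour
)

with open("bigfile.zip", "rb") as fh:
    requests.put(url, data=fh)
```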