S3 should contain all meta data for surfaces and topographies

mcrot commented 5 years ago

This issue has two purposes:

1. In case of a complete loss of the database and its backups, it should be possible to reconstruct the surfaces and topographies from the data files and meta data stored in S3. The meta data should record who created each surface and who uploaded each topography file (i.e. created the topography). No internal numbers should be used; users should be identified by their ORCID iD.
2. For issue #48, we need meta data in YAML format for download.

Therefore we decided to restructure the object names in the S3 storage. Currently we have "folders" for each user, e.g.

/media/topographies/user_3

contains the data files of all topographies added to any surface of user_3, so there are objects such as

/media/topographies/user_3/500x500_example.txt
/media/topographies/user_3/5000x5000_example.txt
/media/topographies/user_3/50000x50000_example.txt

We decided to use extra subfolders for each surface, together with a YAML file containing all meta data for this surface and all its topographies, e.g.

/media/topographies/user_3/surface_10/500x500_example.txt
/media/topographies/user_3/surface_10/5000x5000_example.txt
/media/topographies/user_3/surface_10/50000x50000_example.txt
/media/topographies/user_3/surface_10/meta.yaml
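
As a rough illustration of this layout, here is a minimal sketch in Python. The helper names (`surface_prefix`, `datafile_key`, `meta_key`, `build_meta`) and the field names in the meta data dictionary are hypothetical and not taken from topobank; only the key layout itself follows the examples above. The ORCID iD shown is a placeholder. Writing the dictionary out as YAML is left to whatever YAML library the project uses.

```python
import posixpath

# Prefix as it appears in the examples above.
MEDIA_PREFIX = "/media/topographies"


def surface_prefix(user_id, surface_id):
    """Return the "folder" (key prefix) for one surface of one user."""
    return posixpath.join(MEDIA_PREFIX, f"user_{user_id}", f"surface_{surface_id}")


def datafile_key(user_id, surface_id, filename):
    """Return the object name for a topography data file of a surface."""
    # Only the base name is used, so an uploaded path cannot escape the folder.
    return posixpath.join(surface_prefix(user_id, surface_id),
                          posixpath.basename(filename))


def meta_key(user_id, surface_id):
    """Return the object name of the meta data file for a surface."""
    return posixpath.join(surface_prefix(user_id, surface_id), "meta.yaml")


def build_meta(surface_name, creator_orcid, topographies):
    """Assemble the content of meta.yaml as a plain dictionary.

    `topographies` is a list of (filename, uploader_orcid) pairs.
    Users are identified by ORCID iD only, not by internal numbers.
    All field names here are assumptions made for this sketch.
    """
    return {
        "surface": {
            "name": surface_name,
            "creator": {"orcid": creator_orcid},
        },
        "topographies": [
            {"datafile": filename, "uploader": {"orcid": orcid}}
            for filename, orcid in topographies
        ],
    }


if __name__ == "__main__":
    placeholder_orcid = "0000-0002-1825-0097"  # placeholder, not a real user
    print(datafile_key(3, 10, "500x500_example.txt"))
    print(meta_key(3, 10))
    print(build_meta("example surface", placeholder_orcid,
                     [("500x500_example.txt", placeholder_orcid)]))
```

Keeping the meta data file next to the data files means each surface "folder" is self-describing, which is what the recovery scenario needs.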

In order to implement #48, a Celery task can collect all these files and create a ZIP file for download.
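
The packing step of such a task could look roughly like the sketch below. It only shows how the files of one surface "folder" end up in a ZIP archive; fetching the objects from S3 and registering the function as a Celery task are deliberately left out. The function name `build_surface_zip` and the in-memory mapping of object names to bytes are assumptions made for illustration.

```python
import io
import zipfile


def build_surface_zip(objects, prefix):
    """Pack all objects below `prefix` into a ZIP archive and return its bytes.

    `objects` maps object names (keys) to their content as bytes. In a real
    task the content would be read from the S3 bucket instead.
    """
    folder = prefix.rstrip("/") + "/"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for key in sorted(objects):
            if not key.startswith(folder):
                continue  # object belongs to another surface or user
            # Store files relative to the surface folder.
            archive.writestr(key[len(folder):], objects[key])
    return buffer.getvalue()


if __name__ == "__main__":
    prefix = "/media/topographies/user_3/surface_10"
    objects = {
        prefix + "/500x500_example.txt": b"height data",
        prefix + "/meta.yaml": b"surface: {}\n",
        "/media/topographies/user_3/surface_11/other.txt": b"not included",
    }
    data = build_surface_zip(objects, prefix)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())
```

Because the archive contains both the data files and `meta.yaml`, the same ZIP file would serve the download in #48 and the recovery use case.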

pastewka commented 3 years ago

We should use the dtool API for this, see https://github.com/jic-dtool/dtoolcore

mcrot commented 3 years ago

Now we can save containers for surfaces which can also be imported again later. We could use them to store all meta data for surfaces and topographies.

Or should we still use the dtool API for this? I suppose we don't want to save the data in our bucket for datasets but in topobank's bucket.

pastewka commented 2 years ago

We should probably not switch to dtool for this (although I like the idea). This would require a bit more thought on how to integrate dtool with Django rather than just sticking it on top of it.

I like the above proposal which should address this issue.

pastewka commented 10 months ago

We could automatically create a ZIP container for each digital surface twin. This would close this issue and also make ZIP files available for download immediately; see #249.

ContactEngineering / topobank

S3 should contain all meta data for surfaces and topographies #199