We need to think about how to bring the changes we made in the `direct-cog` branch to the storage server, and how to implement the processing sustainably.

## Merge

Regarding the merge, I would suggest the following. There are several modifications to the code regarding the data model (https://github.com/Deadwood-ai/deadwood-api/blob/65212d9cd9e61942417e5ae9072057392a220286/src/models.py). To be able to use the `metadata` route, we need to merge these changes. Since we don't want the `force-direct-cog` route open on the live system, I suggest simply removing that route.

## Implement processing
Because of the resource constraints of the storage server, it makes a lot of sense to outsource the processing to a dedicated processing server instead of running into all the trouble again. geosense has a local server we could use for this. The server could run all the resource-intensive processes (a rough sketch follows the list):
- COG generation
- thumbnail generation (unfortunately also pretty resource-hungry, since I need to load the original TIF)
- (later) prediction of AI models (which have to be run here anyway)
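To make the workload concrete, here is a minimal sketch of what the two generation steps could look like on the processing server, assuming rio-cogeo, rasterio, Pillow, and numpy are available. The profile choice, paths, and the 8-bit RGB assumption are illustrative, not taken from the current code; the existing generation logic in the repo is what would actually be reused.

```python
# Illustrative sketch only, assuming rio-cogeo, rasterio, Pillow, numpy.
import numpy as np
import rasterio
from PIL import Image
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles


def generate_cog(src_path: str, dst_path: str) -> None:
    """Translate the original GeoTIFF into a Cloud Optimized GeoTIFF."""
    profile = cog_profiles.get("deflate")  # lossless; "jpeg"/"webp" would be smaller
    cog_translate(src_path, dst_path, profile, in_memory=False)


def generate_thumbnail(src_path: str, dst_path: str, size: int = 256) -> None:
    """Render a small preview via a decimated read instead of the full raster."""
    with rasterio.open(src_path) as src:
        scale = size / max(src.width, src.height)
        out_shape = (src.count,
                     max(1, round(src.height * scale)),
                     max(1, round(src.width * scale)))
        data = src.read(out_shape=out_shape)
    rgb = np.moveaxis(data[:3], 0, -1)  # first three bands as RGB
    Image.fromarray(rgb.astype("uint8")).save(dst_path)  # assumes 8-bit bands
```

Note that the `out_shape` read lets rasterio decimate while reading, which may soften the memory cost of the thumbnail step mentioned above.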
All this should live in a separate repo. I would implement it as a package / Docker container scheduled by cron, not as a REST API, since the processing server is behind a VPN and the communication needs to be a one-way street from the processing server to the storage server.
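As a sketch of that cron-driven, one-way design (all names here are hypothetical), the container entrypoint could be a one-shot script that exits when there is nothing to do, with a simple lock so overlapping cron invocations don't step on each other:

```python
# Hypothetical one-shot entrypoint for the cron-scheduled container.
# It only ever opens outbound connections (queue + storage server),
# matching the one-way-street constraint behind the VPN.
import fcntl
import sys

LOCK_FILE = "/tmp/deadwood-processor.lock"


def main() -> None:
    with open(LOCK_FILE, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            sys.exit(0)  # a previous run is still busy; cron will try again
        process_pending()  # defined in the queue sketch further down


if __name__ == "__main__":
    main()
```

Scheduling would then be a single crontab line on the geosense machine, something like `*/5 * * * * docker run --rm deadwood-processor` (interval and image name are placeholders).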
The storage server would still implement:
- upload
- metadata
- download
- labels
The current queuing system could be used to manage the processes.
The process would be the following:
1. Data is uploaded to the storage server via the `datasets` route.
2. Metadata is generated via the `metadata` route.
3. The storage server adds a process to the current queuing system (a Supabase table).
4. The processing server scans the queue; if a new process is found, it downloads the TIFFs, processes them, and uploads the results again (see the sketch below). The logic of the current COG and thumbnail generation can be reused completely, and the same states apply.
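For illustration, the queue handshake could look roughly like this with supabase-py. The table name, column names, and state values are placeholders for whatever the current queuing table actually uses, and the two helpers in the middle stand in for the real transfer logic:

```python
# Rough sketch of both sides of the queue, using supabase-py.
# "queue", "status", "dataset_id" and the state names are placeholders.
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])


def enqueue(dataset_id: int) -> None:
    """Storage server side (step 3): add a process after metadata generation."""
    supabase.table("queue").insert({"dataset_id": dataset_id, "status": "pending"}).execute()


def download_tif(dataset_id: int) -> str:
    """Hypothetical helper: fetch the original TIF from the storage server."""
    raise NotImplementedError


def upload_results(dataset_id: int, *paths: str) -> None:
    """Hypothetical helper: push the COG and thumbnail back to the storage server."""
    raise NotImplementedError


def process_pending() -> None:
    """Processing server side (step 4): claim and work through pending tasks."""
    pending = supabase.table("queue").select("*").eq("status", "pending").execute()
    for task in pending.data:
        supabase.table("queue").update({"status": "processing"}).eq("id", task["id"]).execute()
        try:
            tif = download_tif(task["dataset_id"])
            generate_cog(tif, tif + ".cog.tif")  # see the generation sketch above
            generate_thumbnail(tif, tif + ".thumb.png")
            upload_results(task["dataset_id"], tif + ".cog.tif", tif + ".thumb.png")
            new_status = "done"
        except Exception:
            new_status = "errored"
        supabase.table("queue").update({"status": new_status}).eq("id", task["id"]).execute()
```

Because the worker only ever pulls work and pushes results, the storage server never needs to reach the processing server, which fits the VPN constraint.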
What do you think? @mmaelicke @cmosig