We need to think about how to bring the changes we made in the `direct-cog` branch to the storage server, and how to implement the processing sustainably.

## Merge

Regarding the merge, I would suggest the following. There are several modifications to the code regarding the data model (https://github.com/Deadwood-ai/deadwood-api/blob/65212d9cd9e61942417e5ae9072057392a220286/src/models.py). To be able to use the `metadata` route, we need to merge these changes. Since we don't want the `force-direct-cog` route open on the live system, I suggest simply removing that route.

## Implement processing
Because of the resource constraints of the storage server, it makes a lot of sense to outsource the processing to a dedicated processing server instead of running into all the trouble again. geosense has a local server we could use for this. The server could run all the resource-intensive processes (a rough sketch follows the list):
- COG generation
- thumbnail generation (unfortunately also pretty resource-hungry, since I need to load the original TIF)
- (later) prediction of AI models (which have to be run here anyway)
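To make the workload concrete, here is a minimal sketch of what the two generation steps could look like on the processing server, assuming rio-cogeo, rasterio, Pillow, and numpy are available. The profile choice, paths, and the 8-bit RGB assumption are illustrative, not taken from the current code; the existing generation logic in the repo is what would actually be reused.

```python
# Illustrative sketch only, assuming rio-cogeo, rasterio, Pillow, numpy.
import numpy as np
import rasterio
from PIL import Image
from rio_cogeo.cogeo import cog_translate
from rio_cogeo.profiles import cog_profiles


def generate_cog(src_path: str, dst_path: str) -> None:
    """Translate the original GeoTIFF into a Cloud Optimized GeoTIFF."""
    profile = cog_profiles.get("deflate")  # lossless; "jpeg"/"webp" would be smaller
    cog_translate(src_path, dst_path, profile, in_memory=False)


def generate_thumbnail(src_path: str, dst_path: str, size: int = 256) -> None:
    """Render a small preview via a decimated read instead of the full raster."""
    with rasterio.open(src_path) as src:
        scale = size / max(src.width, src.height)
        out_shape = (src.count,
                     max(1, round(src.height * scale)),
                     max(1, round(src.width * scale)))
        data = src.read(out_shape=out_shape)
    rgb = np.moveaxis(data[:3], 0, -1)  # first three bands as RGB
    Image.fromarray(rgb.astype("uint8")).save(dst_path)  # assumes 8-bit bands
```

Note that the `out_shape` read lets rasterio decimate while reading, which may soften the memory cost of the thumbnail step mentioned above.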
All this should live in a separate repo. I would implement it as a package / Docker container scheduled by cron, not as a REST API, since the processing server is behind a VPN and the communication needs to be a one-way street from the processing server to the storage server.
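As a sketch of that cron-driven, one-way design (all names here are hypothetical), the container entrypoint could be a one-shot script that exits when there is nothing to do, with a simple lock so overlapping cron invocations don't step on each other:

```python
# Hypothetical one-shot entrypoint for the cron-scheduled container.
# It only ever opens outbound connections (queue + storage server),
# matching the one-way-street constraint behind the VPN.
import fcntl
import sys

LOCK_FILE = "/tmp/deadwood-processor.lock"


def main() -> None:
    with open(LOCK_FILE, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            sys.exit(0)  # a previous run is still busy; cron will try again
        process_pending()  # defined in the queue sketch further down


if __name__ == "__main__":
    main()
```

Scheduling would then be a single crontab line on the geosense machine, something like `*/5 * * * * docker run --rm deadwood-processor` (interval and image name are placeholders).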
The storage server would still implement:
- upload
- metadata
- download
- labels
The current queuing system could be used to manage the processes.
The process would be the following:
1. Data is uploaded to the storage server via the `datasets` route.
2. Metadata is generated via the `metadata` route.
3. The storage server adds a process to the current queuing system (a Supabase table).
4. The processing server scans the queue; if a new process is found, it downloads the TIFFs, processes them, and uploads the results again (see the sketch below). The logic of the current COG and thumbnail generation can be reused completely, and the same states apply.
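For illustration, the queue handshake could look roughly like this with supabase-py. The table name, column names, and state values are placeholders for whatever the current queuing table actually uses, and the two helpers in the middle stand in for the real transfer logic:

```python
# Rough sketch of both sides of the queue, using supabase-py.
# "queue", "status", "dataset_id" and the state names are placeholders.
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])


def enqueue(dataset_id: int) -> None:
    """Storage server side (step 3): add a process after metadata generation."""
    supabase.table("queue").insert({"dataset_id": dataset_id, "status": "pending"}).execute()


def download_tif(dataset_id: int) -> str:
    """Hypothetical helper: fetch the original TIF from the storage server."""
    raise NotImplementedError


def upload_results(dataset_id: int, *paths: str) -> None:
    """Hypothetical helper: push the COG and thumbnail back to the storage server."""
    raise NotImplementedError


def process_pending() -> None:
    """Processing server side (step 4): claim and work through pending tasks."""
    pending = supabase.table("queue").select("*").eq("status", "pending").execute()
    for task in pending.data:
        supabase.table("queue").update({"status": "processing"}).eq("id", task["id"]).execute()
        try:
            tif = download_tif(task["dataset_id"])
            generate_cog(tif, tif + ".cog.tif")  # see the generation sketch above
            generate_thumbnail(tif, tif + ".thumb.png")
            upload_results(task["dataset_id"], tif + ".cog.tif", tif + ".thumb.png")
            new_status = "done"
        except Exception:
            new_status = "errored"
        supabase.table("queue").update({"status": new_status}).eq("id", task["id"]).execute()
```

Because the worker only ever pulls work and pushes results, the storage server never needs to reach the processing server, which fits the VPN constraint.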
What do you think? @mmaelicke @cmosig