OpenFish is an open-source system written in Go for classifying marine species. Its tasks include importing video or image data, classifying and annotating data (both manually and automatically), searching, and more. It is expected that OpenFish will use computer vision and machine learning techniques.
Problem
Importing species from iNaturalist can be quite time consuming. This can cause issues when OpenFish is deployed to App Engine, which has a fairly short time limit on HTTP requests.
https://cloud.google.com/appengine/docs/standard/how-requests-are-handled?tab=go#asynchronous-work
Solution
One solution would be to use Cloud Tasks (https://cloud.google.com/tasks), a task queue that would call an App Engine HTTP handler to process the background job. Each task would be limited to 10 minutes, so we would need to split the work into smaller tasks.
Splitting work into smaller tasks
Splitting the import into multiple smaller tasks would probably be beneficial: if a task fails, it has less work to redo when it is retried, and it would help us stay under the 10-minute limit.
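One way to split the work, sketched under the assumption that the import walks a paginated result set. The `pageRange` type, the `splitPages` function, and the page counts are illustrative only. Each range would become one task.

```go
package main

import "fmt"

// pageRange is a hypothetical unit of work: an inclusive range of result pages.
type pageRange struct {
	First, Last int
}

// splitPages divides pages 1..totalPages into ranges of at most
// pagesPerTask pages each. It returns nil for non-positive inputs.
func splitPages(totalPages, pagesPerTask int) []pageRange {
	if totalPages <= 0 || pagesPerTask <= 0 {
		return nil
	}
	var ranges []pageRange
	for first := 1; first <= totalPages; first += pagesPerTask {
		last := first + pagesPerTask - 1
		if last > totalPages {
			last = totalPages // the final range may be shorter
		}
		ranges = append(ranges, pageRange{First: first, Last: last})
	}
	return ranges
}

func main() {
	// 25 pages at 10 pages per task gives ranges 1-10, 11-20, and 21-25.
	for _, r := range splitPages(25, 10) {
		fmt.Printf("enqueue task for pages %d-%d\n", r.First, r.Last)
	}
}
```

The range size would need tuning so that a single task finishes comfortably within the time limit.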
Tracking progress
When a task is created, it is given an autogenerated task name, which we could use in a subsequent request to check on its progress.
We could also use server-sent events (https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) to monitor the task's progress, but they are not supported by App Engine (GAE), only by Cloud Run. This method would also let us get more detailed information, such as task logs.
Other uses
Background tasks for long-running operations will also be needed for processing YouTube videos, ML training, image classification, and so on, so this could serve as a template for how we handle such tasks.
Links
https://cloud.google.com/tasks/docs/creating-appengine-handlers#go
https://cloud.google.com/tasks/docs/creating-appengine-tasks