Currently for mask generation, a new SAM embedding is produced for each input image. But the same embedding can be reused over multiple mask generation steps. So we should store the SAM embeddings once.
The simplest thing to do would be to use the filesystem for this and load the embeddings on demand.
This depends on splitting out SAM embedding inference from mask prompting.
Currently for mask generation, a new SAM embedding is produced for each input image. But the same embedding can be reused over multiple mask generation steps. So we should store the SAM embeddings once.
The simplest thing to do would be to use the filesystem for this and load the embeddings on demand.
This depends on splitting out SAM embedding inference from mask prompting.