RTIInternational / gobbli

Deep learning with text doesn't have to be scary.
Apache License 2.0

Generating embeddings without serving up the model checkpoint each time #17

Open erichare opened 4 years ago

erichare commented 4 years ago

Hello,

Gobbli is a fantastic package, and I've been trying to use it in some of my work. One issue I've run into: it seems the BERT checkpoint is loaded on every call to embed(), which makes embedding generation take 20-30 seconds on my machine.
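
For context, here's roughly what I'm doing (class and parameter names per my reading of the gobbli docs, so apologies if I've mangled anything):

```python
from gobbli.io import EmbedInput
from gobbli.model.bert import BERT

model = BERT()
model.build()  # downloads weights / builds the Docker image the first time

def embed_texts(texts):
    # Every call here spins up the container and reloads the BERT
    # checkpoint, so even a handful of short texts takes 20-30 seconds.
    embed_input = EmbedInput(X=texts)
    embed_output = model.embed(embed_input)
    return embed_output.X_embedded

# Called repeatedly, e.g. once per incoming batch of documents
vecs = embed_texts(["first document", "second document"])
```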

Is there a way to "serve up" this model so that subsequent calls to embed() don't have to load the model checkpoint each time? Or would this require quite a bit of restructuring?

jasonnance commented 4 years ago

I'm glad you're finding gobbli useful!

You're correct that each call to embed() does a lot of work, although it's not just loading the checkpoint -- it's also writing all your data to disk and reading it inside the container, then writing all the embeddings to disk inside the container and reading them back outside. Depending on how big your dataset is, that I/O may be taking even more time than the checkpoint load.
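
If you want a rough breakdown of fixed overhead vs. data-dependent time, timing two calls with very different input sizes is usually enough (same assumed BERT/EmbedInput usage as in your snippet):

```python
import time

from gobbli.io import EmbedInput
from gobbli.model.bert import BERT

model = BERT()
model.build()

def timed_embed(texts):
    start = time.perf_counter()
    model.embed(EmbedInput(X=texts))
    return time.perf_counter() - start

small = timed_embed(["one short document"])
large = timed_embed(["one short document"] * 1000)

# If `small` is already ~20-30s, most of the cost is container startup plus
# checkpoint loading; the difference (large - small) is the part that scales
# with your dataset (disk I/O + actual inference).
print(f"small batch: {small:.1f}s, large batch: {large:.1f}s")
```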

There isn't currently a way around this -- I'd consider it a fundamental limitation of gobbli's design. If latency is important to you, you may want to look into something like https://github.com/hanxiao/bert-as-service, which is better suited to serving low-latency responses. gobbli is only intended for experimental/batch workloads -- it's designed to help you quickly determine whether a model will work in a production situation, not to serve a production model.
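
To give a sense of the difference, bert-as-service keeps the model loaded in a long-running server process, so each encode call is just a round-trip to that process (sketch below assumes a locally downloaded BERT-Base checkpoint; see their README for the details):

```python
# One-time server startup (shell), which holds the model in memory:
#   pip install bert-serving-server bert-serving-client
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 -num_worker=1

from bert_serving.client import BertClient

bc = BertClient()  # connects to the server on localhost by default

# Subsequent calls reuse the already-loaded model, so they return quickly
# instead of re-paying the checkpoint-loading cost every time.
vecs = bc.encode(["first document", "second document"])
print(vecs.shape)  # (2, 768) for BERT-Base
```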

It would be theoretically possible to rework gobbli's model Docker containers into, e.g., REST API services (as opposed to single-run batch processes) that could be spun up once and reused across calls, but this would be a fair amount of work, since we'd essentially have to build a mostly-the-same-but-slightly-different API server within the constraints of a host of different Python environments. I don't see that happening any time soon.
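
For the record, the kind of thing I mean is a thin HTTP wrapper inside each model container that loads the checkpoint once at startup and then serves embed requests. A purely illustrative Flask sketch is below -- `load_checkpoint` and `run_embedding` are stand-ins for whatever each model's environment actually provides, not real gobbli functions, and building those per model is where most of the work lies:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_checkpoint():
    # Stand-in: load the model weights into memory once, at container startup.
    return object()

def run_embedding(model, texts):
    # Stand-in: run the already-loaded model over `texts`.
    return [[0.0] * 768 for _ in texts]

# Loaded once when the container starts, reused for every request.
MODEL = load_checkpoint()

@app.route("/embed", methods=["POST"])
def embed():
    texts = request.get_json()["texts"]
    return jsonify({"embeddings": run_embedding(MODEL, texts)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```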

I'll leave this issue open for discussion for a bit, but I don't think there's much we can do about it in the near-term.

erichare commented 4 years ago

Thank you so much for that response, Jason. I had a suspicion this was the case, and it makes sense why it's such a technical challenge.