google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Scaling extract_features #397

Open simra opened 5 years ago

simra commented 5 years ago

I'm looking for a few pointers on how to efficiently scale up extract_features. Unlike with training, there isn't much information out there on distributed prediction. I'd like to try one or more of the following:

  1. Specify the GPU device to use, so I can scale out via multiple processes on a multi-GPU machine. It's not clear to me how to specify the GPU device via the estimator config.
  2. Better, automatically scale out estimator.predict() to utilize all available GPU devices.
  3. Re-use partial inputs. For example, if I'm featurizing query-passage pairs, pre-load the query portion of the input tensor while iterating through passages.
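For points 1 and 2, a common workaround is to skip multi-GPU support inside the Estimator entirely and instead run one `extract_features.py` process per GPU, pinning each process to a single device with `CUDA_VISIBLE_DEVICES`. A minimal sketch (all file paths and flag values below are placeholders, not taken from this issue):

```python
# Sketch: shard the input file and launch one extract_features.py process per
# GPU, each pinned to a single device via CUDA_VISIBLE_DEVICES.
import os
import subprocess


def shard(lines, num_shards):
    """Split input lines round-robin into num_shards roughly equal pieces."""
    return [lines[i::num_shards] for i in range(num_shards)]


def launch(num_gpus, input_file, output_prefix):
    with open(input_file) as f:
        lines = f.read().splitlines()
    procs = []
    for gpu, piece in enumerate(shard(lines, num_gpus)):
        shard_in = "%s.shard%d.txt" % (input_file, gpu)
        with open(shard_in, "w") as f:
            f.write("\n".join(piece))
        # Pin this worker to one GPU; TensorFlow then sees only that device.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(
            ["python", "extract_features.py",
             "--input_file=" + shard_in,
             "--output_file=%s.%d.jsonl" % (output_prefix, gpu),
             "--vocab_file=vocab.txt",              # placeholder paths
             "--bert_config_file=bert_config.json",
             "--init_checkpoint=bert_model.ckpt",
             "--layers=-1",
             "--batch_size=32"],
            env=env))
    for p in procs:
        p.wait()
```

If you'd rather stay inside a single process, you can also restrict the devices a TF 1.x Estimator sees by passing a `session_config` to `tf.estimator.RunConfig` with `gpu_options.visible_device_list` set, but the multi-process approach above is usually simpler for batch prediction.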

Any suggestions or pointers would be most appreciated. Feel free to redirect me to Stack Overflow if that's a better venue for these questions.
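For point 3, one honest caveat: because self-attention mixes the two segments, the query's hidden states can't simply be precomputed and reused across passages. What you *can* reuse is the query's tokenization. A sketch of caching tokenized queries (the helper name and packing below are illustrative; the tokenizer is assumed to follow the repo's `tokenization.FullTokenizer` interface):

```python
# Sketch: tokenize each distinct query once and reuse the tokens across all
# passages it is paired with. Only tokenization is saved, not the forward pass.
import functools


def make_pair_features(tokenizer, max_seq_length=128):
    @functools.lru_cache(maxsize=None)
    def query_tokens(query):
        # Computed once per distinct query, then served from the cache.
        return tuple(tokenizer.tokenize(query))

    def features(query, passage):
        tokens_a = list(query_tokens(query))
        tokens_b = tokenizer.tokenize(passage)
        # Standard BERT packing: [CLS] query [SEP] passage [SEP]
        tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
        return tokens[:max_seq_length]

    return features
```

In practice tokenization is a small fraction of total cost, so the GPU-side sharding above is where most of the speedup comes from.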

hanxiao commented 5 years ago

Maybe you are looking for bert-as-service (https://github.com/hanxiao/bert-as-service/). It's a highly scalable feature-extraction service based on BERT.
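A minimal client sketch, assuming the server has already been started per the bert-as-service README (e.g. `bert-serving-start -model_dir /path/to/bert -num_worker=2` after `pip install bert-serving-server bert-serving-client`):

```python
# Sketch: encode sentences via a running bert-as-service server.
def encode_sentences(sentences):
    # Import kept local so this sketch reads without the package installed.
    from bert_serving.client import BertClient
    bc = BertClient()  # connects to localhost on the default ports
    # Returns a numpy array with one fixed-size vector per input sentence.
    return bc.encode(sentences)
```

The server handles batching and multi-GPU scheduling across workers, which addresses points 1 and 2 without touching the Estimator API.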

simra commented 5 years ago

Thanks, I will try this.