facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.
MIT License

Implement python wrappers for predictions and tune speed #273

Closed Bowid closed 4 years ago

Bowid commented 5 years ago
  1. Predict now uses std::set instead of priority_queue. In my view it is better to maintain a set of fixed size and drop predictions with too low a score immediately, rather than storing them all in an ordered manner.
  2. Implemented an extension class of StarSpace to make things a bit more pythonic. It is definitely better to have a return value instead of passing output parameters for functions to fill. In my view, speed is unchanged, since modern C++ compilers handle returning complex values without copying them.
facebook-github-bot commented 5 years ago

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot commented 5 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

pmirla commented 5 years ago

Hello Bowid. I am also working on python wrapper. Can you please write back with your contacts to pavanmirla @gmail. Thanks

Bowid commented 5 years ago

> Hello Bowid. I am also working on python wrapper. Can you please write back with your contacts to pavanmirla @gmail. Thanks

Replied. Also, here is a sample; let it stay here:

import starwrap

model_path = '/path/to/trained/model'

args = starwrap.args()
model = starwrap.starSpace(args)
model.initFromSavedModel(model_path)

line = input("Enter text for classification: ")
parsed = model.parseDoc(line, ' ')

# Output is a list of tuples in the format (probability, token_id).
# predict takes two arguments: the parsed doc and the maximum number of predictions.
tokens = model.predict(parsed, 10)

probabilities = [item[0] for item in tokens]
# token_id is an integer, so it must be rendered to a feature (string).
features = model.renderTokens(tokens)

for feature, probability in zip(features, probabilities):
    print('Feature {f} probability {p}'.format(f=feature, p=probability))

Bowid commented 5 years ago

I have some more thoughts on making things more pythonic, but it would be nice to get feedback from the FB folks first and probably merge this PR to master.