deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0
1.73k stars 247 forks source link

Extract embedding while using parameter "extraction_strategy="per_token"" #859

Closed JiangYanting closed 1 year ago

JiangYanting commented 2 years ago

Question Hello! I used the script "embeddings_extraction.py", and I input the sentence as below: basic_texts = [ {"text": "apple is delicious fruit"} ] model = Inferencer.load(lang_model, task_type="embeddings", gpu=use_gpu, batch_size=batch_size, extraction_strategy="per_token", extraction_layer=-1, num_processes=0) result = model.inference_from_dicts(dicts=basic_texts)

I set the parameter "extraction_strategy="per_token", and the printed len(result[0]["vec"]) is 256. result[0]["vec"][0] is a 768-dimensional vector. And I am wondering this 768 vector result[0]["vec"][0] is the representation of the First word "apple", or the representation of "[CLS]" token? Thank you very much!

Additional context Add any other context or screenshots about the question (optional).

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.