UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
14.77k stars, 2.43k forks

page level embeddings #1000

Open INF800 opened 3 years ago

INF800 commented 3 years ago

I am working on PDF page classification. I want to use the training script at https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/quora_duplicate_questions/training_OnlineContrastiveLoss.py to generate page-level embeddings.

nreimers commented 3 years ago

Hi,

1) Some models in Huggingface transformers accept input texts longer than 512 word pieces. Check the transformers docs to see which models would be suitable.

2) The loss function depends on what type of data you have. Have a look here: https://www.sbert.net/docs/package_reference/losses.html
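For readers landing here, a minimal sketch of how the two points might fit together, not a tested recipe. It assumes each page comes with a class label, that pages sharing a class count as a positive pair (label 1) for a contrastive-style loss such as the one in the linked script, and that `allenai/longformer-base-4096` is a suitable long-input checkpoint. The helper names `make_pairs` and `build_long_input_model` are hypothetical and not part of sentence-transformers.

```python
from itertools import combinations


def make_pairs(pages):
    """Turn (text, class_label) pages into (text_a, text_b, same_class) pairs.

    same_class is 1 when both pages share a class label, otherwise 0.
    Assumption: class membership is a reasonable stand-in for "similar".
    """
    return [
        (text_a, text_b, int(label_a == label_b))
        for (text_a, label_a), (text_b, label_b) in combinations(pages, 2)
    ]


def build_long_input_model(checkpoint="allenai/longformer-base-4096",
                           max_seq_length=4096):
    """Wrap a long-input Hugging Face checkpoint as a SentenceTransformer.

    The checkpoint and sequence length are assumptions; check the
    transformers docs for a model that fits your pages and hardware.
    """
    # Imported lazily so make_pairs works without the library installed.
    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer(
        checkpoint, max_seq_length=max_seq_length
    )
    # Pooling defaults to averaging the token embeddings (mean pooling).
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension()
    )
    return SentenceTransformer(modules=[word_embedding_model, pooling_model])
```

Each pair could then be wrapped as `InputExample(texts=[text_a, text_b], label=same_class)` and passed to `losses.OnlineContrastiveLoss(model)`, with `model.encode(page_texts)` producing the page-level embeddings after training. Enumerating all pairs grows quadratically with the number of pages, so sampling pairs is probably needed for larger collections.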