Open qhd1996 opened 4 years ago
Hi, I have read your paper, which describes three multilingual models: mBERT ← SBERT-nli-stsb, DistilmBERT ← SBERT-nli-stsb, and XLM-R ← SBERT-nli-stsb.
Is the currently available one DistilmBERT ← SBERT-nli-stsb? And would you be willing to provide the other two models? Thanks
Hi @xuwenshen The currently available distiluse-base-multilingual-cased is a DistilBERT version of multilingual Universal Sentence Encoder (USE).
The ← SBERT-nli-stsb models are not yet available. I am currently working on extending them to even more languages to see what the limit is.
Best Nils
They don't download automatically for me? I'm using transformers 2.8.0 and sentence-transformers 0.2.6.1. This is the error I get when I try to run model = SentenceTransformer('bert-base-nli-mean-tokens')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-1-c4e037d073ba> in <module>
1 from sentence_transformers import SentenceTransformer
----> 2 model = SentenceTransformer('bert-base-nli-mean-tokens')
~/opt/miniconda3/envs/advice/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
73 logging.warning("You try to use a model that was created with version {}, however, your version is {}. This might cause unexpected behavior or errors. In that case, try to update to the latest version.\n\n\n".format(config['__version__'], __version__))
74
---> 75 with open(os.path.join(model_path, 'modules.json')) as fIn:
76 contained_modules = json.load(fIn)
77
FileNotFoundError: [Errno 2] No such file or directory: '/Users/venkat/.cache/torch/sentence_transformers/public.ukp.informatik.tu-darmstadt.de_reimers_sentence-transformers_v0.2_bert-base-nli-mean-tokens.zip/modules.json'
EDIT: I figured out the issue. I had previously installed an older version of sentence-transformers and used it for a bit, so it probably thought that the models had already been downloaded. Deleting the cache folder works: it downloads the models now.
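The fix described above (removing a stale cache entry so that the library downloads the model again) can be sketched as follows. This is only an illustration: the helper name `remove_incomplete_model_dirs` is hypothetical and not part of sentence-transformers, and treating a folder without `modules.json` as stale is an assumption based on the file the traceback failed to open.

```python
import shutil
from pathlib import Path


def remove_incomplete_model_dirs(cache_root):
    """Delete cached model folders that lack modules.json.

    Hypothetical helper, not part of sentence-transformers.
    Returns the names of the removed folders, sorted.
    """
    root = Path(cache_root).expanduser()
    removed = []
    if not root.is_dir():
        return removed
    for entry in sorted(root.iterdir()):
        # Assumption: a complete cached model contains modules.json,
        # the file the traceback above failed to open.
        if entry.is_dir() and not (entry / "modules.json").is_file():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Calling it with the cache location shown in the traceback (`~/.cache/torch/sentence_transformers`) should clear the broken entry; deleting the whole cache folder by hand, as done above, has the same effect.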
Cool!!! Actually, I tried distiluse-base-multilingual-cased on my scenario and the results are quite good. Could you explain more about this model? Does it mean that the teacher model is mUSE and that the training incorporates parallel distillation? Looking forward to your ← SBERT-nli-stsb models.
Hi @xuwenshen Correct, the teacher model is mUSE and the student model is multilingual DistilBERT. It was then trained on Wikipedia sentences for the different languages. In the same way, it could also be extended to further languages that are not supported by mUSE.
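The teacher/student setup described above can be illustrated with a toy objective. The thread does not state which loss was used; the sketch below assumes a mean squared error between the teacher's and the student's sentence embeddings. The function name and the toy vectors are hypothetical.

```python
def distillation_mse(teacher_vec, student_vec):
    """Mean squared error between a teacher and a student embedding."""
    if not teacher_vec or len(teacher_vec) != len(student_vec):
        raise ValueError("embeddings must be non-empty and of equal dimension")
    squared = [(t - s) ** 2 for t, s in zip(teacher_vec, student_vec)]
    return sum(squared) / len(squared)


# Toy 3-dimensional embeddings standing in for the mUSE (teacher) and
# multilingual DistilBERT (student) outputs for the same sentence.
teacher = [1.0, 0.0, 2.0]
student = [0.0, 0.0, 0.0]
loss = distillation_mse(teacher, student)
```

Under this assumption, training would adjust the student so that this value shrinks over many sentences, which pulls the student's embeddings toward the teacher's vector space.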
I currently have to finish the last experiments. I hope I can then provide more code & documentation on the multilingual part.
Best Nils
Can I safely assume that the license applicable to sentence-transformers (Apache License, Version 2.0) is applicable to the pre-trained weights as well?
Hi @abmitra84 From our side there are no restrictions on the models. Some restrictions might apply to the data that was used to train the models. There, the agreements for the individual datasets must be checked, along with your local regulations (laws) about work derived from datasets.
@nreimers Appreciate your response
@venkatasg, I ran into the same error. Did you resolve it? Thank you!
@zmh908264302 Yes I believe deleting the cache solved it for me.
@nreimers could you please share whether the xlm-r-100langs model is covered under Apache 2.0? Thanks
@seekingpeace Yes, they are shared with the same license as our code.
@nreimers Since the models are Apache 2.0 as mentioned above, could you please associate the apache-2.0 license tag with all the sentence-transformers models in the model hub?
E.g. without license label: https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2
E.g. with license label: https://huggingface.co/bert-base-uncased
I know it's a lot of models and I'd be happy to help modify all the README.md files in the model hub (if that's easier than regenerating them all [1]).
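The change being offered amounts to adding a `license: apache-2.0` entry to the YAML metadata block at the top of each model's README.md. Below is a minimal sketch of that edit on the README text, assuming the metadata block is delimited by `---` lines. The helper name `add_license_tag` is hypothetical, and uploading the changed file to the hub is not shown.

```python
def add_license_tag(readme_text, license_id="apache-2.0"):
    """Return readme_text with a license entry in its YAML metadata block."""
    lines = readme_text.split("\n")
    if lines and lines[0].strip() == "---":
        # Locate the closing delimiter of the existing metadata block.
        end = next(
            (i for i in range(1, len(lines)) if lines[i].strip() == "---"),
            None,
        )
        if end is not None:
            header = lines[1:end]
            if any(line.startswith("license:") for line in header):
                return readme_text  # already tagged, leave untouched
            lines.insert(end, f"license: {license_id}")
            return "\n".join(lines)
    # No complete metadata block found: prepend a new one.
    return f"---\nlicense: {license_id}\n---\n{readme_text}"
```

The function leaves a README that already declares a license unchanged, so it could be run over every model's README without producing duplicate entries.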
It is sufficient to specify the model name. Models are downloaded automatically.
Otherwise, you can download them here: https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/
Best Nils Reimers