-
### Describe the problem
Built-in support for Google's Universal Sentence Encoder (USE), which can be useful for text longer than a single word, such as sentences, phrases, or short paragraphs.
[USE Pap…
-
If someone plans to load sentences in Python or similar, be aware that the file is encoded using the `macintosh` charset. This worked for me to get the original sentences in UTF-8:
`iconv -f macint…
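
For anyone who would rather do the conversion inside Python, here is a minimal sketch. It assumes the file really is Mac OS Roman, which Python exposes as the `mac_roman` codec (`macintosh` is an alias). The function name and the file names are hypothetical placeholders, not names from the dataset.

```python
from pathlib import Path


def macintosh_to_utf8(src, dst):
    """Re-encode a Mac OS Roman ("macintosh") text file as UTF-8."""
    # Read raw bytes so no platform default encoding is applied.
    raw = Path(src).read_bytes()
    # "mac_roman" is Python's codec name for the macintosh charset.
    text = raw.decode("mac_roman")
    # Write bytes directly to avoid any newline translation.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    Path("sentences_macintosh.txt").write_bytes(b"caf\x8e\n")
    print(macintosh_to_utf8("sentences_macintosh.txt", "sentences_utf8.txt"))
```

Passing `encoding="mac_roman"` to `open()` should work just as well if you only need to read the sentences and not rewrite the file.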
-
Hello,
I am getting this error when I try to run clustering on Spanish text (see ERROR below). I assume there is a problem with my corpus. Could you help me find the cause of the error? (It works perfectly …
-
I originally thought that this issue was specific to the zh-hk locale, but later realized that it is quite widespread and seriously harms the data quality of many languages. So currently, some …
-
This happens when trying to tokenize sentences with `clip.tokenize(train_sentences).to(device)` when the sentences have fewer than 77 tokens (for example, 44), some of which are unknown.
I have tried to operate th…
-
Hi again. When using a German text with special characters as the source language and Google as the engine, I get this error:
Reading text into memory.
Tokenizing text into sentences.
Starting transla…
-
Hi. Some of the dump files on the Downloads page are incorrectly formatted.
The details field on the user_languages.csv file, for example, allows tabs and newlines, which should not be allowed in a…
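
To help locate the affected rows, here is a rough sketch that scans a dump for fields containing tabs or newlines. It assumes the file has a header row and that the column is literally named `details`. The delimiter, the function name, and the sample data are all assumptions for illustration, not taken from the actual dump.

```python
import csv
import io


def rows_with_control_chars(csv_text, column="details", delimiter=","):
    """Return 1-based record numbers whose `column` contains a tab or newline."""
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines embedded in quoted fields itself.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    bad = []
    for index, row in enumerate(reader, start=1):
        value = row.get(column) or ""
        if any(ch in value for ch in ("\t", "\n", "\r")):
            bad.append(index)
    return bad


if __name__ == "__main__":
    # Made-up sample data, not real dump content.
    sample = 'id,details\r\n1,"ok"\r\n2,"has\ttab"\r\n3,"multi\nline"\r\n'
    print(rows_with_control_chars(sample))
```

Only embedded newlines inside quoted fields can be detected this way. An unquoted newline is indistinguishable from a record boundary, so a count of short rows may be a useful second check.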
-
Hi,
I would like to export sentence-transformers model to PyTorch. However, I am not able to jit trace the **stsb-distilbert-base** model.
Any help is much appreciated.
Thanks,
-s
sentenc…
-
(py38nlp) ➜ /Users/admin/Seq2SeqWithPGN python build_vocab.py
Traceback (most recent call last):
File "build_vocab.py", line 29, in
vocab, reverse_vocab = generate_vocab(sentences_path)…
-
Hi,
In the sample code provided for SimCSE, should we set the **dropout**, or has it already been set implicitly?
```
model_name = 'distilroberta-base'
word_embedding_model = models.Transformer…