-
I ran into a problem when using tensor2tensor to train a translation model and decode some sentences.
The error is 'IndexError: string index out of range'.
So I debugged the failing sentence and found it genera…
-
Hi,
Thank you very much for your amazing work.
I used NLTK to segment a German text. I see that this language is available, and the sentence tokenizer gives quite good results with the default traini…
-
I loaded meta-llama/Llama-2-7b-chat-hf onto the GPU and tried to get a response to a question.
Here is the key part of the code:
```
def load_model(model_name, bnb_config):
    n_gpus = torch.cuda.de…
-
KeyLLM seems to be extracting keywords which are not even present in the document used. I am following the steps mentioned in this article - https://towardsdatascience.com/introducing-keyllm-keyword-e…
-
Here is the current list: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
This will allow us to successfully pass the following spec:
``` ruby
it 'knows what is not a domain 1' do
  skip "NOT IMPL…
-
Hey! First of all, thank you for the awesome work you are doing.
I would be grateful if you could help me out with the following situation:
I have an unlabelled, domain-specific dataset and I w…
-
There are places where runtime performance could be improved:
* Use local variables as proxies for class variables
* Tokenizers: split vs. regex vs. merged sentencize/tokenizer
* Compile regexes
* Async processi…
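
To make the first and third bullets concrete, here is a minimal sketch. It assumes a hypothetical `SimpleTokenizer` class and token pattern made up for illustration, not taken from this project. It shows a regex compiled once at class level and a local name used as a proxy for the class attribute inside the hot loop.

```python
import re


# Hypothetical tokenizer, used only to illustrate the two ideas;
# the class name and the pattern are assumptions, not project code.
class SimpleTokenizer:
    # Compiled once when the class is defined, not on every call.
    TOKEN_RE = re.compile(r"\w+|[^\w\s]")

    def tokenize_all(self, texts):
        # Bind the bound method to a local name so the loop does not
        # repeat the attribute lookup on the class for every text.
        findall = self.TOKEN_RE.findall
        return [findall(text) for text in texts]


tokens = SimpleTokenizer().tokenize_all(["Hello, world!", "My name. Is Bob."])
```

Whether either change pays off depends on the workload, so it is probably worth measuring with `timeit` on real inputs before and after.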
-
Hi!
While working on my master's in language technology, I discovered some errors in the Norwegian tokenizers:
nltk.word_tokenize("Hello NLTK", "norwegian")
nltk.sent_tokenize("My name. Is bob.", …
-
I am using 4xH100, 100 CPU cores, and 1000 RAM to filter 1 TB of Japanese data. Although the GPU is at 50% utilization and the CPU is running at 100%, only 3 MB of data is processed per minute. I suspect that the…
-
## Work Planning
Details
## Table of Contents
- [Housekeeping](#housekeeping)
- [Named Concepts](#named-concepts)
- [Summary](#summary)
- [Reference-Level Explanation](#reference-level…