-
Make the tokenizer 100% API-compatible with the Hugging Face tokenizer, as proposed by @jamaliki in https://github.com/OpenBioML/protein-lm-scaling/issues/12#issuecomment-1682513840
Some benefits of …
-
(See #10)
The current tokenizer doesn't seem to be cutting it. For example, here are cases where we likely want a different tokenization (that is, each of the following should be a full token):
* golgin-84
* VBA1)-deleted
* p…
-
**Describe the bug**
When deeplima is asked to analyze several files, the first one is analyzed correctly, but then the program stalls.
**To Reproduce**
Steps to reproduce the behavior:
1. Run `…
kleag updated
4 months ago
-
There is an issue with '\n' not working properly in llama3. When passing '\n' through tokenizer.encode, it outputs the token ID 198, but it does not terminate the sentence generation appropriately and…
-
https://github.com/AppThreat/atom
We created an open-sourced atom for the precise identification of usages and dataflows across large code bases. This approach is better for summarizing and identif…
-
I like [XML Namespaces](http://www.w3.org/TR/REC-xml-names/). They are simple and elegant, so let's support them at the tokenization level.
Basically extend the spec so if you encounter a `:` (not as first …
Ygg01 updated
9 years ago
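To illustrate the namespace idea in the issue above, here is a minimal sketch of splitting a tag name on its first `:` into a prefix and a local part, in the spirit of the XML Namespaces recommendation. The helper name `split_qname` is hypothetical and not part of any existing tokenizer. Because the issue text is truncated, its exact rule is not reproduced; this sketch only assumes that a colon in the first or last position does not introduce a namespace.

```python
from typing import Optional, Tuple


def split_qname(name: str) -> Tuple[Optional[str], str]:
    """Split 'prefix:local' into (prefix, local).

    Hypothetical helper: names without a colon, or with a colon in the
    first or last position, are returned unchanged with no prefix.
    """
    prefix, sep, local = name.partition(":")
    if not sep or not prefix or not local:
        # No usable colon, so treat the whole name as the local part.
        return None, name
    return prefix, local
```

A tokenizer could call such a helper once a tag name has been read and emit the prefix and local name as separate fields on the tag token.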
-
When running the Google Colab notebook, it looks like there is some error when loading the Mixtral Instruct Tokenizer:
```
[/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_…
```
-
Current `@lexical/markdown` package supports shortcuts (auto-replace while typing), import and export relying on the same transformers configuration. It also uses RegExp parsing that does not have any…
-
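The error output is shown below (the problem occurs both with a remote connection and with a local download). The error was raised when calling it from Python on Colab.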
---------------------------------------------------------------------------
AttributeError …
-
### Description
When the multi-select component is interacted with, it drops down all available options. Once an option is selected from the dropdown, it is added to the "please select field" but remains an…