-
After making some modifications to the code in run_ocr_2.0.py, I ran it in a Notebook to recognize images such as ./GOT-OCR2.0/assets/wechat3.jpg, and the messages below were printed. I would appreciate an explanation, thanks!
![捕获1](https://github.com/user-attachments/assets/d3fcdd65-f4b4-4e3b-ad55-4760714b95f2)
![捕获2](https://github…
-
I noticed this behaviour in , but it seems to be part of this excellent project.
What I see is that [HTML Entities](https://www.w3schools.com/html/html_entities.asp) cause a break split after the n…
-
## Information
The problem arises in chapter:
* [ ] Introduction
* [ ] Text Classification
* [ ] Transformer Anatomy
* [ ] Multilingual Named Entity Recognition
* [ ] Text Generation
* [ ] …
-
The following HTML input is mishandled by the tokenizer:
`&abc`
You'd expect this to cause a validation error; instead, it triggers a bug in the tokenizer.
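As a point of comparison, Python's standard library can be used to check whether an ampersand sequence names a real HTML entity. This is only a sketch of how such input might be validated rather than silently mishandled; the `is_known_entity` helper is a hypothetical name, not part of any library:

```python
import html
import html.entities

def is_known_entity(text: str) -> bool:
    """Return True if text (e.g. '&amp;') names a known HTML5 entity."""
    if not text.startswith("&"):
        return False
    name = text[1:]
    # html.entities.html5 keys include both ';'-terminated names ('amp;')
    # and the legacy bare forms ('amp') that browsers still accept.
    return name in html.entities.html5

print(is_known_entity("&amp;"))   # a valid entity
print(is_known_entity("&abc"))    # not a real entity; should be rejected, not crash
# Note that html.unescape leaves unknown sequences untouched instead of raising:
print(html.unescape("&abc"))
```

A tokenizer that took this "unknown entities pass through unchanged" stance would avoid both the crash and a spurious split at the entity boundary.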
-
I encountered a runtime error while using the transformers-interpret library with a fine-tuned Llama-2 model that includes LoRA adapters for sequence classification. The error occurs when invoking the…
-
```
HTML5 specifies that nearly anything but whitespace may be a character in a tag
name (other than the first). html-sanitizer.js defines a tag name as /[-\w:]+/.
This discrepancy could result in a…
```
-
**Describe the bug**
I followed the instructions in:
https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/gpt/gpt_training.html
then I replaced 1024 with 512
```
pyth…
```
-
I created a local playground here https://microsoft.github.io/monaco-editor/monarch.html with the following:
```js
return {
defaultToken: "invalid",
tokenizer: {
root: [
      [/^(th…
```
-
I have two RTX 4090s and want to merge eight 7B models, but I run out of memory and only one GPU is used. How can I use both 4090s simultaneously, or is there another way to solve this?
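One memory-friendly approach, independent of whichever merge tool is in use, is to average the checkpoints incrementally: load one model's weights at a time and keep only a running mean, so peak memory stays near two models' worth instead of eight. This is a plain-Python sketch with toy state dicts (lists of floats standing in for tensors); it assumes all checkpoints share the same parameter keys:

```python
from typing import Callable, Dict, Iterable, List

StateDict = Dict[str, List[float]]  # toy stand-in for name -> tensor

def merge_incrementally(loaders: Iterable[Callable[[], StateDict]]) -> StateDict:
    """Average state dicts one at a time via running mean: acc += (x - acc) / n."""
    merged: StateDict = {}
    for n, load in enumerate(loaders, start=1):
        sd = load()  # load ONE checkpoint at a time to bound memory
        for key, values in sd.items():
            if key not in merged:
                merged[key] = list(values)  # first checkpoint seeds the mean
            else:
                acc = merged[key]
                for i, v in enumerate(values):
                    acc[i] += (v - acc[i]) / n
        del sd  # release this checkpoint before loading the next
    return merged

# Toy usage: three "models" with a single one-element weight each.
ckpts = [{"w": [1.0]}, {"w": [2.0]}, {"w": [6.0]}]
merged = merge_incrementally([lambda c=c: c for c in ckpts])
print(merged["w"])  # running mean of 1, 2, 6 is 3.0
```

With real models the same pattern applies to torch state dicts loaded with `map_location="cpu"`. Separately, if a single model is too large for one card, libraries such as Accelerate can shard it across both GPUs via `device_map="auto"`, but the merge itself rarely needs more than one checkpoint resident at a time if done incrementally.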
-
Hi,
I saw you already fine-tuned a MarkupLM model and uploaded it to the hub. Great work!
I was just wondering if the current API of `MarkupLMTokenizer` makes sense and is useful. Instead of havin…