LukeForeverYoung / UReader

Apache License 2.0

Hello! When running OCR with UReader, the recognized text gets cut off halfway. What should I do? Is there a parameter I need to adjust? Thank you very much! #12

Closed: Selvaggiar closed this issue 3 months ago

Selvaggiar commented 6 months ago

The recognized text is truncated halfway, for example: [image]

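The image contains clearly legible text, but UReader's recognition result is: "Document understanding refers to automatically extract, analyze and comprehend information from various types of documents, such". The output stops mid-sentence. I don't know which part of the code I should change, and I would really appreciate your help!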

shlyahin commented 6 months ago

Hi, I have the same question. But I think the issue is that during fine-tuning (FT) the model was trained to give only short answers. The maximum sequence length in the generation parameters is 512, so that is not what limits answer generation.
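Whether a 512-token limit can cut an answer short depends on what it counts. In Hugging Face `transformers`, `max_length` caps prompt plus generated tokens together, while `max_new_tokens` caps only the generated part and takes precedence when both are set. Assuming UReader's generation follows that convention (not confirmed in this thread), the image features count toward the prompt. The minimal sketch below computes the remaining answer budget; the token counts are hypothetical, chosen only for illustration.

```python
def answer_token_budget(max_length, prompt_len, max_new_tokens=None):
    """Return how many tokens the model may still generate.

    Assumes the transformers-style convention: `max_length` limits
    prompt + generated tokens, `max_new_tokens` (if given) limits only
    the generated tokens and overrides `max_length`.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    # Whatever the prompt does not use is left for the answer (never negative).
    return max(0, max_length - prompt_len)


# Hypothetical prompt: image features expanding to 450 tokens
# plus a 30-token text instruction.
prompt_len = 450 + 30
print(answer_token_budget(512, prompt_len))       # 32
print(answer_token_budget(512, prompt_len, 256))  # 256
```

With a long visual prompt, the same 512 limit could leave only a few dozen tokens for the OCR output, which would look like a mid-sentence cutoff.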

Selvaggiar commented 3 months ago


The default value of the maximum sequence length parameter is probably not 512; when calling function XXX, you need to specify a value for this parameter explicitly.
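To check whether an output was cut off by the length limit rather than ended by the model, a simple heuristic is to see whether generation used the whole token budget without emitting an end-of-sequence token. The sketch below is a hypothetical helper, not part of UReader; the EOS id and the token lists are made up for illustration. The `generate_kwargs` dict only shows the idea of passing an explicit limit (`max_new_tokens` is a real `transformers` generation parameter), and it is not confirmed to be the argument UReader's function XXX expects.

```python
def looks_truncated(generated_ids, eos_token_id, max_new_tokens):
    """Heuristic: the answer was likely cut off by the length limit if it
    consumed the entire token budget and never produced the EOS token."""
    hit_limit = len(generated_ids) >= max_new_tokens
    ended_cleanly = eos_token_id in generated_ids
    return hit_limit and not ended_cleanly


# Hypothetical: pass an explicit, larger limit instead of relying on a default,
# e.g. model.generate(**inputs, **generate_kwargs) in a transformers-style setup.
generate_kwargs = {"max_new_tokens": 1024, "do_sample": False}

EOS = 2  # hypothetical EOS token id
print(looks_truncated([5, 9, 13, 7], EOS, max_new_tokens=4))  # True
print(looks_truncated([5, 9, 2], EOS, max_new_tokens=4))      # False
```

If the check reports truncation, raising the explicit limit is the first thing to try; if the model stops early with EOS instead, the short-answer fine-tuning mentioned above is the more likely cause.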