Ucas-HaoranWei / Vary

[ECCV2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

The Vary-Tiny pretrained model cannot be used for inference #82

Open lucasjinreal opened 3 months ago

lucasjinreal commented 3 months ago

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/Vary/run_opt.py", line 111, in <module>
    eval_model(args)
  File "/Vary/run_opt.py", line 80, in eval_model
    output_ids = model.generate(
  File "/data/miniconda3/envs/env-3.9.2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/miniconda3/envs/env-3.9.2/lib/python3.9/site-packages/transformers/generation/utils.py", line 1592, in generate
    return self.sample(
  File "/data/miniconda3/envs/env-3.9.2/lib/python3.9/site-packages/transformers/generation/utils.py", line 2734, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
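The RuntimeError above is raised by torch.multinomial, which refuses to sample when the probability tensor holds inf, nan, or a negative value; this often means the logits produced by the model were already non-finite. The pure-Python sketch below only illustrates that validity condition on a plain list of floats (the helper name find_invalid_probs is hypothetical, not part of Vary or PyTorch). On a real tensor, the same test could be written with torch.isnan, torch.isinf and a comparison against zero before calling torch.multinomial.

```python
import math
from typing import List, Sequence


def find_invalid_probs(probs: Sequence[float]) -> List[int]:
    """Return indices of entries that torch.multinomial would reject.

    Hypothetical diagnostic helper: an entry is invalid if it is nan,
    +/-inf, or negative, matching the condition in the error message.
    """
    bad = []
    for i, p in enumerate(probs):
        if math.isnan(p) or math.isinf(p) or p < 0:
            bad.append(i)
    return bad


print(find_invalid_probs([0.5, 0.5]))                  # → []
print(find_invalid_probs([float("nan"), 0.2, -0.1]))   # → [0, 2]
```

This only diagnoses the symptom; per the maintainer's reply in this thread, the root cause here was an outdated inference script rather than the sampling call itself.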

It looks like Vary-Tiny doesn't produce any meaningful output?

I assumed it would have OCR ability, since it was trained on images of millions of arXiv articles.

Ucas-HaoranWei commented 3 months ago

This is because the inference code is the wrong version. I will release the correct version today.

lucasjinreal commented 3 months ago

..... Holy ..... Can we get the correct inference version today?

I would suggest releasing the code first rather than just dropping the weights without any clarification; this wastes a lot of other people's time.

Ucas-HaoranWei commented 3 months ago

I have uploaded run_open_with_text.py to /demo/; please try it.

lucasjinreal commented 3 months ago

Hi,

"use_im_start_end" is not defined (Pylance: reportUndefinedVariable)

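Pylance is flagging a name that the script reads but never assigns. As a sketch, assuming the flag plays the same role as in LLaVA-style demo scripts (deciding whether the image placeholder tokens are wrapped in start/end markers before the text prompt), defining it explicitly before the prompt is built could look like this. The token strings, the image_token_len value and the build_prompt helper are hypothetical placeholders; use the constants actually defined in your copy of Vary.

```python
# Hypothetical placeholder token strings (assumption, not taken from Vary);
# check the constants defined in your copy of the repository.
IM_START_TOKEN = "<img>"
IM_END_TOKEN = "</img>"
IMAGE_PATCH_TOKEN = "<imgpad>"


def build_prompt(question: str, image_token_len: int, use_im_start_end: bool) -> str:
    """Prepend image placeholder tokens to a text query (LLaVA-style sketch)."""
    image_tokens = IMAGE_PATCH_TOKEN * image_token_len
    if use_im_start_end:
        # Wrap the patch tokens so the image span has explicit boundaries.
        image_tokens = IM_START_TOKEN + image_tokens + IM_END_TOKEN
    return image_tokens + "\n" + question


# Assigning the flag before it is read is what resolves the undefined-name error.
use_im_start_end = True
prompt = build_prompt(
    "Provide the OCR results of this image.",
    image_token_len=4,  # hypothetical; the real value depends on the vision encoder
    use_im_start_end=use_im_start_end,
)
print(prompt)
```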
lucasjinreal commented 3 months ago

Still the same error:

[image]

Ucas-HaoranWei commented 3 months ago

I have fixed the bug now:

  1. You need the new Vary-master/vary/model/vary_opt.py
  2. You need the new run_open_with_text.py. Input: [image] Output: [image]

lucasjinreal commented 3 months ago

@Ucas-HaoranWei Hi, what was the previous error?

BTW, apart from "Provide the OCR results of this image.", do other instructions such as "read this", "texts on image", "what is this", etc. work with the OPT model?

shuiqingliu commented 3 months ago
[image]

When I use run_open_with_text, the output is garbled, as shown in the picture above. What could be the reason for this?

Ucas-HaoranWei commented 3 months ago

You need the new Vary-master/vary/model/vary_opt.py

lucasjinreal commented 3 months ago

I think this is normal.

My guess is that the vary-opt pretraining data contains bad Unicode characters. This can happen when the PDFs are not decoded properly, mostly because of their quality.

I have tested it on some non-document images, and it rarely produces OCR results. So either the OPT model itself has some limitations, or it has overfit to document data.
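One rough way to test the hypothesis that the pretraining text contains badly decoded PDF output is to measure how much of each sample consists of decoding debris. The heuristic below is an assumption, not Vary's data pipeline: it counts U+FFFD replacement characters, private-use code points and stray control characters. The function name garbled_ratio is hypothetical.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Fraction of characters that look like PDF-decoding debris.

    Heuristic (assumption): U+FFFD replacement characters, private-use
    code points (category "Co"), and control characters (category "Cc")
    other than ordinary newline, carriage return and tab.
    """
    if not text:
        return 0.0
    suspicious = 0
    for ch in text:
        cat = unicodedata.category(ch)
        if ch == "\ufffd" or cat == "Co" or (cat == "Cc" and ch not in "\n\r\t"):
            suspicious += 1
    return suspicious / len(text)


print(garbled_ratio("Plain English text.\n"))  # → 0.0
print(garbled_ratio("ab\ufffd\ufffd"))         # → 0.5
```

A sample scoring well above zero is worth inspecting by hand; any cut-off threshold would have to be chosen empirically.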

Ucas-HaoranWei commented 3 months ago

Vary-opt cannot handle formulas because we did not use such data, but plain text in both Chinese and English works fine. Input: [image: arxiv-1] Output: [image]

I guess you need to verify your transformers version, and ideally rebuild Vary.

shuiqingliu commented 3 months ago

Thank you for testing this and for the hints, @Ucas-HaoranWei.

lucasjinreal commented 3 months ago

[image] [image]

Why do I still get unexpected results?

lucasjinreal commented 3 months ago

With transformers 4.39 I cannot get the right result. Why?

Ucas-HaoranWei commented 3 months ago

Yes. In my testing, Vary-opt currently outputs correct results only with transformers 4.32.1. I tried 4.37.2, and it did not produce the correct result either. This may be because my code has not been adapted to newer versions of transformers. When I have some free time, I will fix this problem.
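Since only transformers 4.32.1 is reported here to give correct Vary-opt output (4.37.2 and 4.39 did not), a demo script could warn early instead of silently producing garbled text. A minimal sketch using the standard library's importlib.metadata; the names KNOWN_GOOD_TRANSFORMERS, is_known_good and check_transformers_version are hypothetical and not part of Vary.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version

# Only 4.32.1 is reported in this thread to give correct Vary-opt output;
# 4.37.2 and 4.39 were reported not to. Add versions only after testing them.
KNOWN_GOOD_TRANSFORMERS = frozenset({"4.32.1"})


def is_known_good(transformers_version: str) -> bool:
    """True if this exact transformers version is reported to work."""
    return transformers_version in KNOWN_GOOD_TRANSFORMERS


def check_transformers_version() -> str:
    """Return the installed transformers version, warning if it is untested."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        raise RuntimeError("transformers is not installed") from None
    if not is_known_good(installed):
        warnings.warn(
            f"transformers {installed} is untested with Vary-opt; correct output "
            "was reported only with 4.32.1 (pip install transformers==4.32.1)",
            RuntimeWarning,
        )
    return installed
```

Calling check_transformers_version() near the top of the demo script, before the model is loaded, would surface the mismatch as a warning rather than as unexpected output.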