Closed: Studentpengyu closed this issue 10 months ago
Check the feature sizes of both the image feature and the text feature. You may have changed a feature size in one of the previous steps.
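One way to carry out this check is to record the output shape of every layer during a forward pass, sketched here with PyTorch forward hooks. The toy model and the helper `record_output_shapes` are assumptions made for illustration and are not part of the LViT code; to use the idea, attach the hooks to your own model and compare the recorded shapes of the image and text branches.

```python
import torch
import torch.nn as nn

def record_output_shapes(model):
    """Attach forward hooks that record each leaf module's output shape (hypothetical helper)."""
    shapes, handles = {}, []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            def hook(mod, inputs, output, name=name):
                if isinstance(output, torch.Tensor):
                    shapes[name] = tuple(output.shape)
            handles.append(module.register_forward_hook(hook))
    return shapes, handles

# Toy stand-in model (NOT LViT), used only to demonstrate the hooks.
toy = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.Flatten())
shapes, handles = record_output_shapes(toy)
with torch.no_grad():
    toy(torch.zeros(2, 3, 16, 16))
for h in handles:
    h.remove()  # detach the hooks once the shapes are recorded
for name, shape in shapes.items():
    print(name, shape)
```

A shape that differs from what the next layer expects usually points to the step where the feature size was changed.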
Hi zihan,
I have not made any modifications to the code provided by you, except for the part that I previously mentioned in the screenshots. Despite following your guidance and making those changes, I am encountering issues. This leads me to wonder if there are other parts of the code that also require modification for this specific use case.
Would it be possible for you to share a version of the code that has been successfully run for training LViT models without text? Having a complete working example would be immensely helpful and would greatly aid in my understanding and application of your work.
Thank you very much for your time and for sharing your expertise. Looking forward to your response.
Best regards, Pengyu
Alternatively, you can create a zero matrix with the feature size (512, 768, 3) and use it as the text feature.
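A minimal sketch of that suggestion in PyTorch, assuming the model accepts the text feature as a plain float tensor. The shape is the one quoted above; `make_zero_text_feature` is a hypothetical helper and is not part of the LViT repository, so adjust the shape, dtype, and device to whatever your forward pass actually expects.

```python
import torch

# Shape quoted in the reply above; change it if your model expects something else.
TEXT_FEATURE_SHAPE = (512, 768, 3)

def make_zero_text_feature(shape=TEXT_FEATURE_SHAPE, device="cpu", dtype=torch.float32):
    """Return an all-zero tensor to stand in for the text feature (hypothetical helper)."""
    return torch.zeros(shape, dtype=dtype, device=device)

text_feature = make_zero_text_feature()
print(text_feature.shape)  # torch.Size([512, 768, 3])
```

The zero tensor would then be passed wherever the training loop currently passes the real text embedding.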
Ok. Thank you for your prompt and helpful response.
It seems to me that the method for 'training without text' isn't fixed and primarily involves replacing the text information with a tensor that allows the network to function normally.
If my understanding is correct, please feel free to close this issue.
I appreciate your time and the valuable insights you have provided.
Dear zihan:
I hope this message finds you well. Firstly, I want to extend my gratitude for your innovative work on LViT. It has been a valuable resource, and I have successfully obtained results using the provided methods.
However, I am currently encountering a challenge with training LViT models without using text. I referred to your previous responses and attempted a modification based on this screenshot from your GitHub repository:
I modified the code as follows, as per my understanding:
Unfortunately, this change did not yield the expected results, and I encountered the following error:
Could you please provide some guidance or additional details on the correct approach to train LViT models without text? Any advice or further clarification you can offer would be greatly appreciated.
Thank you for your time and consideration. I look forward to your valuable input.
Best regards, Pengyu