ManifoldRG / NEKO

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks
https://discord.gg/brsPnzNd8h
GNU General Public License v3.0

The training loss and perplexity are not decreasing for the VQA task #78

Open henryj18 opened 4 months ago

henryj18 commented 4 months ago

Refer to https://github.com/ManifoldRG/NEKO/pull/30 for the PR that adds the caption and VQA tasks.

Refer to https://github.com/ManifoldRG/NEKO/pull/77 for the PR that merges the code for the caption and VQA tasks (all of the work was completed on the add_vqa branch) into master.

Now that the merge is complete, we are submitting this issue to track multiple items that need to be investigated/resolved.

Issue 1: the training loss and perplexity are not decreasing for the VQA task

For the VQA task, the training loss and perplexity are not decreasing in the tests we have run with up to about 14K images and their associated VQA pairs. The tests were done on Colab, and we could not scale up to more images because Colab raised I/O errors when handling too many files in a single folder.
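For reference, perplexity is just the exponential of the mean per-token cross-entropy loss, so a flat loss curve implies a flat perplexity curve. A minimal illustration in Python:

```python
import math

# Perplexity is the exponential of the mean per-token cross-entropy loss,
# so if the loss plateaus, perplexity plateaus at exp(loss).
def perplexity_from_loss(mean_token_loss: float) -> float:
    return math.exp(mean_token_loss)

# Example: a loss stuck around 5.0 corresponds to a perplexity of ~148.4.
print(perplexity_from_loss(5.0))
```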

We need to investigate this issue from at least two avenues:

1. Scale up the training data on hardware where I/O does not fail with many images in one folder, to see whether the non-decreasing perplexity is due to the training dataset being too small (see the sketch below for one way to work around the folder limit).
2. Inspect the VQA training algorithm and see whether it should be revised.
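For avenue 1, one simple workaround for the files-per-folder I/O limit, assuming the VQA images currently sit in a single flat directory, is to shard them into numbered subfolders of a fixed size. The paths and shard size below are hypothetical:

```python
import os
import shutil

def shard_directory(src_dir: str, dst_dir: str, files_per_shard: int = 1000) -> None:
    """Move files from a flat directory into numbered subfolders of at most
    `files_per_shard` files each, to avoid I/O errors caused by having
    too many files in a single folder."""
    for i, name in enumerate(sorted(os.listdir(src_dir))):
        shard = os.path.join(dst_dir, f"shard_{i // files_per_shard:05d}")
        os.makedirs(shard, exist_ok=True)
        shutil.move(os.path.join(src_dir, name), os.path.join(shard, name))

# Hypothetical paths; adjust to the actual VQA image directory.
# shard_directory("vqa_images", "vqa_images_sharded", files_per_shard=1000)
```

The dataset loader would then need to index images by shard, but each directory listing stays small.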

Issue 2: need to implement evaluation for the caption and VQA tasks in eval.py
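A minimal sketch of what such an evaluation routine might look like, assuming a PyTorch model whose forward pass returns a per-token-averaged cross-entropy loss; the function name and return-dict keys are hypothetical, not NEKO's actual API:

```python
import math
import torch

@torch.no_grad()
def evaluate_task(model, dataloader, device="cuda"):
    """Compute mean cross-entropy loss and perplexity over a validation set.
    Assumes the model returns a dict with a scalar 'loss' averaged per token."""
    model.eval()
    total_loss, num_batches = 0.0, 0
    for batch in dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        out = model(**batch)
        total_loss += out["loss"].item()
        num_batches += 1
    mean_loss = total_loss / max(num_batches, 1)
    return {"loss": mean_loss, "perplexity": math.exp(mean_loss)}
```

The same loop could be called once per task (caption, VQA) with the corresponding validation dataloader.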

Any other issues will be added here as they show up.