The output of the largest model seems unreliable.

We are running the XXL Unified-IO 2 model (PyTorch version) on an NVIDIA GeForce RTX 4090. The demo.py file is shown below:

from uio2 import config
from uio2.model import UnifiedIOModel
from uio2.preprocessing import UnifiedIOPreprocessor
preprocessor = UnifiedIOPreprocessor.from_config(config.XXL, "tokenizer.model")
model = UnifiedIOModel(config.XXL)

from uio2.runner import TaskRunner
runner = TaskRunner(model, preprocessor)

for _ in range(3):
    ans1 = runner.categorization("something.jpg", ["boy", "girl", "cat", "dog", "car", "plane", "sun"])
    print(f"Prediction: {ans1}")

The something.jpg file is actually a picture of a cat.

The output is shown below:

Prediction: sun
Prediction: sun
Prediction: sun

Are we doing anything wrong?
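
One thing we notice is that demo.py builds the model with UnifiedIOModel(config.XXL) and never loads a checkpoint explicitly, so the weights may simply be randomly initialized; we have not confirmed this. To tell a constant, input-independent output apart from a single misclassified image, a check along the following lines could help. This is only a minimal sketch: summarize_predictions, the always_sun stub, and the sample file names are hypothetical, and in practice the stub would be replaced by a call to runner.categorization.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def summarize_predictions(
    categorize: Callable[[str, Sequence[str]], str],
    samples: Sequence[Tuple[str, str]],
    labels: Sequence[str],
) -> Dict[str, object]:
    """Run `categorize` on each (image_path, expected_label) pair and
    report accuracy plus whether every prediction was the same label."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one labeled sample")

    predictions: List[str] = [categorize(path, labels) for path, _ in samples]
    correct = sum(pred == expected for pred, (_, expected) in zip(predictions, samples))
    counts = Counter(predictions)

    return {
        "accuracy": correct / len(samples),
        # Same answer for several different images suggests the input is ignored.
        "constant_output": len(samples) > 1 and len(counts) == 1,
        "counts": dict(counts),
    }


# Hypothetical stand-in for runner.categorization that mimics what we observe.
def always_sun(image_path: str, labels: Sequence[str]) -> str:
    return "sun"


labels = ["boy", "girl", "cat", "dog", "car", "plane", "sun"]
# Hypothetical file names; each pair is (image path, label we expect).
samples = [("cat.jpg", "cat"), ("dog.jpg", "dog"), ("car.jpg", "car")]

report = summarize_predictions(always_sun, samples, labels)
print(report)
```

If the real runner also reports constant_output as True with zero accuracy over several different images, the problem is more likely in how the model is loaded than in this particular picture.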

allenai / unified-io-2.pytorch, issue #3