Open ayulockin opened 2 years ago
Hello. The image inputs to the model are usually feature tensors rather than raw image tensors, so during training or testing the model has no access to the image itself. The image files can be downloaded from the dataset's website.
In the past, I wrote a simple script that searches the image files for the matching `image_id`. This is a bit tedious, but it works.
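The lookup I mean could be sketched like this (a minimal assumption-laden version, since datasets such as TextVQA typically name image files after their `image_id`, e.g. `<image_id>.jpg`; the helper name is hypothetical):

```python
import os

def find_image_by_id(image_id, image_dir):
    """Walk image_dir and return the path of the file whose stem
    matches image_id, or None if nothing matches."""
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            stem, _ext = os.path.splitext(name)
            if stem == str(image_id):
                return os.path.join(root, name)
    return None
```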
For the texts inside the `SampleList`, they can be converted with `object_to_byte_tensor()` or `byte_tensor_to_object()` in `mmf/utils/distributed.py`:
https://github.com/facebookresearch/mmf/blob/b672a745996eb0549a0b903a30a225a8f0668182/mmf/utils/distributed.py#L244
These functions are also used in the prediction script to convert the text: https://github.com/facebookresearch/mmf/blob/b672a745996eb0549a0b903a30a225a8f0668182/mmf/datasets/builders/textvqa/dataset.py#L42-L55
Perhaps you could apply `byte_tensor_to_object()` to the `SampleList` to get the decoded results and record them in the file. However, I do not know the finer details of these two functions; that is all I know.
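Conceptually, the round trip is: pickle the Python object, store the bytes in a fixed-size `uint8` tensor with a length prefix, and reverse the process to decode. Here is a hedged re-implementation sketch of that idea; the actual MMF functions may differ in details such as padding size and limits, so treat this as an illustration, not the real API:

```python
import pickle
import torch

LENGTH_BYTES = 4  # bytes reserved for the big-endian payload-length prefix

def encode_object(obj, max_size=4094):
    """Serialize obj into a fixed-size uint8 tensor (length prefix + pickle)."""
    payload = pickle.dumps(obj)
    assert LENGTH_BYTES + len(payload) <= max_size, "object too large"
    out = torch.zeros(max_size, dtype=torch.uint8)
    out[:LENGTH_BYTES] = torch.tensor(
        list(len(payload).to_bytes(LENGTH_BYTES, "big")), dtype=torch.uint8
    )
    out[LENGTH_BYTES:LENGTH_BYTES + len(payload)] = torch.tensor(
        list(payload), dtype=torch.uint8
    )
    return out

def decode_object(byte_tensor):
    """Read the length prefix, slice out the payload, and unpickle it."""
    size = int.from_bytes(bytes(byte_tensor[:LENGTH_BYTES].tolist()), "big")
    payload = bytes(byte_tensor[LENGTH_BYTES:LENGTH_BYTES + size].tolist())
    return pickle.loads(payload)
```

If the `SampleList` field was produced by `object_to_byte_tensor()`, then calling MMF's `byte_tensor_to_object()` on it should give you back the original Python object (e.g. the question string).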
❓ Questions and Help
Apologies if this is already covered somewhere or I am missing something, but I am unsure about how and where the actual image and text tensors are given to the model as input.
Context
I am trying to build a model prediction visualizer using W&B Tables. I have noticed that in the `default.yaml` file, if I set `evaluation.predict=True`, it will write a `.json`/`.csv` file with `question_id`, `image_id`, and `answer` (from the vocab source). This is great, but it could be made much more useful if we could interactively look at the data and the model predictions.

What am I trying to build?
The screenshot shown below is an example where W&B Tables is used to visualize model prediction of YOLOv5 for the COCO dataset.
I am trying to build something similar and here's the screenshot of my barebones Tables:
I started building this on top of `TestReporter` inside the `test_reporter.py` file.

Where am I stuck?
As you can see, I am only logging the `question_id` and `image_id`, but not the actual question string and image. Setting `evaluation.predict=True` calls `prediction_loop` inside the `evaluation_loop.py` file. When I inspect the `prepared_batch`, it is a `SampleList` that has `question_id` and `image_id` besides other features/info. I want to know how I can parse these ids to get the actual tensors. Or, how should I get the actual text from the tokenized `text` inside this `SampleList`? Basically, I want to understand the MMF way of parsing each data sample so that I can log it to build the Table.
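For concreteness, here is a minimal sketch of the table rows I want to end up with, assuming the batch behaves like a dict of per-sample fields (the real `SampleList` API may differ; `question_id`/`image_id`/`answer` are the names from the predict output, everything else is hypothetical):

```python
def batch_to_rows(batch):
    """Turn a dict of parallel per-sample lists into one row per sample,
    ready to be handed to a W&B Table (or written to csv/json)."""
    rows = []
    for qid, img_id, answer in zip(
        batch["question_id"], batch["image_id"], batch["answer"]
    ):
        rows.append({"question_id": qid, "image_id": img_id, "answer": answer})
    return rows
```

The missing piece is exactly what the question asks: how to also fill in the decoded question string and the actual image for each row.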
PS: I have gone through the available documentation and did my own digging through the codebase, but I feel lost. I would appreciate any feedback or direction for approaching this. If something is not clear, please ask away. :)