Hi,
I am confused about how the "bbox" information is handled in the code. As far as I can see, only the image features x are loaded from the ".npz" file.
It is also mentioned that grid features can be used. However, a grid feature file (".pth" extension) contains only a feature tensor, of size [1, 2048, 19, 29] for a sample file, with no bounding box or object detection information. How can such features be used without that information?
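To make the question concrete, here is a minimal sketch of how I understand the grid layout. It assumes the tensor dimensions are [batch, channels, height, width] and that cells would be flattened in row-major order. Both are my assumptions, and `flattened_layout` and `cell_position` are hypothetical helpers, not functions from the repository.

```python
# Shape of the sample grid-feature tensor from the ".pth" file.
# Assumed dimension order: [batch, channels, height, width].
GRID_SHAPE = (1, 2048, 19, 29)


def flattened_layout(shape):
    """Return (num_cells, feature_dim) if the grid is flattened to one vector per cell."""
    _, channels, height, width = shape
    return height * width, channels


def cell_position(index, shape):
    """Map a flattened cell index back to (row, col), assuming row-major order."""
    _, _, height, width = shape
    if not 0 <= index < height * width:
        raise IndexError(f"cell index {index} is outside a {height}x{width} grid")
    return divmod(index, width)


num_cells, feature_dim = flattened_layout(GRID_SHAPE)
print(num_cells, feature_dim)         # 551 2048
print(cell_position(30, GRID_SHAPE))  # (1, 1)
```

Under these assumptions, the sample file would give 551 vectors of 2048 dimensions, and the only spatial information I can find is each cell's row and column, not a bounding box.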