Open amanikiruga opened 1 month ago
@amanikiruga Thanks for the report. Indeed, we broke compatibility during refactoring. Try the GitHub version; I hope compatibility is restored by https://github.com/facebookresearch/pytorch3d/commit/d0d0e020071c34ffa0953c00f9e85be0672b597e
More generally though, this field is an implementation detail (it is NOT the annotation bounding box, but merely an intermediate variable used in padded cropping). Does your code mean to load uncropped frames? In that case, you could just pass box_crop=False. The dataset API is quite flexible.
Thank you @shapovalov. I did of course try using the intermediate value calculated in the current GitHub version, but it gave different values from the 0.7.2 values computed here: https://github.com/facebookresearch/pytorch3d/blob/9fb452db5c8bc1913c6e33fd7056033f6c59d634/pytorch3d/implicitron/dataset/json_index_dataset.py#L446
I'm not sure why, but I can look into it if that would be helpful.
Indeed I am using the cropped values and I would just need to know the cropped coordinates in the original resolution. This is because I wanted to get the foreground object from the original image but the foreground mask is in the 800x800 cropped resolution. Unless there is a better way of doing this?
@amanikiruga I see. If you create a dataset turning off resizing and cropping, it will return the original images and masks. To do that, in your code, pass the frame data builder args explicitly, something like:
dataset_JsonIndexDataset_args=DictConfig(
    {
        "remove_empty_masks": False,
        "load_point_clouds": True,
        "image_height": None,
        "image_width": None,
        "box_crop": False,
    }
),
Then you can resize the mask if you have to.
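If the mask does have to match another resolution, a nearest-neighbour resize keeps a binary mask binary. Below is a minimal NumPy sketch of that idea; `resize_mask_nearest` is a hypothetical helper written for illustration, not a PyTorch3D function, and it assumes a 2D mask array.

```python
import numpy as np


def resize_mask_nearest(mask: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Resize a 2D mask to (out_h, out_w) with nearest-neighbour sampling.

    Hypothetical helper for illustration only; not part of PyTorch3D.
    """
    in_h, in_w = mask.shape
    out_h, out_w = out_hw
    # For every output row/column, pick the source index it falls into.
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    # Fancy indexing broadcasts the two index vectors into a full grid.
    return mask[rows[:, None], cols[None, :]]


small_mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
large_mask = resize_mask_nearest(small_mask, (4, 4))
```

Nearest-neighbour sampling is chosen here because it never produces in-between values; for soft (probabilistic) masks a bilinear resize may be more appropriate.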
Indeed I am using the cropped values and I would just need to know the cropped coordinates in the original resolution.
Then the cropping behaviour has probably changed. This value is indeed used (almost) directly to crop the image before resizing. Do you also see different crops in the old version?
To produce this variable, we:
1) call get_clamp_bbox to pad the bounding box with box_crop_context,
2) call clamp_box_to_image_bounds_and_round to make sure the bbox is within image bounds.
There are also some xyxy/xywh conversions along the way. This behaviour does not seem to have been changed by https://github.com/facebookresearch/pytorch3d/commit/ebdbfde0cee9d6adca2c0508f1a664c13d3cd65a. Feel free to dig deeper if you are still blocked.
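For anyone comparing values between versions, the two steps above (pad, then clamp and round) can be approximated in plain Python. This is only a sketch under assumptions: the function names below are hypothetical and are not the PyTorch3D implementations, and the padding rule (grow width and height by a `context` fraction while keeping the box centred) is an assumed reading of "pad the bounding box with box_crop_context". The library code may differ in details such as minimum box size or rounding.

```python
def pad_box_xywh(box_xywh, context):
    """Grow an (x, y, w, h) box by `context` * size, keeping its centre fixed.

    Assumed padding rule for illustration; not the library's code.
    """
    x, y, w, h = box_xywh
    return (x - w * context / 2.0, y - h * context / 2.0,
            w * (1.0 + context), h * (1.0 + context))


def xywh_to_xyxy(box_xywh):
    x, y, w, h = box_xywh
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box_xyxy):
    x0, y0, x1, y1 = box_xyxy
    return (x0, y0, x1 - x0, y1 - y0)


def clamp_and_round_xyxy(box_xyxy, image_hw):
    """Clamp an (x0, y0, x1, y1) box to the image and round to integers."""
    height, width = image_hw
    x0, y0, x1, y1 = box_xyxy
    x0, x1 = (min(max(v, 0.0), float(width)) for v in (x0, x1))
    y0, y1 = (min(max(v, 0.0), float(height)) for v in (y0, y1))
    return tuple(int(round(v)) for v in (x0, y0, x1, y1))


# Example: a 200x100 box padded by 50% inside a 300x150 (width x height) image.
padded = pad_box_xywh((100.0, 50.0, 200.0, 100.0), context=0.5)
clamped_xyxy = clamp_and_round_xyxy(xywh_to_xyxy(padded), image_hw=(150, 300))
crop_xywh = xyxy_to_xywh(clamped_xyxy)
```

Running the same input box through both library versions and through a sketch like this may help narrow down which step produces the differing values.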
If you do not know the root cause of the problem / bug, and wish someone to help you, please post according to this template:
🐛 Bugs / Unexpected behaviors
The FrameData when loading the CO3D dataset on the latest version, pytorch3d-0.7.7-py310_cu121_pyt231, shows the crop_bbox_xywh attribute to be None, even though older versions, e.g. 0.7.2, do not.
NOTE: Please look at the existing list of Issues tagged with the label 'bug'. Only open a new issue if this bug has not already been reported. If an issue already exists, please comment there instead.
Instructions To Reproduce the Issue:
I'm preprocessing CO3Dv2 following instructions from a paper's github repository. This issue is also referenced there: issue_link
Please include the following (depending on what the issue is):
Any changes you made (git diff) or code you wrote:
import math
import os
import tqdm
from PIL import Image
from omegaconf import DictConfig

from pytorch3d.renderer.camera_utils import join_cameras_as_batch
from pytorch3d.implicitron.dataset.json_index_dataset_map_provider_v2 import JsonIndexDatasetMapProviderV2
from pytorch3d.implicitron.tools.config import expand_args_fields

CO3D_RAW_ROOT = os.getenv("CO3D_RAW_ROOT")
CO3D_OUT_ROOT = os.getenv("CO3D_OUT_ROOT")

assert CO3D_RAW_ROOT is not None, "Change CO3D_RAW_ROOT to where your raw CO3D data resides"
assert CO3D_OUT_ROOT is not None, "Change CO3D_OUT_ROOT to where you want to save the processed CO3D data"
def update_scores(top_scores, top_names, new_score, new_name):
    for sc_idx, sc in enumerate(top_scores):
        if new_score > sc:
            # shift scores and names to the right, start from the end
            ...

def main(dataset_name, category):
    ...

def get_max_box_side(hw, principal_point_x, principal_point_y):
    # assume images are always padded on the right - find where the image ends
    ...

def crop_image_at_non_integer_locations(img, max_half_side: float, principal_point_x: float, principal_point_y: float):
    """
    Crops the image so that its center is at the principal point.
    The boundaries are specified by half of the image side.
    """
    # number of pixels that the image spans. We don't want to resize
    ...

def read_seq_cameras(dataset, sequence_name):
    ...

if __name__ == "__main__":
    for category in ["hydrant", "teddybear"]:
        for split in ["train", "val", "test"]:
            bad_sequences_val = main(split, category)