JakeRadMSFT opened 1 year ago
@luisquintanilla - @michaelgsharp and I were chatting, and it probably makes sense to have an ObjectDetectionResize transform in ML.NET: something that resizes images and bounding boxes the same way.
We would need to figure out what type of resizes we want to support.
An added benefit: if we make sure all images are the same size, we can batch. This requires both upsizing smaller images and downsizing larger ones.
@JakeRadMSFT: You could create a generic image+keypoints resize transform. This could then in the future handle segmentation, key point detection (e.g. human pose detection), and various other ML tasks that include points which scale with the image.
Related work for co-handling the image and keypoints is an image augmentation transform. To handle image rotation, keystoning, crop-and-zoom, skew, etc., the labeled points need to move with the image pixels.
One hard part is understanding the various label formats; another is properly handling box/segment/instance removal when a box (or segment, etc.) is partially or fully cropped out of the image.
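The crop-removal case can be sketched concretely. This is a minimal, illustrative sketch (not the ML.NET API; the `Box` type, `ClipToCrop` name, and the visible-fraction threshold are all assumptions): clip each box to the crop window, and drop it when nothing, or too little, of it survives.

```csharp
using System;

// Illustrative axis-aligned box; not an ML.NET type.
public record Box(float X1, float Y1, float X2, float Y2)
{
    public float Area => Math.Max(0, X2 - X1) * Math.Max(0, Y2 - Y1);
}

public static class CropUtils
{
    // Clips a box to the crop window. Returns null (i.e. "remove this
    // instance") if the box is fully outside the crop, or if less than
    // minVisibleFraction of its original area remains visible.
    public static Box? ClipToCrop(Box box, Box crop, float minVisibleFraction = 0.25f)
    {
        var clipped = new Box(
            Math.Max(box.X1, crop.X1), Math.Max(box.Y1, crop.Y1),
            Math.Min(box.X2, crop.X2), Math.Min(box.Y2, crop.Y2));

        if (clipped.Area <= 0 || clipped.Area / box.Area < minVisibleFraction)
            return null;
        return clipped;
    }
}
```

The threshold keeps slivers of mostly-cropped objects from surviving as degenerate labels; segments and instance masks would need analogous clipping.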
Seems like we can do what we want with the ResizeTransformer using IsoPad resize, plus this code (based on the IsoPad resize code).
```csharp
// Computes the uniform scale ("aspect") and letterbox offsets used by the
// IsoPad resize: the image is scaled to fit the destination while keeping
// its width-height ratio, then centered with padding on the short axis.
void CalculateAspectAndOffset(float sourceWidth, float sourceHeight, float destinationWidth, float destinationHeight, out float xOffset, out float yOffset, out float aspect)
{
    float widthAspect = destinationWidth / sourceWidth;
    float heightAspect = destinationHeight / sourceHeight;
    xOffset = 0;
    yOffset = 0;
    if (heightAspect < widthAspect)
    {
        // Height is the limiting dimension: pad left and right.
        aspect = heightAspect;
        xOffset = (destinationWidth - (sourceWidth * aspect)) / 2;
    }
    else
    {
        // Width is the limiting dimension: pad top and bottom.
        aspect = widthAspect;
        yOffset = (destinationHeight - (sourceHeight * aspect)) / 2;
    }
}

// Maps a point from source-image coordinates into the resized (letterboxed) frame.
void ResizeKeyPoint(float sourceWidth, float sourceHeight, float destinationWidth, float destinationHeight, float sourceX, float sourceY, ref float destinationX, ref float destinationY)
{
    float xOffset, yOffset, aspect;
    CalculateAspectAndOffset(sourceWidth, sourceHeight, destinationWidth, destinationHeight, out xOffset, out yOffset, out aspect);
    destinationX = xOffset + (sourceX * aspect);
    destinationY = yOffset + (sourceY * aspect);
}

// Maps a point from the resized (letterboxed) frame back into source-image coordinates.
void ReverseResizeKeyPoint(float sourceWidth, float sourceHeight, float destinationWidth, float destinationHeight, float destinationX, float destinationY, ref float sourceX, ref float sourceY)
{
    float xOffset, yOffset, aspect;
    CalculateAspectAndOffset(sourceWidth, sourceHeight, destinationWidth, destinationHeight, out xOffset, out yOffset, out aspect);
    sourceX = (destinationX - xOffset) / aspect;
    sourceY = (destinationY - yOffset) / aspect;
}
```
xOffset, yOffset, and aspect can be reused for all key points in the same image.
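As a quick sanity check of the letterbox math, here is a self-contained sketch (with the scale and offsets computed inline, matching the functions above) for a 1920x1080 source resized into an 800x600 frame; the image center should map to the frame center, and the reverse mapping should return it.

```csharp
// 1920x1080 -> 800x600 is width-limited, so aspect = 800/1920 and the
// padding goes on top and bottom.
float aspect = 800f / 1920f;                   // ~0.4167
float xOffset = 0f;                            // no horizontal padding
float yOffset = (600f - 1080f * aspect) / 2f;  // 75: bars top and bottom

// Forward: the source center maps to the frame center.
float dx = xOffset + 960f * aspect;            // ~400
float dy = yOffset + 540f * aspect;            // ~300

// Reverse: frame coordinates map back to source coordinates.
float sx = (dx - xOffset) / aspect;            // ~960
float sy = (dy - yOffset) / aspect;            // ~540
```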
Thanks @justinormont for the feedback, and @JakeRadMSFT for looking into it.
For resizing, I'm okay proceeding with the solution recommended by Yangyu and Jake.
@michaelgsharp, thoughts?
One additional recommendation here - if we know that 800x600 is the preferred size, let's make those the default values in the methods. Users can provide a different size if needed.
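That could look something like the following. This is only a hypothetical options sketch (the type and property names are illustrative, not the actual ML.NET API surface):

```csharp
// Hypothetical options sketch: default to the recommended 800x600,
// overridable by callers that need a different target size.
public sealed class ObjectDetectionResizeOptions
{
    public int Width { get; set; } = 800;
    public int Height { get; set; } = 600;
}
```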
Copying some notes from the meeting.
Yangyu recommends resizing the image resolution to 800x600.
There are several points to be careful about:
- Remember to resize the image resolution and the bounding box location/size at the same time in the training dataset.
- It's better to keep the image resolution the same (or similar) in both the training and testing stages for one model instance, to reduce the resolution gap.
- It's better to keep the width-height ratio unchanged during resizing.
What @JakeRadMSFT said is correct. For large images, resizing is necessary. The padding area can be set to (r, g, b) = (0, 0, 0).
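Putting those points together, a bounding box can be transformed with exactly the same scale and offset that the IsoPad resize applies to the pixels, so image and labels stay in sync. This is an illustrative sketch under those assumptions (`BoxResize` and the tuple shape are not ML.NET types); the padding bars themselves would be filled with black, i.e. (0, 0, 0).

```csharp
using System;

public static class BoxResize
{
    // Applies the letterbox scale/offset to a box given as (x, y, w, h).
    public static (float X, float Y, float W, float H) ResizeBox(
        float srcW, float srcH, float dstW, float dstH,
        float x, float y, float w, float h)
    {
        // Uniform scale that preserves the width-height ratio.
        float aspect = Math.Min(dstW / srcW, dstH / srcH);
        float xOff = (dstW - srcW * aspect) / 2;
        float yOff = (dstH - srcH * aspect) / 2;

        // The box origin shifts by the padding offset; the size only scales.
        return (xOff + x * aspect, yOff + y * aspect, w * aspect, h * aspect);
    }
}
```

Resizing the image and the boxes through the same function is what keeps the training labels aligned with the resized pixels.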