Closed Belval closed 4 years ago
Thank you for your interest and recognition! Though CVPR is going totally virtual this year :( It's a pity.
Yes! You are correct about the design of the project. The StickerTextActor, on the engine's side, only implements functions to find a good location and to load a PNG texture. The placement of each text instance inside that texture is handled on the Python side.
Actually, the current pipeline randomly places some static images instead of text images to increase the diversity of the dataset. See here:
So, to render label images, you can just adapt TextPlacingModule.py and WordImageGenerationModule.py to sample label images instead.
Note that each text region (text image) in this project is actually a PNG texture that may contain multiple instances. So you will also need to provide the relative coordinates of each single instance inside this PNG texture, i.e. the CBOX and BBOX here:
In your case, you can regard each label as a single-character word.
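For a label treated as one single-character word filling its whole texture, the boxes could be built like this (a sketch with my own variable names; the example texture size and the vertex order are assumptions based on this thread):

```python
import numpy as np

H, W = 512, 512  # texture size in pixels (example values)

# One instance covering the whole texture. Each box is 4 vertices of
# (x, y) coordinates, normalized to [0, 1] relative to the texture,
# in top-left, top-right, bottom-right, bottom-left order.
BBOX = np.array([[[0.0, 0.0],
                  [1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0]]])       # shape (1, 4, 2)
CBOX = np.zeros((0, 4, 2))            # no per-character boxes needed
```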
Good luck with your competition!
The good thing here is that you can use the pre-compiled environments I released and only need to modify some Python code.
Thank you for your help, I was able to get some workable samples by editing RenderWordImage in WordImageGenerationModule.py, but the bounding boxes are wrong. What is the format I should be using? Here is what this function looks like now:
def RenderWordImage(self, Height, Width, WordColor=(255, 255, 255, 255), SaveID=None):
    img_dir = "[PATH_TO_MY_HAZMAT_LABELS_DIR]"
    img_path = random.choice([os.path.join(img_dir, f) for f in os.listdir(img_dir)])
    img = resize_image(cv2.imread(img_path, cv2.IMREAD_UNCHANGED), (Height, Width))
    path = osp.join(self.ContentPath, f'word-{SaveID}.png')
    cv2.imwrite(path, img)
    H, W, _ = img.shape
    CBOX = np.array([])
    BBOX = np.array([[0, 0, H, W]])
    BBOX = np.reshape(
        np.stack([BBOX[:, 0], BBOX[:, 1], BBOX[:, 2], BBOX[:, 1],
                  BBOX[:, 2], BBOX[:, 3], BBOX[:, 0], BBOX[:, 3]], axis=-1),
        newshape=[-1, 4, 2])
    return path, [], CBOX, BBOX, W, H
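For completeness, here is a minimal sketch of what resize_image does (numpy-only nearest-neighbour scaling with centred zero padding; this is a simplification of the actual helper):

```python
import numpy as np

def resize_image(img, size):
    """Resize img to fit inside size=(height, width), keeping the
    aspect ratio and padding the remainder with zeros (transparent
    for a BGRA image). Nearest-neighbour sketch, not the real helper."""
    target_h, target_w = size
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)
    new_h = max(1, int(h * scale))
    new_w = max(1, int(w * scale))
    # Nearest-neighbour index maps for rows and columns.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows[:, None], cols]
    # Centre the resized image on a zero canvas of the target size.
    canvas = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    y0 = (target_h - new_h) // 2
    x0 = (target_w - new_w) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas
```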
resize_image is just an image resizing function that keeps the aspect ratio. Here is an example of the results:
Which is pretty satisfactory as a starting point, but the bounding boxes are all over the place (I don't think I understand the format that is used). Here are the bounding boxes visualized with vis.py:
The matching json label:
{
  "imgfile": "imgs/0.jpg",
  "bbox": [
    [597, 505, -810, 328, 88, 239, 763, 164],
    [586, 295, -5526, 771, 485, 204, 612, 189],
    [101, 628, 973, 256, -167, 232, 119, 210],
    [966, 516, -1409, -68, 694, 125, 1010, 120],
    [303, 398, 2657, 0, -5450, 754, 386, 203],
    [824, 323, -909, 255, 814, 168, 873, 163],
    [1, 172, 228, 224, 289, 438, 614, 1872],
    [435, 622, 3331, 189, 161, 182, 486, 152]
  ],
  "cbox": [],
  "text": [],
  "is_difficult": [0, 0, 0, 0, 0, 0, 0, 0]
}
The image looks pretty cool to me.
I think the main problem is that you didn't normalize the coordinates to [0, 1]. See lines 346-349 in https://github.com/Jyouhou/UnrealText/blob/master/code/DataGenerator/WordImageGenerationModule.py
Both BBOX and CBOX have 3 dimensions: bbox[i, j, :] is a 2-d vector representing the x/y coordinates of the j-th vertex of the i-th box. I think you have gotten that part right, but they should all be normalized to [0, 1].
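Concretely, for the snippet earlier in the thread, that would mean dividing the pixel coordinates by the image size before returning. If the vertices are (x, y) pairs, the full-image box should also start from [0, 0, W, H] rather than [0, 0, H, W] (my reading of the code; the example sizes below are arbitrary):

```python
import numpy as np

H, W = 480, 640  # example image height/width, as from img.shape

# Full-image box as x1, y1, x2, y2 (note W in the x slot, H in the y slot).
BBOX = np.array([[0, 0, W, H]], dtype=np.float32)
# Expand to 4 (x, y) vertices per box: TL, TR, BR, BL.
BBOX = np.reshape(
    np.stack([BBOX[:, 0], BBOX[:, 1], BBOX[:, 2], BBOX[:, 1],
              BBOX[:, 2], BBOX[:, 3], BBOX[:, 0], BBOX[:, 3]], axis=-1),
    newshape=[-1, 4, 2])
# Normalize x by width and y by height so all coordinates land in [0, 1].
BBOX = BBOX / np.array([W, H], dtype=np.float32)
```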
Thank you for your help, the output looks pretty good, I'll go try with the other environments.
From looking at the code, I was wondering how hard it would be to hijack the StickerTextActor to place small images instead of text, allowing for the generation of a dataset for label detection.
A bit more context: I am working on building a hazmat label detector (the diamond-shaped sign found on explosive/corrosive/infectious material containers) for a competition, and it would be useful to have a lot of synthetic data so that I can save the actual data (of which I do not have a lot) for fine-tuning.
From glancing at your code, unless I am mistaken, you already load the text as a PNG and then draw it with the engine, meaning that the text generation is on the Python side and I could simply load my own PNGs (that wouldn't be text) and it would work?
Great project btw, can't wait for your presentation at CVPR :)
EDIT: Just to be clear, this is not a feature request, I just want to be sure that it is doable.