Jyouhou / UnrealText

Synthetic Scene Text from 3D Engines
MIT License

Drawing images instead of text #5

Closed Belval closed 4 years ago

Belval commented 4 years ago

From looking at the code, I was wondering how hard it would be to hijack the StickerTextActor to place small images instead of text, allowing for the generation of a dataset for label detection.

A bit more context, I am working on building a hazmat label detector (the diamond-shaped sign found on explosive/corrosive/infectious material containers) for a competition and it would be useful to have a lot of synthetic data to save the actual data (of which I do not have a lot) for fine-tuning.

From glancing at your code, unless I am mistaken you already load the text as PNG and then draw it with the engine, meaning that the text generation is on the Python side and I could simply load my own PNG (that wouldn't be text) and it would work?

Great project btw, can't wait for your presentation at CVPR :)

EDIT: Just to be clear, this is not a feature request, I just want to be sure that it is doable.

Jyouhou commented 4 years ago

Thank you for your interest and recognition! Though CVPR is going fully virtual this year :( It's a pity.

Yes! You are correct about the design of the project. The StickerTextActor, on the engine's side, only implements functions to find a good location and load a PNG texture. The placement of the individual text instances inside that texture is handled on the Python side.

Actually, the current pipeline randomly places some static images instead of text images to increase the diversity of the dataset. See here:

https://github.com/Jyouhou/UnrealText/blob/db14e51facf235f4a0866911e512bf63078f1023/code/DataGenerator/TextPlacingModule.py#L153

So, to render label images, you can just adapt TextPlacingModule.py and WordImageGenerationModule.py to sample label images instead.

Note that each text region or text image in this project is actually a PNG texture that may contain multiple instances. So you will also need to provide the relative coordinates of each individual instance inside the texture, i.e. the CBOX and BBOX here:

https://github.com/Jyouhou/UnrealText/blob/db14e51facf235f4a0866911e512bf63078f1023/code/DataGenerator/WordImageGenerationModule.py#L352

In your case, you can regard each label as a single-character word.
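To make the annotation format concrete, here is a minimal sketch of what a multi-instance texture's BBOX could look like, assuming x is normalized by the texture width and y by its height (the sizes and corner values below are made up for illustration, not taken from the repo):

```python
import numpy as np

# Hypothetical example: a 512x256 texture containing two label
# instances, each annotated by its four corners in pixel space,
# ordered top-left, top-right, bottom-right, bottom-left.
W, H = 512, 256
corners_px = np.array([
    [[10, 10], [110, 10], [110, 110], [10, 110]],    # first label
    [[300, 50], [460, 50], [460, 210], [300, 210]],  # second label
], dtype=np.float32)

# Normalize x by the texture width and y by its height so every
# coordinate lies in [0, 1]; the shape stays [N, 4, 2].
BBOX = corners_px / np.array([W, H], dtype=np.float32)

# With no character-level annotations, CBOX can stay empty.
CBOX = np.array([])
```

Here each label is treated as one "word" of one character, as suggested above.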

Good luck with your competition!

Jyouhou commented 4 years ago

The good thing here is that you can use the pre-compiled environments I released and only need to modify some Python code.

Belval commented 4 years ago

Thank you for your help, I was able to get some workable samples by editing RenderWordImage in WordImageGenerationModule.py, but the bounding boxes are wrong. What is the format I should be using? Here is what this function looks like now:

    def RenderWordImage(self, Height, Width, WordColor=(255, 255, 255, 255), SaveID=None):
        img_dir = "[PATH_TO_MY_HAZMAT_LABELS_DIR]"
        img_path = random.choice([os.path.join(img_dir, f) for f in os.listdir(img_dir)])

        img = resize_image(cv2.imread(img_path, cv2.IMREAD_UNCHANGED), (Height, Width))
        path = osp.join(self.ContentPath, f'word-{SaveID}.png')
        cv2.imwrite(path, img)
        H, W, _ = img.shape
        CBOX = np.array([])
        BBOX = np.array([[0, 0, H, W]])
        BBOX = np.reshape(np.stack([BBOX[:, 0], BBOX[:, 1], BBOX[:, 2], BBOX[:, 1], BBOX[:, 2], BBOX[:, 3], BBOX[:, 0], BBOX[:, 3]], axis=-1), newshape=[-1, 4, 2])
        return path, [], CBOX, BBOX, W, H

resize_image is just an image resizing function that keeps the aspect ratio. Here is an example of the results:

[screenshot: rendered sample image]

Which is pretty satisfactory as a starting point, but the bounding boxes are all over the place (I don't think I understand the format that is used). Here are the bounding boxes visualized with vis.py:

[screenshot: bounding boxes visualized with vis.py, 2020-05-24 15:51:13]

The matching json label:

{
    "imgfile": "imgs/0.jpg",
    "bbox": [
        [597, 505, -810, 328, 88, 239, 763, 164],
        [586, 295, -5526, 771, 485, 204, 612, 189],
        [101, 628, 973, 256, -167, 232, 119, 210],
        [966, 516, -1409, -68, 694, 125, 1010, 120],
        [303, 398, 2657, 0, -5450, 754, 386, 203],
        [824, 323, -909, 255, 814, 168, 873, 163],
        [1, 172, 228, 224, 289, 438, 614, 1872],
        [435, 622, 3331, 189, 161, 182, 486, 152]
    ],
    "cbox": [],
    "text": [],
    "is_difficult": [0, 0, 0, 0, 0, 0, 0, 0]
}
Jyouhou commented 4 years ago

The image looks pretty cool to me.

I think the main problem is that you didn't normalize the coordinates to [0, 1]. See lines 346-349 in https://github.com/Jyouhou/UnrealText/blob/master/code/DataGenerator/WordImageGenerationModule.py

Both BBOX and CBOX have 3 dimensions: bbox[i, j, :] is a 2-d vector representing the x/y coordinates of the j-th vertex of the i-th box. I think you have gotten that part correct, but the coordinates must all be normalized to [0, 1].
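A minimal sketch of that normalization, treating the whole pasted image as a single instance (make_normalized_bbox is a hypothetical helper name, not a function from the repo):

```python
import numpy as np

# Treat the whole H x W image as one instance: list its four corners
# as (x, y) pairs in the order top-left, top-right, bottom-right,
# bottom-left, then divide x by W and y by H so every BBOX entry
# falls in [0, 1]. The result has shape [1, 4, 2].
def make_normalized_bbox(H, W):
    corners = np.array([[[0, 0], [W, 0], [W, H], [0, H]]], dtype=np.float32)
    return corners / np.array([W, H], dtype=np.float32)
```

In the RenderWordImage snippet earlier in the thread, `BBOX = make_normalized_bbox(H, W)` would replace the pixel-space construction.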

Belval commented 4 years ago

Thank you for your help, the output looks pretty good, I'll go try with the other environments.

[screenshot: corrected bounding boxes, 2020-05-25 11:34:52]