aws-samples / amazon-sagemaker-tensorflow-object-detection-api

Train and deploy models using TensorFlow 2 with the Object Detection API on Amazon SageMaker
MIT No Attribution

The class_id in data processing 1_prepare_data/docker/code/utils/tf_record_util.py doesn't match the label_map.pbtxt #6

Closed: t-T-s closed this issue 3 years ago

t-T-s commented 3 years ago

The class_id saved in the list of classes in line 69 of 1_prepare_data/docker/code/utils/tf_record_util.py does not match the class_id used in the label_map generation. Is this normal?

Othmane796 commented 3 years ago

Hi @t-T-s ,

Thanks for using the repository. The line you're pointing to (code below) reads the class_id from the provided annotations file, which contains the bounding boxes and the class_id (which is 0 in our case):

        for a in annotations:
            ...
            class_id = a['class_id']
            ...
            classes.append(class_id)

In the data preparation notebook, we're defining the label map as label_map = '{"0": "bee"}'. I just double-checked the annotation file, and it only contains the label class "0", which is then mapped to "bee" for better interpretability during inference, for example. Let me know if I answered your question or if there is something I missed.
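
For illustration, here is a minimal sketch of that mapping; only class_id comes from the code above, the other annotation field names are made up for the example:

    import json

    # Minimal sketch of the relationship described above; field names other than
    # class_id are illustrative, not the exact annotation schema.
    label_map = json.loads('{"0": "bee"}')             # notebook-style label_map
    annotations = [{"class_id": 0, "top": 10, "left": 20, "width": 50, "height": 40}]

    classes = [a["class_id"] for a in annotations]     # what ends up in the TFRecord: [0]
    names = [label_map[str(c)] for c in classes]       # human-readable labels: ['bee']
    print(classes, names)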

t-T-s commented 3 years ago

Thank you for your reply @Othmane796

That part is clear to me. But the code below from 1_prepare_data/docker/code/utils/tf_record_util.py adds 1 to the class_id when generating label_map.pbtxt.

    print('GENERATING LABEL MAP FILE')
    with open(f'{output_folder}/label_map.pbtxt', 'w') as label_map_file:
        for item in label_map:
            label_map_file.write('item {\n')
            label_map_file.write(' id: ' + str(int(item) + 1) + '\n')
            label_map_file.write(" name: '" + label_map[item] + "'\n")
            label_map_file.write('}\n\n')
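
For example, running the same logic standalone with label_map = {"0": "bee"} (assumed input) shows the TFRecord keeping class 0 while the generated file gets id 1:

    # Standalone reproduction (assumed inputs) of the mismatch described above:
    # the TFRecord classes keep the raw 0 while label_map.pbtxt is written with id 1.
    label_map = {"0": "bee"}
    annotations = [{"class_id": 0}]

    classes = [a["class_id"] for a in annotations]      # written to the TFRecord -> [0]

    pbtxt = ""
    for item in label_map:
        pbtxt += "item {\n"
        pbtxt += " id: " + str(int(item) + 1) + "\n"    # written to label_map.pbtxt -> id: 1
        pbtxt += " name: '" + label_map[item] + "'\n"
        pbtxt += "}\n\n"

    print(classes)   # [0]
    print(pbtxt)     # item with id: 1 and name: 'bee'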

Can this be problematic during training?

Othmane796 commented 3 years ago

Hi @t-T-s, apologies for the late reply. You're right, this would typically be an issue. The reason it still worked is that during inference (3_predict/deploy_endpoint.ipynb) the label map that was used is the following:

    category_index = {1:{'id': 1, 'name': 'bee'}}
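
Roughly, the lookup during inference works like this (a sketch, not the notebook's exact code):

    # Sketch of how detections are labelled at inference time (not the notebook's
    # exact code): the model returns the ids it was trained with from
    # label_map.pbtxt (1 for 'bee'), and category_index maps them back to names.
    category_index = {1: {'id': 1, 'name': 'bee'}}
    detection_classes = [1, 1]                                        # ids returned by the model
    labels = [category_index[c]['name'] for c in detection_classes]  # ['bee', 'bee']
    print(labels)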

Below is what would happen (right image) if we used a different category_index than the one used during training (for example: category_index = {0:{'id': 0, 'name': 'bee'}}).

[Screenshot 2021-01-08 at 09 51 46: comparison of detections when the category_index matches vs. differs from the one used during training]

I will update the code accordingly to standardise the label_map. Thanks for pointing this out!
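
One way to standardise it (a sketch of the idea, not the final change) is to apply the same +1 offset on both sides, since the Object Detection API expects label map ids to start at 1, with 0 conventionally reserved for the background class:

    # Sketch of a consistent scheme (assumed helper names, not the actual fix):
    # shift the raw annotation ids by +1 in the TFRecord and in label_map.pbtxt.
    def encode_classes(annotations):
        # Apply the same offset that the label map uses below.
        return [a['class_id'] + 1 for a in annotations]

    def write_label_map(label_map, output_folder):
        with open(f'{output_folder}/label_map.pbtxt', 'w') as label_map_file:
            for item in label_map:
                label_map_file.write('item {\n')
                label_map_file.write(' id: ' + str(int(item) + 1) + '\n')
                label_map_file.write(" name: '" + label_map[item] + "'\n")
                label_map_file.write('}\n\n')

With that, the category_index used in 3_predict/deploy_endpoint.ipynb ({1:{'id': 1, 'name': 'bee'}}) keeps working unchanged.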