Problem With Generating Dataset

hosseinzadeh88 commented 3 years ago

@ak-nv Thanks for the new update. It has fixed the problem with the widerfaces dataset not being written to disk. However, the following issues still remain:

The total number of "Mask" labels is now 5050, still not 6000.
The total number of "No-Mask" labels is now 5809, still not 6000.
The box pixel coordinates saved in the label files are now floating-point values instead of integers. I'm concerned about how tfrecords is going to handle floating-point bounding boxes during the dataset conversion.
The number of "Mask" labels not being 6000 indicates that there is potentially something wrong with either kaggle2kitti.py or mafa2kitti.py as well!
Could you please change data2kitti.py to print the actual number of labels rather than just saying 6000 (when it is not really 6000)?
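
The floating-point concern above can be checked directly on the generated label files. The following is a minimal sketch, assuming the labels follow the usual KITTI field order (class, truncated, occluded, alpha, then left, top, right, bottom). The function names are hypothetical and are not part of data2kitti.py. Whether rounding is actually needed depends on the converter, so this only shows how to detect and, if desired, round the values.

```python
from pathlib import Path

# Positions of left, top, right, bottom in a KITTI label line
# (class, truncated, occluded, alpha, then the four box values).
BBOX_FIELDS = slice(4, 8)


def has_fractional_bbox(line: str) -> bool:
    """Return True if any bounding-box value on the line is not a whole number."""
    fields = line.split()
    return any(not float(v).is_integer() for v in fields[BBOX_FIELDS])


def round_bbox(line: str) -> str:
    """Return the line with its four bounding-box values rounded to integers."""
    fields = line.split()
    fields[BBOX_FIELDS] = [str(round(float(v))) for v in fields[BBOX_FIELDS]]
    return " ".join(fields)


def count_fractional_boxes(labels_dir: str) -> int:
    """Count label lines under labels_dir that carry fractional box coordinates."""
    total = 0
    for path in Path(labels_dir).glob("*.txt"):
        for line in path.read_text().splitlines():
            if line.strip() and has_fractional_bbox(line):
                total += 1
    return total
```

Calling `count_fractional_boxes` on the labels folder gives a quick idea of how many boxes are affected before deciding whether to rewrite the files with `round_bbox`.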

ak-nv commented 3 years ago

The total number of "Mask" labels is now 5050, still not 6000. The total number of "No-Mask" labels is now 5809, still not 6000.

What do you mean here? Can you show me your log, something like what I have here?

The number of "Mask" labels not being 6000 indicates that there is potentially something wrong with either kaggle2kitti.py or mafa2kitti.py as well!

Feel free to submit a pull request for this and I would be happy to integrate it.

Could you please change data2kitti.py to print the actual number of labels rather than just saying 6000 (when it is not really 6000)?

The number of images can be less than the number of mask and no-mask labels, as one image can contain multiple faces.

hosseinzadeh88 commented 3 years ago

Please write a simple script to analyze the labels generated by data2kitti.py and count the number of mask labels and no-mask labels. Before yesterday's update, which changed widerfaces2kitti.py, this was the outcome of analyzing the labels:

Total of 3533 files found in the labels directory.
Total "MASK" labels: 5050
Total "No-Mask" labels: 3676
Total undefined labels: 0

None of the data from widerfaces was getting converted! There were only 3000 mask labels and NOT 6000, and I'm not talking about the files; I know that each image or label file potentially has more than one sample face. Probably most of the people not getting detections and results don't have the correct dataset at all! I've spent a few days trying to train the network while I didn't have the correct dataset. The data2kitti.py script just counts the number of labels in each class; whether they are successfully converted or not is not checked at all. Worse than that, it very confidently reports the number of labels for each category as 6000, giving the impression that data2kitti.py has worked successfully, while it has not!

Please try this script yourself and analyze all the labels generated by the data2kitti.py and see if there are indeed 6000 mask labels and 6000 no-mask labels.

import glob
import os

maskedCounter = 0
noMaskCounter = 0
undefCounter = 0

# Main directory that contains two folders named "images" and "labels"
datasetDir = '/home/sal/Desktop/Data4Training/train'

fileNames = glob.glob(os.path.join(datasetDir, 'labels', '*.txt'))
print(f'Total of {len(fileNames)} files found in the labels directory.')
for fileName in fileNames:
    # Read the label file; the context manager closes it afterwards
    with open(fileName) as labelFile:
        lines = labelFile.readlines()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first field of each label line is the class name
        classLabel = fields[0]
        if classLabel == 'No-Mask':
            noMaskCounter += 1
        elif classLabel == 'Mask':
            maskedCounter += 1
        else:
            undefCounter += 1
print(f'Total "MASK" labels: {maskedCounter}\nTotal "No-Mask" labels: {noMaskCounter}\nTotal undefined labels: {undefCounter}')

After yesterday's update the numbers got slightly better: there are now 5050 masked labels and 5809 non-masked labels, instead of 6000 each. However, the gap between expected and generated labels increases when the category-limit is raised to 7000. There are 8000 masked faces in Kaggle and MAFA altogether, about 2000 of which are not being processed by data2kitti.py!
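
The gap between the requested limit and the labels actually written can be reported explicitly. This is a minimal sketch, assuming the class names "Mask" and "No-Mask" used in the script above; `count_labels`, `report_shortfall` and the `category_limit` parameter are hypothetical names for illustration and do not come from data2kitti.py.

```python
from collections import Counter
from pathlib import Path


def count_labels(labels_dir: str) -> Counter:
    """Count class names (first field of each line) across all label files."""
    counts = Counter()
    for path in Path(labels_dir).glob("*.txt"):
        for line in path.read_text().splitlines():
            fields = line.split()
            if fields:  # ignore blank lines
                counts[fields[0]] += 1
    return counts


def report_shortfall(counts: Counter, category_limit: int,
                     classes=("Mask", "No-Mask")) -> dict:
    """Return how many labels each class is short of the requested limit."""
    return {name: max(category_limit - counts[name], 0) for name in classes}
```

With the numbers reported in this thread (5050 and 5809 against a limit of 6000), `report_shortfall` returns 950 missing "Mask" labels and 191 missing "No-Mask" labels.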

Anyway, thanks for your amazing work, but I think having this logged as an issue is necessary so that people realize they are starting on the wrong path if they don't have the correct dataset, and that they shouldn't expect to get the results reported by the authors. I analyzed the Kaggle and MAFA datasets and there are just a little more than 8000 masked faces, so now I have a dataset of 8000 mask and 8000 no-mask labels. If you don't mind, may I share that on Google Drive and link it here?

Thanks @ak-nv

kaushikCanada commented 3 years ago

@hosseinzadeh88 Please share your Google Drive link here. I am also having the same trouble.

hosseinzadeh88 commented 3 years ago

@kaushikCanada Please try this: https://drive.google.com/drive/folders/1WYOlif0kFyo-AJtp2-UydL1whbvxHKzr?usp=sharing This should have just over 8000 instances each of mask and no-mask. As far as I remember, you don't need to process the images or the labels; they are preprocessed. Just provide the path of the labels and images to the "tfrecords". Please be aware that:

There should not be ANY warnings or errors when tfrecords are being generated.
There should be just over 8000 labels for each class.

Otherwise, I have shared the wrong version of the dataset. (I worked on this more than a month ago and had to search for the data; I believe I have shared the correct one.)

Let me know how you are getting on.

kaushikCanada commented 3 years ago

Thank you so much for such a quick reply. Let me try with this. Maybe I will bug you with a few questions. This is so awesome.

kaushikCanada commented 3 years ago

OK, so now I should just run data2kitti.py with the given paths? I got the widerface dataset. How do I choose 6k images out of that?

hosseinzadeh88 commented 3 years ago

@kaushikCanada As I mentioned, this dataset is already processed, meaning the files are (or must be, as far as I remember) in KITTI format. So you do NOT need to run data2kitti.py. Start from running the Jupyter notebook; your data is (must be) prepared. Just unzip it and point the tfrecords to that folder. Make sure that inside the folder you have 2 subfolders called images and labels. Hint: to do that you have to modify the text file that you pass to the tfrecords. If you open the text file you will see the path; make sure it points to the unzipped folder.
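
Before generating the tfrecords, the unzipped folder layout described above can be sanity-checked. This is a minimal sketch, assuming each image has a label file with the same base name and a .txt extension; `check_kitti_folder` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def check_kitti_folder(root: str) -> dict:
    """Compare image and label file stems under root/images and root/labels."""
    root_path = Path(root)
    images_dir = root_path / "images"
    labels_dir = root_path / "labels"
    if not images_dir.is_dir() or not labels_dir.is_dir():
        raise FileNotFoundError(
            f"{root} must contain 'images' and 'labels' subfolders")
    # File names without extension, so 0001.jpg pairs with 0001.txt
    image_stems = {p.stem for p in images_dir.iterdir() if p.is_file()}
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    return {
        "images_without_labels": sorted(image_stems - label_stems),
        "labels_without_images": sorted(label_stems - image_stems),
    }
```

Two empty lists in the result suggest that every image has a matching label file; anything else is worth resolving before running the conversion.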

NVIDIA-AI-IOT / face-mask-detection

Problem With Generating Dataset #14