cogsys-tuebingen / deephs_fruit

Measuring the ripeness of fruit with Hyperspectral Imaging and Deep Learning

Why are there not enough labels in the train_all_v2.json file? #14

Closed Pongsan2540 closed 2 months ago

Pongsan2540 commented 2 months ago

I downloaded the dataset archive Mango.zip and am preparing labels for the Mango data from train_all_v2.json, but the number of records is greater than the number of labels. I matched each id in the records section against record_id in the annotations section and found that many records have no matching label.

For example: {"id": 3191, "fruit": "Mango", "side": "front", "day": "day_1_m3", "camera_type": "VIS", "files": {"header_file": "Mango/VIS/day_1_m3/mango_day_1_m3_01_front.hdr", "data_file": "Mango/VIS/day_1_m3/mango_day_1_m3_01_front.bin"}} I can't find a label for this record in the JSON.

So I'm wondering if something went wrong.
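The check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the JSON has top-level "records" and "annotations" lists and that each annotation refers to a record through "record_id", as the comment suggests. The record with id 3192 and the helper name `find_unlabeled` are hypothetical.

```python
# Minimal sketch, assuming a layout with "records" and "annotations" lists
# where each annotation points at a record via "record_id".
sample = {
    "records": [
        {"id": 3191, "fruit": "Mango", "side": "front", "camera_type": "VIS"},
        {"id": 3192, "fruit": "Mango", "side": "back", "camera_type": "VIS"},  # hypothetical record
    ],
    "annotations": [
        {"record_id": 3192},  # hypothetical annotation; label fields omitted
    ],
}


def find_unlabeled(dataset):
    """Return the records whose id never appears as an annotation's record_id."""
    labeled_ids = {a["record_id"] for a in dataset["annotations"]}
    return [r for r in dataset["records"] if r["id"] not in labeled_ids]


unlabeled = find_unlabeled(sample)
print([r["id"] for r in unlabeled])
```

Building the set of labeled ids first keeps the lookup cheap even for several thousand records.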

leonvarga commented 2 months ago

I am not sure I fully follow your question.

'train_all_v2.json' contains all available files, including those of the test and validation sets, but these come without labels.

If you are only interested in the labeled training set files, you can use 'train_only_labeled_v2.json' or simply ignore the entries without a label.
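Ignoring the entries without a label could look roughly like the following. It is a sketch under assumptions: the "records" and "annotations" keys and the "record_id" link are taken from the question, while the "label" field, the sample values, and the helper `keep_labeled` are made up for illustration and may not match the real file.

```python
import json

# Tiny stand-in for train_all_v2.json. The key names and the "label" field
# are assumptions for illustration, not the confirmed file schema.
raw = """
{
  "records": [
    {"id": 1, "fruit": "Mango"},
    {"id": 2, "fruit": "Mango"},
    {"id": 3, "fruit": "Kiwi"}
  ],
  "annotations": [
    {"record_id": 2, "label": "placeholder"}
  ]
}
"""


def keep_labeled(dataset):
    """Return a copy of the dataset that keeps only records with an annotation."""
    labeled_ids = {a["record_id"] for a in dataset["annotations"]}
    return {
        "records": [r for r in dataset["records"] if r["id"] in labeled_ids],
        "annotations": list(dataset["annotations"]),
    }


filtered = keep_labeled(json.loads(raw))
print(len(filtered["records"]), "labeled record(s) kept")
```

With the real file, `json.loads(raw)` would be replaced by `json.load` on the opened train_all_v2.json.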

Pongsan2540 commented 2 months ago

@leonvarga

The problem I am facing is that train_all_v2.json itself does not contain enough labels.

What could cause the labels to be incomplete? I checked and found 5307 records in total, but only 636 labels.

It seems a pity that more than 4,000 records are unusable, so I wonder what the cause is.


fruit | side | camera_type | count | matched labels
-- | -- | -- | -- | --
Avocado | front | NIR | 159 | 18
Avocado | back | NIR | 163 | 22
Avocado | front | VIS | 466 | 46
Avocado | back | VIS | 466 | 46
Avocado | front | VIS_COR | 132 | 0
Avocado | back | VIS_COR | 132 | 0
Kiwi | front | NIR | 180 | 29
Kiwi | back | NIR | 180 | 29
Kiwi | front | VIS | 551 | 72
Kiwi | back | VIS | 545 | 66
Kiwi | front | VIS_COR | 0 | 0
Kiwi | back | VIS_COR | 0 | 0
Kaki | front | NIR | 0 | 0
Kaki | back | NIR | 0 | 0
Kaki | front | VIS | 182 | 24
Kaki | back | VIS | 189 | 32
Kaki | front | VIS_COR | 186 | 27
Kaki | back | VIS_COR | 188 | 29
Mango | front | NIR | 0 | 0
Mango | back | NIR | 0 | 0
Mango | front | VIS | 268 | 27
Mango | back | VIS | 270 | 29
Mango | front | VIS_COR | 268 | 27
Mango | back | VIS_COR | 270 | 29
Papaya | front | NIR | 0 | 0
Papaya | back | NIR | 0 | 0
Papaya | front | VIS | 130 | 23
Papaya | back | VIS | 126 | 19
Papaya | front | VIS_COR | 130 | 23
Papaya | back | VIS_COR | 126 | 19
  |   | SUM | 5307 | 636
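A per-group tally like the one above could be reproduced with a short script. The sketch below assumes the same "records"/"annotations" layout with a "record_id" link; the sample ids and the function name `count_by_group` are hypothetical, and the numbers it prints come from the toy data, not from the real file.

```python
from collections import Counter

# Toy data in the assumed layout; ids and annotations are made up.
sample = {
    "records": [
        {"id": 1, "fruit": "Mango", "side": "front", "camera_type": "VIS"},
        {"id": 2, "fruit": "Mango", "side": "front", "camera_type": "VIS"},
        {"id": 3, "fruit": "Mango", "side": "back", "camera_type": "VIS"},
        {"id": 4, "fruit": "Kiwi", "side": "front", "camera_type": "NIR"},
    ],
    "annotations": [{"record_id": 2}, {"record_id": 4}],
}


def count_by_group(dataset):
    """Count all records and labeled records per (fruit, side, camera_type)."""
    labeled_ids = {a["record_id"] for a in dataset["annotations"]}
    totals, labeled = Counter(), Counter()
    for record in dataset["records"]:
        key = (record["fruit"], record["side"], record["camera_type"])
        totals[key] += 1
        if record["id"] in labeled_ids:
            labeled[key] += 1
    return totals, labeled


totals, labeled = count_by_group(sample)
for key in sorted(totals):
    # Counter returns 0 for groups that have no labeled record.
    print(*key, totals[key], labeled[key], sep=" | ")
print("SUM", sum(totals.values()), sum(labeled.values()), sep=" | ")
```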

check_train_all_v2.xlsx train_all_v2.json

leonvarga commented 2 months ago

As mentioned in the corresponding paper, not all recordings were labeled. Obtaining a label requires a destructive measurement, so only a subset was labeled.

The unlabeled recordings can be used for unsupervised techniques like self-supervised learning.

If you are interested only in the labeled recordings, use the file: train_only_labeled_v2.json

Pongsan2540 commented 2 months ago

Thank you very much. Your answer helped me a lot.