Tirth27 / Detecting-diabetic-retinopathy

Deep learning applied to Kaggle's Diabetic retinopathy dataset.
https://diabetic-retinopathy-detection.herokuapp.com/
MIT License

About the accuracy #6

Closed: YUN-XIAO-MO closed this issue 2 years ago

YUN-XIAO-MO commented 2 years ago

Dear Sir, I'm sorry to disturb you again.

I've spent the last few days trying to train the neural network and ran into the following problem.

The accuracy always oscillates around 0.50.

I looked at some of the commonly used methods for improving accuracy and found that gregwchase already applies them.

I usually use only about 1 to 2 gigabytes of data in these exercises, so one of my guesses is that the cause is a lack of data.

I'd like to hear what you think about that.

Another question: can the same .npy file that gregwchase uses be applied to your model?

Tirth27 commented 2 years ago

Hi @YUN-XIAO-MO, great to hear that you made progress.

Yes, the data is skewed, with most of the images being No DR, and that leads to lower accuracy. If you want a better understanding of the approaches other people have taken, you can go to the Kaggle competition and read the discussion there.
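One common way to counter this kind of skew is to weight each class inversely to its frequency. The sketch below is only an illustration of that idea, not what the repository does: the label list is a made-up toy example (not the real Kaggle distribution) and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy labels only (assumption): 70% of the samples belong to class 0 ("No DR").
toy_labels = [0] * 70 + [1] * 10 + [2] * 12 + [3] * 5 + [4] * 3

weights = balanced_class_weights(toy_labels)
for cls, weight in weights.items():
    print(f"class {cls}: weight {weight:.3f}")
```

A dictionary of this shape can be passed as the `class_weight` argument of Keras `model.fit`, so that mistakes on the rare classes cost more than mistakes on the majority class.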

If you want to use my code, you do not need the .npy file. It takes the images directly from the folder and does the augmentation on the fly using the Keras train generator. https://github.com/Tirth27/Detecting-diabetic-retinopathy/blob/86ba4fe616e15f72f509f1ed17a5b2dae8c84b88/src/Model/model.py#L223

Example of Keras Generator

YUN-XIAO-MO commented 2 years ago

Hello Sir, I encountered this problem when I tried your model:

Found 0 images belonging to 0 classes

I looked up other people's solutions:

Because Keras uses the directory tree to classify and label the images, the directory tree should be changed to this standard form.

============================================

some/path/
    class1/
        image1.jpeg image1.jpeg image1.jpeg
    class2/
        image2.jpeg image2.jpeg image2.jpeg image2.jpeg

=============================================

Does this mean that I need to sort the dataset into folders according to category?

Tirth27 commented 2 years ago

Yes, you have to organise the images based on their categories. Something like this:

data/
    train/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
    validation/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...

You can search online and you will find many good examples of what the generator actually takes as input. https://towardsdatascience.com/keras-data-generators-and-how-to-use-them-b69129ed779c
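To get from a flat image folder to one folder per class, something along these lines could work. It is a minimal sketch under stated assumptions: the labels file is assumed to be a CSV with `image` and `level` columns and the images are assumed to end in `.jpeg`; `sort_into_class_folders` and every path in the demo are hypothetical names, and the demo uses empty placeholder files rather than real images.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def sort_into_class_folders(labels_csv, image_dir, output_dir, extension=".jpeg"):
    """Copy each labelled image into output_dir/<level>/ and return per-class counts."""
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    counts = {}
    with open(labels_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            source = image_dir / (row["image"] + extension)
            if not source.is_file():
                continue  # label without a matching image file: skip it
            class_dir = output_dir / row["level"]
            class_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, class_dir / source.name)
            counts[row["level"]] = counts.get(row["level"], 0) + 1
    return counts


# Demo on throwaway placeholder files (assumed names, not real data).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    raw = tmp / "raw"
    raw.mkdir()
    for name in ("10_left", "10_right", "13_left"):
        (raw / f"{name}.jpeg").write_bytes(b"")
    labels = tmp / "labels.csv"
    labels.write_text("image,level\n10_left,0\n10_right,0\n13_left,2\n15_left,1\n")

    demo_counts = sort_into_class_folders(labels, raw, tmp / "train")
    demo_layout = sorted(
        p.relative_to(tmp / "train").as_posix()
        for p in (tmp / "train").rglob("*.jpeg")
    )
    print(demo_counts)
    print(demo_layout)
```

Copying rather than moving keeps the original folder intact, which makes it easy to rerun the script if the split needs to change.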

YUN-XIAO-MO commented 2 years ago

Good news, Sir!

The model ran successfully.

But its behaviour is still a bit weird.

It did not finish all of the epochs.

Found 78300 images belonging to 5 classes.
Found 17577 images belonging to 5 classes.
Epoch 1/200
   1/7118 [..............................] - ETA: 8:17:40 - loss: 0.1483 - accuracy: 0.3636
   2/7118 [..............................] - ETA: 48:08 - loss: 0.1848 - accuracy: 0.2273 
   7/7118 [..............................] - ETA: 15:08 - loss: 0.1701 - accuracy: 0.3247
WARNING:tensorflow:Callback method `on_train_batch_begin` is slow compared to the batch time (batch time: 0.0209s vs `on_train_batch_begin` time: 0.0577s). Check your callbacks.
7118/7118 [==============================] - 266s 37ms/step - loss: 0.1233 - accuracy: 0.5764 - val_loss: 0.1052 - val_accuracy: 0.6545
Found 17577 images belonging to 5 classes.
7118/7118 [==============================] - 19s 3ms/step - loss: 0.1052 - accuracy: 0.6545
Test Loss:  0.10516579449176788
Test Accuracy:  0.654491662979126
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 7118 batches). You may need to use the repeat() function when building your dataset.
17577/17577 [==============================] - 52s 3ms/step
Precision:  0.6544916652443534
Recall:  0.6544916652443534
Saving Model.
Model Score is Low, Score:- 0.6544916652443534
F1:  0.6544916652443534
Cohen Kappa Score:  0.35903690794542875
Quadratic Kappa Score:  0.4299640128884423
Confusion Matrix Folder Created!
Completed

The number of steps should be obtained by calculation:

STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
STEP_SIZE_VALID = validation_generator.n // validation_generator.batch_size

I don't understand why it runs out of data. There are 78,000+ images in the directory, so the warning shouldn't happen.
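The step arithmetic can be checked against the numbers in the log. The sketch below is a worked calculation only: the image counts and the 7118 steps are copied from the output above, while the batch size is not stated anywhere in the thread and is inferred here by searching for values that reproduce the logged step count.

```python
import math

TRAIN_IMAGES = 78300   # "Found 78300 images belonging to 5 classes."
VALID_IMAGES = 17577   # "Found 17577 images belonging to 5 classes."
LOGGED_STEPS = 7118    # steps shown in the progress bar


def full_batches(n_samples, batch_size):
    """Number of complete batches, as computed by n // batch_size."""
    return n_samples // batch_size


def all_batches(n_samples, batch_size):
    """Number of batches when the last, smaller batch is included."""
    return math.ceil(n_samples / batch_size)


# Inference (assumption): which batch sizes give the logged number of steps?
candidate_batch_sizes = [
    b for b in range(1, 513) if full_batches(TRAIN_IMAGES, b) == LOGGED_STEPS
]
batch_size = candidate_batch_sizes[0]

valid_full = full_batches(VALID_IMAGES, batch_size)
valid_all = all_batches(VALID_IMAGES, batch_size)

print("candidate batch sizes:", candidate_batch_sizes)
print("validation batches (full / including last):", valid_full, valid_all)
print("validation generator covers logged steps:", valid_all >= LOGGED_STEPS)
```

Under that inferred batch size, one pass over the 17577-image generator yields far fewer than 7118 batches, so a call that requests 7118 steps from it would exhaust it. This is one possible reading of the "ran out of data" warning, consistent with the evaluation line showing 7118 steps, and not a confirmed diagnosis.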

Tirth27 commented 2 years ago

Great! You can train with a larger batch size to avoid getting the warning "WARNING:tensorflow:Callback method on_train_batch_begin is slow compared to the batch time".

Also, I suggest first trying the vanilla model (i.e. without any major tweaks or changes), mostly with default values, and then making changes as needed. That way you can identify which change is causing the issue.

ymous337 commented 2 years ago

Can you please share the solution to the "Found 0 images belonging to 0 classes" issue? Thanks in advance.

Tirth27 commented 2 years ago

So it should be something related to folder configuration. https://datascience.stackexchange.com/questions/51671/keras-flow-from-directory-returns-0-images
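A quick way to check the folder configuration before handing it to Keras is to count the image files under each class subdirectory. This is a rough diagnostic sketch, not part of the repository: `count_images_per_class` is a hypothetical helper, the extension list is an assumption about which file types the Keras directory iterator accepts, and the demo folders contain empty placeholder files.

```python
import tempfile
from pathlib import Path

# Assumption: extensions treated as images; adjust to match your Keras version.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff"}


def count_images_per_class(root):
    """Return {subdirectory name: number of image files found beneath it}."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


with tempfile.TemporaryDirectory() as tmp:
    # Flat layout: images sit directly in the root, so there are no class folders.
    flat = Path(tmp) / "flat"
    flat.mkdir()
    (flat / "a.jpeg").write_bytes(b"")

    # Nested layout: one subdirectory per class.
    nested = Path(tmp) / "nested"
    (nested / "0").mkdir(parents=True)
    (nested / "1").mkdir()
    (nested / "0" / "a.jpeg").write_bytes(b"")
    (nested / "0" / "notes.txt").write_text("not an image")

    flat_counts = count_images_per_class(flat)
    nested_counts = count_images_per_class(nested)
    print("flat:", flat_counts)
    print("nested:", nested_counts)
```

An empty result for the directory passed to the generator means no class subfolders were found at that level, which matches the "0 images belonging to 0 classes" symptom; pointing the generator one level up or down, or sorting the images into class folders, is usually the next thing to try.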