Experiment with More external dataset

HCA97 commented 11 months ago

https://discourse.aicrowd.com/t/external-datasets-used-by-participants/9217

HCA97 commented 11 months ago

I doubt it will improve our score, but it is still worth a try.

fkemeth commented 11 months ago

It helped them, at least. I will try to create Kaggle datasets out of the two from OverWhelmingFit. I think it will help!

fkemeth commented 11 months ago

I annotated and uploaded the data to Kaggle. I shared it with you. Again, I use bboxes that are the size of the image.
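Here is a minimal sketch of such full-image annotations. It assumes the image width and height are already known. The row layout (image, xmin, ymin, xmax, ymax, label) and the helper name full_image_box are hypothetical and may not match the columns of the Kaggle dataset:

```python
from typing import NamedTuple


class BoxRow(NamedTuple):
    # Hypothetical annotation row; the real CSV columns may be named differently.
    image: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    label: str


def full_image_box(image: str, width: int, height: int, label: str) -> BoxRow:
    """Annotate an image with a box that covers the entire image."""
    return BoxRow(image, 0, 0, width, height, label)


row = full_image_box("inat_0001.jpg", 640, 480, "albopictus")
print(row)
```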

I am training a CLIP model with both the challenge and the iNat data now.

HCA97 commented 11 months ago

Hi,

We got a worse result, aaaaaaaaaa

I will try to quickly test this one https://discourse.aicrowd.com/t/external-dataset-notice-on-usage-declaration/8999/4?u=hca97

HCA97 commented 11 months ago

https://www.kaggle.com/datasets/haltun/lux-dataset-annotated/data

fkemeth commented 11 months ago

Yes, only 0.772 f1-score.

http://gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit/issues/136

Yes, maybe using the cleaned version helps! I will also try to do another training tonight!

HCA97 commented 11 months ago

@fkemeth I am having some errors with the submission

It hangs like this:

./submit.sh yolo-v8-s-classic-vit-l-14-ema-lux-dataset-7
Making submission as "hca97"
Checking git remote settings...
Using gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit as the submission repository
Updated Git hooks.
Git LFS initialized.
On branch hca
nothing to commit, working tree clean

or

git push --set-upstream origin hca
Uploading LFS objects: 100% (21/21), 34 GB | 4.9 MB/s, done.                                                                    
client_loop: send disconnect: Broken pipe
send-pack: unexpected disconnect while reading sideband packet
Enumerating objects: 24, done.
Counting objects: 100% (24/24), done.
Delta compression using up to 16 threads
Compressing objects: 100% (18/18), done.
fatal: the remote end hung up unexpectedly

HCA97 commented 11 months ago

nothing to commit, working tree clean
fatal: unable to access 'https://gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit.git/': The requested URL returned error: 504

fkemeth commented 11 months ago

Hi @HCA97,

I had the same error. I needed to create a new SSH key pair, and then it worked again. I did not observe the other two errors. I thought it was because my key pair had expired, but maybe there was a different issue or some change.

HCA97 commented 11 months ago

@fkemeth I have generated a new SSH key but I am still having the same error. I can pull or commit my changes but I cannot submit them.

HCA97 commented 11 months ago

If you are able to submit, could you submit my solution? It is in the hca branch.

HCA97 commented 11 months ago

Okay, I managed to resolve the issue. It seems like something is wrong with submit.sh.

We can create a new tag either in the UI or with git, but the tag must start with "submission-".

fkemeth commented 11 months ago

Did you manage to resolve it?

I never used submit.sh. I always did something like

git tag -am "submission-abc" submission-abc
git push origin submission-abc

which triggered the submission (I committed and pushed the changes to master before that).
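The tag-based flow could be scripted along these lines. This is a hypothetical helper: submission_commands and submit are not part of the starter kit. It only enforces the "submission-" prefix rule and runs the same two git commands:

```python
import subprocess
from typing import List

PREFIX = "submission-"


def submission_commands(tag: str) -> List[List[str]]:
    """Build the git commands that create an annotated submission tag and push it."""
    if not tag.startswith(PREFIX):
        raise ValueError(f"tag must start with {PREFIX!r}, got {tag!r}")
    return [
        ["git", "tag", "-am", tag, tag],  # annotated tag, message = tag name
        ["git", "push", "origin", tag],   # pushing the tag triggers the submission
    ]


def submit(tag: str) -> None:
    # Assumes the changes are already committed and pushed, as described above.
    for cmd in submission_commands(tag):
        subprocess.run(cmd, check=True)


print(submission_commands("submission-abc"))
```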

HCA97 commented 11 months ago

ooo, I always used the submission script. Yes, it is resolved.

fkemeth commented 11 months ago

In the inference script, we have image_cropped = image[bbox[1] : bbox[3], bbox[0] : bbox[2], :]

I am not sure about the Yolo format, but shouldn't it be image_cropped = image[bbox[0] : bbox[2], bbox[1] : bbox[3], :]

as with [xmin, ymin, xmax, ymax]?

HCA97 commented 11 months ago

Finally, we increased our score (I think the dataset from Lux is pretty good), but it takes more than 3 hours to train!

http://gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit/issues/137

Unfortunately, I forgot to set warm_up_steps, so we used 2000 instead of 1000, but I doubt switching to 1000 would improve our score. (training code https://github.com/HCA97/Mosquito-Classifiction/blob/yolo_and_more_testing/mosquito_clf_yolo_lux_ema.py)

n_classes: 6
model_name: ViT-L-14
dataset: datacomp_xl_s13b_b90k
freeze_backbones: false
head_version: 7
warm_up_steps: 2000
bs: 16
data_aug: hca
loss_func: ce
epochs: 15
label_smoothing: 0.1
hd_lr: 0.0003
hd_wd: 1.0e-05
img_size:
- 224
- 224
use_ema: true
use_same_split_as_yolo: false
shift_box: false
max_steps: 60000
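To see what the warm_up_steps mix-up means, assume a plain linear warm-up of the head learning rate to hd_lr. The actual scheduler lives in the training script and may differ. Under this assumption, 2000 warm-up steps put the learning rate at only half of hd_lr at step 1000, and it reaches full hd_lr at step 2000. Either way, warm-up is a small fraction of max_steps: 60000.

```python
def warmup_lr(step: int, hd_lr: float = 3e-4, warm_up_steps: int = 2000) -> float:
    """Linear warm-up from 0 to hd_lr over warm_up_steps, constant afterwards.

    Assumption: whatever the real schedule does after warm-up (e.g. decay towards
    max_steps) is not modelled here.
    """
    return hd_lr * min(1.0, step / warm_up_steps)


for steps in (1000, 2000):
    print(steps, [warmup_lr(s, warm_up_steps=steps) for s in (0, 500, 1000, 2000)])
```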

HCA97 commented 11 months ago

Wow, the nice thing is that I am getting the same score in the local evaluation!

I think training on YOLO annotations is better than using the challenge annotations, because during inference we crop with YOLO boxes.
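The idea is to make training crops come from the same source as inference crops. Here is a minimal sketch. It assumes the annotations are loaded as a list of dicts with hypothetical keys (image, xmin, ymin, xmax, ymax, label) and that the YOLO predictions are a dict keyed by image name. Keeping the challenge box when YOLO finds nothing is also an assumption:

```python
from typing import Dict, List, Tuple

# (xmin, ymin, xmax, ymax) in pixels, as predicted by the YOLO detector.
Box = Tuple[float, float, float, float]


def use_yolo_boxes(rows: List[dict], yolo_boxes: Dict[str, Box]) -> List[dict]:
    """Return copies of the annotation rows with the boxes replaced by YOLO predictions.

    Assumption: if YOLO found no box for an image, the original challenge box is kept.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        box = yolo_boxes.get(row["image"])
        if box is not None:
            new_row["xmin"], new_row["ymin"], new_row["xmax"], new_row["ymax"] = box
        out.append(new_row)
    return out


rows = [
    {"image": "a.jpg", "xmin": 10, "ymin": 10, "xmax": 200, "ymax": 180, "label": "culex"},
    {"image": "b.jpg", "xmin": 0, "ymin": 0, "xmax": 50, "ymax": 50, "label": "aegypti"},
]
yolo_boxes = {"a.jpg": (12.5, 8.0, 205.3, 181.9)}
train_rows = use_yolo_boxes(rows, yolo_boxes)
```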

fkemeth commented 11 months ago

wow, 0.856 f1 score. What did you change?

HCA97 commented 11 months ago

I uploaded the YOLO annotations to the Kaggle dataset:

They come from our first YOLO model. Ideally, I would like to use the provided baseline model to reduce our complexity.

Btw, the bounding box coordinates are floats, so don't forget to cast them to int.
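A plain int() truncates toward zero, which is usually fine. Clipping matters more: a negative index in a NumPy slice counts from the end of the axis, so a slightly negative coordinate gives a wrong crop instead of an error. The hypothetical helper below floors the top-left corner, ceils the bottom-right corner and clips to the image. This rounding choice is an assumption, not necessarily what the inference script does:

```python
import math
from typing import Tuple


def to_pixel_box(box: Tuple[float, float, float, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a float (xmin, ymin, xmax, ymax) box to integer pixel coordinates.

    Floors the top-left corner and ceils the bottom-right corner so the crop never
    loses a partial pixel, then clips everything to the image bounds.
    """
    xmin, ymin, xmax, ymax = box
    x0 = max(0, math.floor(xmin))
    y0 = max(0, math.floor(ymin))
    x1 = min(width, math.ceil(xmax))
    y1 = min(height, math.ceil(ymax))
    return x0, y0, x1, y1


print(to_pixel_box((12.4, 8.7, 200.2, 479.9), width=640, height=480))  # (12, 8, 201, 480)
print(to_pixel_box((-3.2, 5.0, 700.1, 100.0), width=640, height=480))  # (0, 5, 640, 100)
```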

fkemeth commented 11 months ago

Would you mind sharing the TensorBoard loss curves as well? Maybe we can see from them how we should change the hyperparameters.

HCA97 commented 11 months ago

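About the cropping question: I think x and y are Cartesian coordinates, so the rows of the image correspond to the y-axis and the columns to the x-axis. That means the current slicing image[bbox[1] : bbox[3], bbox[0] : bbox[2], :] should already be correct for [xmin, ymin, xmax, ymax].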
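This is easy to check with NumPy on a dummy array; the sizes below are arbitrary. An image loaded as an H x W x C array is indexed [row, column], i.e. [y, x], so for a box in [xmin, ymin, xmax, ymax] order the y range has to come first:

```python
import numpy as np

# A dummy H x W x C image: 480 rows (y) by 640 columns (x).
image = np.zeros((480, 640, 3), dtype=np.uint8)

# Box in [xmin, ymin, xmax, ymax] order, as assumed in the inference script.
bbox = [100, 50, 300, 150]

# Rows are indexed by y and columns by x, so the y range comes first.
image_cropped = image[bbox[1] : bbox[3], bbox[0] : bbox[2], :]
print(image_cropped.shape)  # (100, 200, 3): height = ymax - ymin, width = xmax - xmin
```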

HCA97 commented 11 months ago

Here are the TensorBoard loss curves: https://drive.google.com/drive/folders/1wd3FNpi8a3KRMVFykWp-U6Xr017apAl6?usp=sharing

HCA97 commented 11 months ago

I feel like our best solution is very similar to Lux's solution.

fkemeth commented 11 months ago

Yes, he has the same score. Do you know which additional data the others used? Lux shared the cleaned iNat data, right? With the uncleaned version I did not get good results. Maybe the others cleaned their respective data as well.

HCA97 commented 11 months ago

Probably. I don't see how it could work with noisy data.

HCA97 commented 11 months ago

@fkemeth bad news, I broke my Ubuntu :) I suspect the NVIDIA drivers! They somehow got updated. The bad thing is that I cannot enter the BIOS or boot into recovery mode, and I don't know why.

HCA97 commented 11 months ago

I had an experiment idea: train the model with different data augmentations (either the Happy Whale or the ImageNet one), and if they perform decently (a score above 0.83), do weight ensembling like in https://github.com/HCA97/Mosquito-Classifiction/issues/4#issuecomment-1676440367
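Weight ensembling here is taken to mean averaging the parameters of several fine-tuned models into a single model, sometimes called a "model soup". Whether that matches the linked comment is an assumption. The sketch below uses NumPy arrays in place of PyTorch state dicts and applies the 0.83 threshold from the idea above:

```python
import numpy as np


def average_weights(state_dicts, scores, min_score=0.83):
    """Uniformly average the parameters of all models whose score exceeds min_score.

    Assumes every model has the same architecture (same parameter names and shapes)
    and was fine-tuned from the same pretrained weights.
    """
    selected = [sd for sd, score in zip(state_dicts, scores) if score > min_score]
    if not selected:
        raise ValueError("no model passed the score threshold")
    return {name: np.mean([sd[name] for sd in selected], axis=0) for name in selected[0]}


# Toy "models" with a single parameter each; real ones would be checkpoint state dicts.
model_a = {"head.weight": np.array([1.0, 2.0])}
model_b = {"head.weight": np.array([3.0, 4.0])}
model_c = {"head.weight": np.array([100.0, 100.0])}  # below the threshold, ignored
soup = average_weights([model_a, model_b, model_c], scores=[0.85, 0.84, 0.80])
print(soup["head.weight"])  # [2. 3.]
```

With PyTorch checkpoints the same idea applies per tensor (e.g. torch.stack([...]).mean(dim=0)), and the averaged state dict is loaded back with load_state_dict.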

fkemeth commented 11 months ago

I trained with the Lux data yesterday, but got an F1 score of only 0.8. Did you train with both folders from Lux? Did you still do upsampling? If so, only for the challenge data?

Training on Kaggle takes a while, so I fear I won't be able to train three different models. Are you still able to train on Kaggle? Then we could split the work.

I was also wondering if it would make sense to take your model and fine-tune the last layer towards the F1 score. What do you think? Do you know where I can find it?

I hope you can get your Linux fixed! I feel for you; I also had issues with updates and NVIDIA drivers a while ago, but for me the BIOS still worked.

HCA97 commented 11 months ago

Hmm, what are your parameters?

Besides our default parameters (except warm_up_steps), I also used an Exponential Moving Average (EMA) of the weights.
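EMA keeps a second, smoothed copy of the weights that is updated after every optimizer step. That copy is typically the one evaluated and used for inference. Here is a minimal sketch of the update rule, with NumPy arrays standing in for model parameters. The decay value is a hypothetical placeholder, since it is not listed in the config:

```python
import numpy as np


def ema_update(ema_params, params, decay=0.999):
    """One EMA step per parameter: ema = decay * ema + (1 - decay) * current.

    decay=0.999 is a placeholder; the value used in our training is not shown in the config.
    """
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy example: the current weights stay at 1.0, the EMA copy starts at 0.0.
ema = {"w": np.array([0.0])}
for _ in range(3):
    ema_update(ema, {"w": np.array([1.0])}, decay=0.5)
print(ema["w"])  # [0.875]
```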

This is my validation data:

Validation: https://www.kaggle.com/datasets/haltun/mosquito-data-round-2?select=best_model_val_data_yolo_annotations.csv

The training data is a merge of all of Lux's data (both folders) and the challenge training data:

Training: https://www.kaggle.com/datasets/haltun/mosquito-data-round-2?select=best_model_train_data_yolo_annotations.csv

Yes, I still upsample: I do the sampling after merging them, so it applies to both datasets.
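Because the sampling happens after the merge, the class balance is computed over the combined challenge + Lux data. Below is a minimal sketch of random oversampling up to the size of the largest class. The real training code may balance classes differently (e.g. with a weighted sampler), so this only illustrates the idea:

```python
import random
from collections import Counter, defaultdict


def oversample(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((sample, label) for sample in items + extra)
    rng.shuffle(balanced)
    return balanced


# Merged challenge + external data (file names and counts are made up).
labels = ["albopictus"] * 5 + ["culex"] * 2 + ["aegypti"] * 1
samples = [f"img_{i}.jpg" for i in range(len(labels))]
balanced = oversample(samples, labels)
print(Counter(label for _, label in balanced))
```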

Note that the bounding box annotations for both the validation and the training data come from YOLO, not from the challenge.

CLIP model: https://gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit/-/blob/921be0e22fc63178c531f6d73067d926e5ff5b69/my_models/clip_weights/epoch=11-val_loss=0.7277861833572388-val_f1_score=0.8598743081092834-val_multiclass_accuracy=0.8671819567680359.ckpt

YOLO model: https://gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit/-/blob/921be0e22fc63178c531f6d73067d926e5ff5b69/my_models/yolo_model_weights/best-yolov8-s-classic.pt

HCA97 commented 11 months ago

Fine-tuning the last layer towards the F1 score makes sense. Maybe fine-tune it with noisier bounding box annotations.
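One way to get noisier boxes is to randomly perturb the YOLO boxes during fine-tuning, so the head sees crops similar to imperfect detections. The config already has a shift_box flag, but its exact behaviour is not shown here. The sketch below is a hypothetical jitter with a made-up max_shift parameter:

```python
import random
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def jitter_box(box: Box, width: int, height: int, max_shift: float = 0.1,
               rng: Optional[random.Random] = None) -> Box:
    """Move each box edge by a random amount of up to max_shift times the box size.

    The result is clipped to the image and always keeps at least one pixel of width/height.
    """
    rng = rng or random.Random()
    xmin, ymin, xmax, ymax = box
    dx = max_shift * (xmax - xmin)
    dy = max_shift * (ymax - ymin)
    x0 = min(max(0.0, xmin + rng.uniform(-dx, dx)), width - 1.0)
    y0 = min(max(0.0, ymin + rng.uniform(-dy, dy)), height - 1.0)
    x1 = max(min(float(width), xmax + rng.uniform(-dx, dx)), x0 + 1.0)
    y1 = max(min(float(height), ymax + rng.uniform(-dy, dy)), y0 + 1.0)
    return x0, y0, x1, y1


rng = random.Random(0)
print(jitter_box((100.0, 50.0, 300.0, 150.0), width=640, height=480, rng=rng))
```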

HCA97 commented 11 months ago

I don't have time now; I can start looking into it after 4-5 PM CET, but I don't think that will leave enough time. Anyway, the current result is still good: as long as we didn't overfit to the public leaderboard, we are fine.

HCA97 commented 11 months ago

Unfortunately, Linux always has problems with drivers. So annoying to deal with, but I guess there is nothing we can do.

fkemeth commented 11 months ago

Hi @HCA97,

I did one epoch of head fine-tuning with the ce+f1 loss. The resulting model did not change much, though:

epoch=0-val_loss=1.2470133304595947-val_f1_score=0.8503904938697815-val_multiclass_accuracy=0.8767501711845398.ckpt
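The thread does not say how the ce+f1 loss is defined. One common formulation, given here only as an assumption, adds 1 minus a differentiable "soft" macro-F1 to the cross-entropy, with the F1 computed from softmax probabilities. The sketch below is a NumPy version of the forward computation; in training it would be written with PyTorch tensors so gradients flow. alpha is a hypothetical weight:

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ce_plus_soft_f1(logits, targets, alpha=1.0, eps=1e-7):
    """Cross-entropy plus alpha * (1 - soft macro-F1) over a batch.

    logits: (N, K) array, targets: (N,) integer class indices.
    """
    n, k = logits.shape
    probs = softmax(logits)
    onehot = np.eye(k)[targets]
    ce = -np.mean(np.log(probs[np.arange(n), targets] + eps))
    # "Soft" counts: probabilities instead of hard 0/1 predictions.
    tp = (probs * onehot).sum(axis=0)
    fp = (probs * (1.0 - onehot)).sum(axis=0)
    fn = ((1.0 - probs) * onehot).sum(axis=0)
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return ce + alpha * (1.0 - soft_f1.mean())


targets = np.array([0, 1, 2])
confident = np.array([[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 8.0]])
wrong = np.array([[0.0, 8.0, 0.0], [0.0, 0.0, 8.0], [8.0, 0.0, 0.0]])
print(ce_plus_soft_f1(confident, targets), ce_plus_soft_f1(wrong, targets))
```

In this formulation, a class that is absent from a batch gets tp = 0 and fn = 0, so its soft F1 is 0 and it pulls the mean down. With bs: 16 and 6 classes this can happen, which makes the F1 term noisy.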

I will try to train another model layer with a larger learning rate and less data augmentation tonight.

This is the notebook https://www.kaggle.com/code/fkemeth/pho-experimentation/edit/run/147309771

HCA97 commented 11 months ago

Good news, I think I fixed my PC. It was a kernel issue (I needed to upgrade the kernel) that affected the NVIDIA driver. Apparently, my keyboard isn't detected during boot-up, so I used an old keyboard of mine and managed to boot into recovery mode.

I will run experiments with different data augmentation tonight and try to merge their results tomorrow.

fkemeth commented 11 months ago

Good to hear it is working again!

I trained a few more models; one got a slightly better score on the validation data:

val_f1_score=0.8680818676948547

I will submit it now to see if it holds on the challenge data as well.

HCA97 commented 11 months ago

:crossed_fingers:

HCA97 commented 11 months ago

Good morning. My experiments failed because after a while I got a black screen and all the training stopped. I will look into the issue now.

Could you use the YOLO model https://gitlab.aicrowd.com/hca97/mosquitoalert-2023-phase2-starter-kit/-/blob/921be0e22fc63178c531f6d73067d926e5ff5b69/my_models/yolo_model_weights/best-yolov8-s-classic.pt when you submit your solution?

fkemeth commented 11 months ago

Yes, I updated my code! I should have merged your PR on gitlab!

fkemeth commented 11 months ago

How is it going with your experiments?

HCA97 commented 11 months ago

Bad, I still have problems with my system. I updated the NVIDIA driver from 525 to 530, but the issue persists. I think I need to reinstall Ubuntu, so I am stopping the experiments :)

fkemeth commented 11 months ago

Good luck with it!

I also could not get a better model.

HCA97 commented 11 months ago

Anyway, I think we did a good job.

fkemeth commented 11 months ago

Yes, it was actually more fun than I expected. Thanks for reaching out!

I hope you learned as much as I did. Let me know if at any time you want to team up again, maybe we can build on the code/knowledge we got in this challenge.

fkemeth commented 11 months ago

Considering the time we could spend on it, fifth place is great in my opinion.

HCA97 commented 11 months ago

Hi @fkemeth,

Thank you for partnering with me as well.

I also learned new things, and I believe some of them will be really useful in the future:

Balancing classes by oversampling
Utilizing YOLO annotations instead of the challenge training annotations
EMA (Exponential Moving Average) + Label Smoothing
iNat dataset
Kaggle Notebooks
Owl-ViT
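Of the items above, label smoothing is easy to show in isolation. The NumPy sketch below uses the config values from earlier in the thread (n_classes: 6, label_smoothing: 0.1). It follows the same convention as PyTorch's CrossEntropyLoss(label_smoothing=...), which spreads eps/K over all classes:

```python
import numpy as np


def smoothed_targets(labels, n_classes, smoothing=0.1):
    """(1 - smoothing) * one_hot + smoothing / n_classes, as in PyTorch's label_smoothing."""
    onehot = np.eye(n_classes)[labels]
    return (1.0 - smoothing) * onehot + smoothing / n_classes


def smoothed_cross_entropy(logits, labels, smoothing=0.1):
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    targets = smoothed_targets(labels, logits.shape[1], smoothing)
    return -(targets * log_probs).sum(axis=1).mean()


# n_classes: 6 and label_smoothing: 0.1 from the config earlier in the thread.
print(smoothed_targets([2], n_classes=6)[0].round(4))
```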

I agree. We didn't have powerful hardware or a lot of free time. If I hadn't had problems with my system, I think we could have scored a few points higher.

HCA97 commented 10 months ago

I will close this issue. Using Lux's dataset improved our score significantly.

HCA97 / Mosquito-Classifiction

Experiment with More external dataset #28