LahiRumesh / SSL_Vision_Pipeline

Semi-Supervised | Pseudo Labeling pipeline with one-stage object detection models

Missing checkpoint weights and logs in the data_models dir #1

Open mirbehroznoor opened 2 years ago

mirbehroznoor commented 2 years ago

Thanks for the repository. I am missing the checkpoint weights and logs in data_models, and I also could not get anything on Wandb.

The data_models dir structure after training:

data_models/
  |--process_data/
       |--image00.jpg
       |--annotations-export.csv
  |--class.names
  |--data_file_out.txt
  |--train.txt
  |--val.txt

There are no errors as such. Thanks

LahiRumesh commented 2 years ago

@mirbehroznoor The data_models folder structure should look like this; all the logs and data will be saved under data_models/YOUR_IMAGE_FOLDER_NAME/.. . Check your data dir and the annotations CSV file (a quick sanity check is sketched after the tree below).

data_models/
    |--Image_Folder_Name
             |--process_data/
                    |--image00.jpg
                    |--annotations-export.csv
             |--class.names
             |--data_file_out.txt
             |--train.txt
             |--val.txt
             |--log
             |--checkpoints
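
If it helps, here is a minimal sketch (not part of the repo; the folder names are placeholders) that checks whether training produced this layout:

# Hypothetical sanity check for the layout above, not part of the repo.
from pathlib import Path

def check_output_layout(data_models_dir, image_folder_name):
    root = Path(data_models_dir) / image_folder_name
    expected = [
        "process_data/annotations-export.csv",
        "class.names",
        "data_file_out.txt",
        "train.txt",
        "val.txt",
        "log",
        "checkpoints",
    ]
    for rel in expected:
        path = root / rel
        print(("OK     " if path.exists() else "MISSING") + " " + str(path))

check_output_layout("data_models", "Image_Folder_Name")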

Please check this: https://lahrumesh28.medium.com/semi-supervised-learning-pseudo-labeling-custom-dataset-with-yolov4-53b896140894.

mirbehroznoor commented 2 years ago

Actually, I came here from the Medium article. I could not get that folder structure, so I did some troubleshooting and noted the errors below while running:

python train_models.py --model YOLOV4 --data_dir /home/data/Image_Folder  \
--weights yolov4.conv.137.pth \
--validation 0.1 --epochs 80 --batch_size 8

The packages were installed from requirements.txt, as mentioned in the article.


OS

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

Errors on the local system and Google Colab:

Local System:

Conda Env

name: ssl
channels:
- conda-forge
dependencies:
- python=3.7 # onnxruntime does not support python=3.10
- pip
Python 3.7.12

First error from train_models.py

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
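
For reference, workaround 2 from that message is a two-line change; a sketch, assuming nothing imports protobuf before this runs (it would go at the very top of train_models.py):

# Workaround 2 from the protobuf message above: force the pure-Python
# implementation. This must execute before the first protobuf import,
# hence at the very top of train_models.py (slower parsing, but unblocks).
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"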

Second error after pip install protobuf==3.12.0

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. 
Expected 88 from C header, got 80 from PyObject
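
This mismatch is the usual signature of a C extension compiled against a newer NumPy than the one installed at runtime: 88 bytes is the ndarray struct size the extension was built against, 80 is what the installed NumPy provides. A quick check, with the version bound being my assumption for a Python 3.7 env:

# Check the runtime NumPy version; an extension built against a newer
# NumPy (88-byte ndarray header) fails like this when an older NumPy
# (80-byte header) is installed.
import numpy
print(numpy.__version__)
# Upgrading NumPy (or reinstalling the offending wheel against the
# installed NumPy) usually clears it; 1.21.x is the last series that
# supports Python 3.7 (my assumption for this env):
#     pip install --upgrade "numpy>=1.20,<1.22"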

Google Colab Env:

Python 3.7.14

train_models.py error: same as the second error on the local system

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. 
Expected 88 from C header, got 80 from PyObject

Thanks