dscha09 closed this issue 5 years ago
@MaybeShewill-CV Hmmm... you mean train_lanenet.py? I was trying to run it line by line, but I get a different error that does not appear when I run it in bash.
Update:
I changed both the train and test sizes to 1, did the same for the val batch size, and training started successfully.
Okay. I will follow your suggestion and read the code again, all the way through, so I can get a sense of what is going on.
@chaine09 Since the training process works I will close this issue:)
@MaybeShewill-CV Okay! But before you close this, please tell me if you only trained with those 5 images or not?
@chaine09 Yep, it works well with only five images.
Hi @MaybeShewill-CV, I'm already training the model, currently at the 2000th epoch. But looking at the contents of the /model/tusimple_lanenet folder, it only contains two checkpoints: tusimple_lanenet_vgg_2018-10-19-13-33-56.ckpt-200000, which was the saved model from your Dropbox, and tusimple_lanenet_vgg_2018-11-02-19-13-33.ckpt-0, which was created when I started retraining the model.
Is it correct that for tusimple_lanenet_vgg_2018-10-19-13-33-56.ckpt-200000, "200000" is the epoch number?
I'm already on the 2000th epoch and there is still no checkpoint other than tusimple_lanenet_vgg_2018-11-02-19-13-33.ckpt-0. I wanted to test the model trained up to the current epoch while training is still ongoing.
Should I let the entire training finish first before expecting a more recent saved checkpoint? Or should I abort the current training?
I tried tusimple_lanenet_vgg_2018-11-02-19-13-33.ckpt-2000, but it says that the file doesn't exist:
ValueError: The passed save_path is not a valid checkpoint: /Users/cvsanbuenaventura/Documents/lanenet-lane-detection-master/model/tusimple_lanenet/tusimple_lanenet_vgg_2018-11-02-19-13-33.ckpt-2000
@chaine09 I wonder if you have used TensorFlow before? Let your training process finish and have a little patience, OK?
@MaybeShewill-CV Oh, because it's possible to save your checkpoint every epoch. I was wondering if you did that in your code?
@MaybeShewill-CV Nevermind my question, I see you saved the checkpoint every 2000 epochs :)
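For anyone hitting the same "not a valid checkpoint" error, a minimal sketch of how to see which checkpoint steps actually exist on disk before trying to restore one. It assumes the TF 1.x Saver V2 layout, where each checkpoint prefix has a matching `<prefix>-<step>.index` file; the helper name `list_checkpoints` is hypothetical and not part of the repo.

```python
import os
import re

# Assumption: TF 1.x Saver (V2 format) writes "<prefix>-<step>.index"
# alongside the .meta and .data-* files of each checkpoint.
_INDEX_RE = re.compile(r"^.+-(?P<step>\d+)\.index$")


def list_checkpoints(model_dir):
    """Return sorted (step, checkpoint_prefix_path) pairs found in model_dir."""
    found = []
    for name in os.listdir(model_dir):
        match = _INDEX_RE.match(name)
        if match:
            step = int(match.group("step"))
            # The restore path is the file name without the ".index" suffix.
            prefix = os.path.join(model_dir, name[: -len(".index")])
            found.append((step, prefix))
    return sorted(found)
```

Calling `list_checkpoints("model/tusimple_lanenet")` on the folder described above would be expected to return only the prefixes that can be restored; a step such as 2000 will not appear until the training script has actually saved it.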
Hi @MaybeShewill-CV, I have one last question. How did you generate the binary and instance images for the training data?
@chaine09 You can follow the tusimple dataset readme file. The training samples can be generated based on their guidance:)
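As a rough illustration of the first step, here is a sketch that reads one TuSimple label line and turns it into per-lane point lists. It assumes the label format described in the TuSimple readme: one JSON object per line with `raw_file`, `lanes` (x coordinates, with -2 meaning no lane at that row) and `h_samples` (the shared y coordinates). The function name `lanes_to_points` is hypothetical, and this is not necessarily how the repo author generated the images.

```python
import json


def lanes_to_points(label_line):
    """Parse one TuSimple label line into (raw_file, list of lane point lists)."""
    record = json.loads(label_line)
    h_samples = record["h_samples"]
    lanes = []
    for xs in record["lanes"]:
        # Negative x values (-2 in the dataset) mark rows where the lane is absent.
        points = [(x, y) for x, y in zip(xs, h_samples) if x >= 0]
        if points:
            lanes.append(points)
    return record["raw_file"], lanes
```

From these point lists, one plausible approach (an assumption, not confirmed in this thread) is to draw every lane with the same value on a blank image for the binary mask, and with a distinct value per lane for the instance mask.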
Hi @MaybeShewill-CV, I also want to ask about some odd behavior I've noticed during retraining your model (using the original training dataset you provided).
I noticed that the accuracy dropped to 0 on the 9th epoch, and started to slowly rise again on the 30th.
Then, beyond 2000 epochs, the accuracy is almost 100%. It reached a steady value of 100% for epochs greater than 4000.
I just replicated your original data of 5 images. Then I proceeded with training on 17 images (replicated).
Although the training accuracy is 100%, the output generated for epochs greater than 2000 is not accurate.
For the 2000th epoch, I only got one lane line. Then for epochs greater than 4000, I only get a black image (even though the training accuracy is 100%).
Should I just continue with the training?
@chaine09 With only five images you will get nothing
@MaybeShewill-CV What do you mean by get nothing?
What I did was copy and paste the 5 images 4 times. That's why I have a total of almost 20 images. Is this sufficient?
@chaine09 I suggest you train the model on the whole tusimple dataset.
@MaybeShewill-CV May I ask, for the saved model you provided in Dropbox, how many images did you train it with?
@chaine09 The model was trained with the whole tusimple lane dataset
@MaybeShewill-CV And you really trained it for 200010 epochs?
@chaine09 ...... yes
@MaybeShewill-CV Can you tell me the format your code is accepting for both binary and instance segmentation files?
I successfully tested the trained model using the trained weights you uploaded to Dropbox. However, I want to retrain the model on new training data.
I added one new image to the existing training data of five images, following the instructions in the repo, and added the new images to the image, gt_image_instance, and gt_image_binary folders, but I get errors. I enter this line from your repo in bash: python tools/train_lanenet.py --net vgg --dataset_dir data/training_data_example/
The errors I get are:
cv2.error: OpenCV(3.4.2) /Users/travis/build/skvark/opencv-python/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
and sometimes I get this error:
I already modified the train.txt and val.txt and changed the file paths for the images found locally on my machine.
How can I fix this?
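The `!ssize.empty()` assertion in `resize` usually means the input image is empty, which commonly happens when `cv2.imread` returns `None` for a path it cannot read. A minimal sketch for checking every path listed in train.txt or val.txt before training; it assumes each line holds whitespace-separated file paths with no spaces inside a path, and the helper name `find_missing_paths` is hypothetical.

```python
import os


def find_missing_paths(index_file):
    """Return (line_number, path) pairs for listed files that do not exist."""
    missing = []
    with open(index_file, "r") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Assumption: paths on a line are separated by whitespace.
            for path in line.split():
                if not os.path.isfile(path):
                    missing.append((line_number, path))
    return missing
```

If `find_missing_paths("data/training_data_example/train.txt")` returns any entries, those paths are a likely source of the resize error and are worth fixing first.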