Closed Ahrovan closed 1 year ago
@Ahrovan Given the 'user/steering'->'user/angle' bug in Issue https://github.com/autorope/donkeycar/issues/1103 that was just fixed, can you check out the latest main and rewrite your mycar folder (or whatever folder you use for the donkeycar application)? From the root of your donkeycar repo:
git checkout main
donkey createcar --overwrite --path=<path-to-your-mycar>
and give it another try.
PS: I gathered data, trained and ran autopilot using the new code. Remember that the steering data is labelled as 'user/angle' so if you changed that to 'user/steering' then you should change it back.
I extracted sim_warehouse_manual.tar.gz again. Not solved; it still fails.
@Ezward We reinstalled the whole project from scratch to make sure there was no problem with the setup.
TensorFlow version 2.3.1 | donkey v4.4.dev6 | windows | conda 4.10.3 | Python 3.7.16 :: Anaconda, Inc.
myCar>donkey train --tub ./sim_warehouse_manual --model ./models/mypilot.h5
INFO:donkeycar.parts.interpreter:Convert model .........myNew\models\mypilot.h5 to TFLite ......myNew\models\mypilot.tflite
The problem is very simple, but the failure repeats every time. Please check it so the problem can be solved as soon as possible.
@Ahrovan Are you training a 'linear' model but running it as a 'categorical' model? That is what I see in the screen captures above, and it would be a problem. If you want to run a categorical model then you need to explicitly train it as such. If you want a linear model then be sure to choose linear, not categorical, when you run the autopilot.
@Ezward No, both are linear. The problem is only related to recording. I know but need to same
@Ezward @DocGarbanzo
Python 3.7.16 | donkey v4.4.dev6 | tensorflow '2.2.0' | windows | linear model | Created New Conda Env today
INFO:donkeycar.parts.interpreter:Convert model \workspace\mycar\models\mypilot.h5 to TFLite \workspace\mycar\models\mypilot.tfl
@Ahrovan that does look wrong. Can you try running tubplot with your model and data to see what it outputs; https://docs.donkeycar.com/utility/donkey/#plot-predictions
@Ahrovan Please also check your data again; I believe you renamed 'user/angle' to 'user/steering'; if you did you need to restore it to 'user/angle' and retrain.
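One way to check which label the records actually carry is to count the keys in the tub's catalog files. This is only a rough sketch: it assumes the tub directory holds `*.catalog` files with one JSON record per line, and `count_record_keys` is a hypothetical helper, not part of donkeycar.

```python
import json
from collections import Counter
from pathlib import Path


def count_record_keys(tub_dir):
    """Count how often each key appears across all catalog records.

    Assumption: every non-empty line of a *.catalog file is one JSON record.
    """
    counts = Counter()
    for catalog in sorted(Path(tub_dir).glob("*.catalog")):
        with catalog.open() as handle:
            for line in handle:
                line = line.strip()
                if line:
                    counts.update(json.loads(line).keys())
    return counts


def report(tub_dir, keys=("user/angle", "user/steering", "user/throttle")):
    """Print how many records contain each of the given keys."""
    counts = count_record_keys(tub_dir)
    for key in keys:
        print(f"{key}: {counts.get(key, 0)} records")
```

Calling `report("./sim_warehouse_manual")` should show a non-zero count for 'user/angle' and zero for 'user/steering' if the data is labelled the way the training code expects.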
I extracted sim_warehouse_manual.tar.gz without any change.
'user/angle' exists.
@Ezward Today I checked with branch 4.4.0 in a new conda env. It failed with the same result.
tubplot output now
@Ezward The problem is related to training with the GPU. If training is done on the CPU, the problem goes away.
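For anyone who wants to reproduce the CPU-only run, a common way to keep TensorFlow off the GPU is to set `CUDA_VISIBLE_DEVICES=-1` before the training process starts. The sketch below is an assumption about how one might wrap that, not something taken from donkeycar; `cpu_only_env` and `train_on_cpu` are hypothetical helper names, and the `donkey train` arguments are the ones used earlier in this thread.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # "-1" matches no device, so TensorFlow falls back to the CPU.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def train_on_cpu(tub, model):
    """Run `donkey train` in a child process that cannot see the GPU."""
    command = ["donkey", "train", "--tub", tub, "--model", model]
    return subprocess.run(command, env=cpu_only_env(), check=True)
```

Usage would be `train_on_cpu("./sim_warehouse_manual", "./models/mypilot.h5")` from the mycar folder. On a Windows command prompt, running `set CUDA_VISIBLE_DEVICES=-1` before `donkey train` should have the same effect.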
@DocGarbanzo can you take a look at this?
@Ahrovan Can you try to train with an earlier version of donkeycar so we can see if it is a recently introduced bug? Maybe check out tag 4.3.6.2
or 4.4.0
Report - donkey v4.4.dev6 worked on Jetson NX - Trained with GPU
to summarize
Is that correct?
On the failing system can you train using the command line and copy the console output here?
This shows that the model has no problem. Only the UI display has a problem. I have tested this model on the robot, and the robot runs without problems.
Except that you also did a tubplot and it also showed constant throttle and steering, so it doesn't seem like it is just a UI issue.
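To take the UI out of the picture, the predicted values can be checked numerically. The sketch below assumes you have already collected the model's steering (or throttle) outputs into a plain list of floats; `summarize_outputs` is a hypothetical helper and the tolerance is an arbitrary choice.

```python
import math


def summarize_outputs(values, tolerance=1e-6):
    """Return (has_nan, is_constant) for a sequence of model outputs.

    is_constant is True when all non-NaN values lie within `tolerance`
    of each other, which is what a flat line in a plot would look like.
    """
    values = list(values)
    has_nan = any(math.isnan(v) for v in values)
    valid = [v for v in values if not math.isnan(v)]
    is_constant = len(valid) > 0 and (max(valid) - min(valid)) <= tolerance
    return has_nan, is_constant
```

If this reports constant output for the GPU-trained model but not for the CPU-trained one on the same records, the problem is in the model rather than in the display.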
I don't understand what is happening in the prior comment; can you add an explanation?
Maybe you are saying one of the modes is always getting that bad record NaN error? And a white screen. Is that the same thing that was going on before? It looks different.
Yes, the white screen is related to the model trained with the GPU, which always gets the bad record NaN error. As for whether it is the same as before, I don't know.
@DocGarbanzo have you seen this issue before in the DonkeyUI? Do you know what this might be?
Problem solved with cuDNN 8.1, CUDA 11.2 and TensorFlow 2.5 (CUDA >= 11). Now I am working on compatibility issues on the host and Jetson platforms. @Ezward
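For reference, the versions mentioned in this thread can be compared against the GPU build combinations that TensorFlow lists as tested. The table below is written from memory of the official install documentation and should be verified there before relying on it; `expected_gpu_stack` is a hypothetical helper.

```python
# Tested GPU build combinations per TensorFlow release (assumption: copied
# from the TensorFlow install docs; verify against the official table).
TESTED_GPU_BUILDS = {
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def expected_gpu_stack(tf_version):
    """Look up the CUDA/cuDNN pair for a version string such as '2.3.1'."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_GPU_BUILDS.get(major_minor)  # None if the release is not listed
```

For example, `expected_gpu_stack("2.3.1")` returns the CUDA 10.1 entry, while `expected_gpu_stack("2.5.0")` returns the CUDA 11.2 and cuDNN 8.1 pair reported above.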
Great. I am glad you have a solution. Closing this issue.
using donkey v4.4.dev6 ...
I have not been able to train and run for several days. I doubted my data, so finally I repeated the training with your data.