autorope / donkeycar

Open source hardware and software platform to build a small scale self driving car.
http://www.donkeycar.com
MIT License

Colab training is not accelerated because tensorflow 2.9 is forced to be installed with the [pc] option #1159

Open sctse999 opened 5 months ago

sctse999 commented 5 months ago

In the latest release, 5.0.0, tensorflow==2.9 was added to the [pc] option in setup.cfg: https://github.com/autorope/donkeycar/blob/d60dcb5e0627465851873f35e80d3b3863973fc6/setup.cfg#L69

When the command !pip3 install -e .[pc] is run on a Colab GPU instance, pip automatically uninstalls the GPU-capable tensorflow already on the instance and installs tensorflow 2.9 instead.

Before 5.0.0 (e.g. v5.0.dev3), the training output looks like this:

2024-01-23 11:53:05.172074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13949 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5
INFO:donkeycar.parts.keras:Created KerasLinear with interpreter: KerasInterpreter

In 5.0.0, by contrast, the training output looks like this:

2024-01-23 11:56:07.005500: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
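For anyone who wants to confirm which case they are in before starting a training run, here is a minimal sketch that asks TensorFlow how many GPUs it can see. The helper name visible_gpu_count is made up for this example and is not part of donkeycar; the only TensorFlow call it relies on is tf.config.list_physical_devices.

```python
def visible_gpu_count():
    """Return how many GPUs TensorFlow can see, or None if TensorFlow is not installed.

    Hypothetical helper for illustration; not part of donkeycar.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow will fall back to the CPU.
    return len(tf.config.list_physical_devices("GPU"))


count = visible_gpu_count()
if count is None:
    print("TensorFlow is not installed")
elif count == 0:
    print("TensorFlow is installed but sees no GPU; training will run on the CPU")
else:
    print(f"TensorFlow sees {count} GPU(s)")
```

Running this in a Colab cell right after the pip install step should show whether the install replaced the GPU-capable build.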

I understand that including tensorflow in the pc option facilitates a smoother installation experience. How about adding one more option named colab, which would be the same as pc but without tensorflow? That would let users train models on Colab without editing setup.cfg manually.
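To illustrate the idea, here is a rough sketch of how a colab list could be derived from the pc list by dropping the tensorflow pin. The entries in pc_extra are placeholders I made up for the example (apart from tensorflow==2.9), not the actual contents of setup.cfg, and the two helper functions are hypothetical.

```python
import re


def requirement_name(req):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    # Lowercase the name and collapse runs of '-', '_' and '.' into a single '-'.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def without_packages(requirements, excluded):
    """Return the requirements whose project name is not in excluded."""
    excluded_names = {requirement_name(name) for name in excluded}
    return [r for r in requirements if requirement_name(r) not in excluded_names]


# Placeholder stand-in for the [pc] option; the real list lives in setup.cfg.
pc_extra = ["tensorflow==2.9", "matplotlib", "kivy", "pandas"]

# Proposed colab option: same as pc, minus tensorflow.
colab_extra = without_packages(pc_extra, ["tensorflow"])
print(colab_extra)
```

With an option like this, the Colab notebook would install with !pip3 install -e .[colab] and keep whatever tensorflow build the instance already has.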

sctse999 commented 4 months ago

@Ezward @DocGarbanzo do you have any advice?