Kaggle / docker-python

Kaggle Python docker image
Apache License 2.0

Upgrade TensorFlow to a newer version on both GPU and TPU #1114

Closed. innat closed this issue 1 year ago.

innat commented 2 years ago

RE-POST on Kaggle.


I'm not sure if this is the right place to ask; if it's not, please redirect me.

🚀 Feature

Here are the two requests:

  1. Upgrade the TensorFlow version to match Colab.
  2. Upgrade the TensorFlow version for both GPU and TPU.

Motivation

About (1): when experimenting, users commonly switch between Kaggle and Colab. When the TensorFlow team releases a new version, Colab is upgraded immediately, but the process on Kaggle is much slower. This causes mismatches with newer features, most often when an API that lives under tensorflow.experimental.* in one version has moved to tensorflow.<something_stable> in another. Keeping the Kaggle environment on the same TensorFlow version as Colab would be a great help to end users.
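One common way to live with the experimental-versus-stable mismatch described above is a small compatibility shim that looks a symbol up at its stable path first and falls back to the experimental path. The sketch below is a hypothetical helper (`resolve_first` is not a TensorFlow API) and only assumes that modules expose their submodules as attributes. As an illustration of such a move, the Keras preprocessing layers were, as far as I know, promoted from `tf.keras.layers.experimental.preprocessing` to `tf.keras.layers` around TF 2.6 (check the release notes for the exact version).

```python
from typing import Any, Sequence


def resolve_first(root: Any, candidates: Sequence[str]) -> Any:
    """Return the object at the first dotted path in `candidates` that exists.

    Each candidate is an attribute path relative to `root`, for example
    "keras.layers.RandomFlip". Hypothetical helper, not part of TensorFlow.
    """
    for path in candidates:
        obj = root
        try:
            # Walk the dotted path one attribute at a time.
            for part in path.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            # This path does not exist in the installed version; try the next one.
            continue
        return obj
    raise AttributeError(f"none of {list(candidates)!r} exist on {root!r}")
```

Usage, assuming TensorFlow is imported as `tf`: `RandomFlip = resolve_first(tf, ["keras.layers.RandomFlip", "keras.layers.experimental.preprocessing.RandomFlip"])`. Trying the stable path first means the same notebook keeps working on the older TPU image and on newer images where the experimental alias may eventually be removed.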

About (2): the reasoning is the same as for (1). Currently, Kaggle provides TensorFlow 2.6 for GPU but 2.4.1 for TPU. Newer versions provide more features, so if users rely on features from TF 2.6 on a GPU, they can't use them on a TPU, which is still on the older TF 2.4.
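A related, low-cost safeguard for the GPU/TPU version split is to check the installed version at the top of a notebook and fail fast with a clear message, instead of hitting an obscure AttributeError later. The sketch below compares plain version strings using only the standard library; the helper names are hypothetical, and it deliberately ignores pre-release suffixes such as `rc1`. In a notebook you would pass `tf.__version__` as the installed version.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric part of a version string.

    "2.4.1" -> (2, 4, 1); "2.7.0rc1" -> (2, 7, 0). Pre-release tags are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def require_min_version(installed: str, minimum: str, feature: str) -> None:
    """Raise RuntimeError if `installed` is older than `minimum`."""
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that "2.6" and "2.6.0" compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have < need:
        raise RuntimeError(
            f"{feature} needs TensorFlow >= {minimum}, but {installed} is installed"
        )
```

For example, `require_min_version(tf.__version__, "2.6", "this notebook")` would pass on an image with TF 2.6.0 and raise immediately on one with TF 2.4.1.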

Additional context

I understand it may be complicated on that side to maintain. But please consider the above issues in the best way possible.

rosbo commented 2 years ago

Hi @innat,

We have internal blockers preventing us from upgrading the NVIDIA driver, which is required to upgrade to TensorFlow 2.7 + CUDA 11.2+.

Hopefully, these blockers will be resolved soon, which will enable us to upgrade TensorFlow.

Thank you

innat commented 2 years ago

@rosbo Thanks for the response. Please note, it's not about TensorFlow version 2.7. The issues are described in detail above. In short,

darien-schettler commented 1 year ago

@Philmod - If you want this to be the main thread for grief related to the environments, that's fine... but in that case, it probably shouldn't have been abandoned for 3 months.

Either reopen the other issue or start making progress on this one. Please.

Philmod commented 1 year ago

Hi, we are working on this, but it's not as easy as it looks.

This image depends on a base image that doesn't yet support CUDA 11.4, which is necessary to upgrade some packages. The fact that the Kaggle image supports so many packages makes some upgrades very complicated.

darien-schettler commented 1 year ago

What's the timeline for completion? As I previously mentioned, the current CUDA version is 2 years old. Many other libraries are 2-5 major releases behind.

I get that this is hard. That being said, this is a primary value-add of this product. If you want Kaggle to continue being an open-source community where sharing code is important, then we need to be able to share and execute code. Without that, competitions may increasingly be solved with scripts or inference-only notebooks, and people will do everything locally (or on Colab).

All communication on previous timelines was clearly wrong, and no one ever followed up. If this is going to continue to be difficult or take a long time, it might make sense to be completely transparent and share a timeline for when these issues will be solved. Then, if/when things go awry (as they surely will), you keep everyone up to date and adjust accordingly.

I'm not trying to be difficult, but I recommend Kaggle to MANY people as a platform and almost exclusively use it as a home base for learning and teaching. I'm raising these complaints because I've reached a boiling point (and I believe others have as well), which shows how unacceptable the updates and communication have been. At this point, if the lack of support, communication, and updates continues, I'll probably have to stop using Kaggle for many of the things I used to and push others to platforms that update and communicate better. I would really prefer to keep sharing and building things, though, as I obviously love the Kaggle platform.

So please do better, communicate better, and make this a priority.

PS: I know Google is fractured internally and probably pretty siloed, but the fact that Colab can remain far more up to date than Kaggle is pretty confusing. Can you not get support from them on this?

Philmod commented 1 year ago

The Colab environment doesn't support thousands of packages; that's why they don't have this issue.

darien-schettler commented 1 year ago

OK. I think I've received the relevant communication. I'm bowing out of this issue and the others, and this one can go back to being about the upcoming TF update for GPU and TPU.


My takeaway is that the official position of those involved is this: because the environment contains so many packages (many more than Colab), it is incredibly difficult to update those packages and the massive environment they make up.

As a result, updates cannot be completed as frequently as other platforms like Colab.

It appears there are no real solutions in the works to address the underlying cause at this time (at least none being communicated).

PS: One solution I suggested was offering more images to choose from: for instance, a Colab-like 'slim' environment that could stay more up to date, while still providing the more bloated, out-of-date image.

While this outcome is not what I had hoped for, I am sure that those building the platform are doing their best. I thank them for their efforts.

I will endeavour to find a stable alternative in the long-term (build my own rig) and will default to using Colab or Google Cloud in the short-term.

Thanks again to those involved and thanks to @Philmod for providing communication on this topic.

djherbis commented 1 year ago

Closing this since we have upgraded TensorFlow to the latest versions we can.