dionhaefner / blog-comments

Repository holding utterances comments for my blog

2021/12/supercharged-high-resolution-ocean-simulation-with-jax/ #4

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Supercharged high-resolution ocean simulation with JAX | dionhaefner.github.io

Our Python ocean model Veros (which I maintain) now fully supports JAX as its computational backend. As a result, Veros has much better performance than before on both CPU and GPU, while all model code is still written in Python. In fact, we can now do high-resolution ocean simulations on …

https://dionhaefner.github.io/2021/12/supercharged-high-resolution-ocean-simulation-with-jax/

shahmoradi commented 2 years ago

Greetings,

I came across your paper "Fast, Cheap, and Turbulent—Global Ocean Modeling With GPU Acceleration in Python", which is being discussed on the Fortran Discourse (https://fortran-lang.discourse.group/t/global-ocean-modeling-with-gpu-acceleration-in-python/2497/5). I wanted to bring your attention to a few shortcomings in the comparison of Fortran GPU computing with Python. For example, the Fortran code in Figure 2 of your paper does not need ANY external directives to be parallelized: just convert the loops to a do concurrent to get CPU or GPU parallelism, depending on the choice of hardware. Here is a relevant article: https://developer.nvidia.com/blog/accelerating-fortran-do-concurrent-with-gpus-and-the-nvidia-hpc-sdk/

This paper does not seem to offer a fair, unbiased comparison of Fortran with Python. It is unfortunate that reviewers do not bring up such obvious shortcomings to authors in the review process.

I have to say that I have not read your paper in full, so there may be more shortcomings in this paper regarding Fortran, or it may even turn out that this is all my misunderstanding, in which case please do correct me here.

Sincerely,
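
P.S. To make the conversion concrete, here is a minimal sketch of the kind of loop I mean. The array names, sizes, and update rule are made up for illustration, not taken from your paper's Figure 2; the compiler flags are the ones described in the NVIDIA article above.

```fortran
! Minimal sketch of an elementwise update written with DO CONCURRENT.
! With the NVIDIA HPC SDK this can be compiled as, e.g.,
!   nvfortran -stdpar=gpu demo.f90        (offload to GPU)
!   nvfortran -stdpar=multicore demo.f90  (parallelize on CPU)
program do_concurrent_demo
   implicit none
   integer, parameter :: nx = 1024, ny = 1024
   real :: u(nx, ny), u_new(nx, ny), rhs(nx, ny), dt
   integer :: i, j

   dt = 0.1
   u = 1.0
   rhs = 2.0

   ! DO CONCURRENT asserts that the iterations are independent, so
   ! the compiler may execute them in any order, in parallel.
   do concurrent (j = 1:ny, i = 1:nx)
      u_new(i, j) = u(i, j) + dt * rhs(i, j)
   end do

   print *, u_new(1, 1)
end program do_concurrent_demo
```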

dionhaefner commented 2 years ago

Hey Amir,

I'm glad to hear that you are discussing our article. I have the highest respect for the efforts to modernize Fortran (such as LFortran) and don't mean to discredit anyone's work. That being said, I stand by my words in the article.

I wasn't aware of GPU support via do concurrent through the NVIDIA HPC SDK, and while it looks promising in principle, it doesn't seem like a serious competitor to something like JAX / XLA. For example, there is this limitation:

By changing the DO loop to DO CONCURRENT, you are telling the compiler that there are no data dependencies between the n loop iterations. This leaves the compiler free to generate instructions that the iterations can be executed in any order and simultaneously. The compiler parallelizes the loop even if there are data dependencies, resulting in race conditions and likely incorrect results. It’s your responsibility to ensure that the loop is safe to be parallelized.

That makes it currently unusable for anything but trivial point-wise operations, so there would be no way to write an entire ocean model with it: ocean models often contain data dependencies, such as cumulative sums.
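
To make that failure mode concrete, here is a minimal sketch of the kind of loop-carried dependency I mean (the arrays are made up, this is not actual Veros code):

```fortran
! A cumulative sum: iteration k reads s(k-1), which is written by
! iteration k-1, so the iterations are NOT independent.
program cumsum_dependency
   implicit none
   integer, parameter :: n = 8
   real :: x(n), s(n)
   integer :: k

   x = 1.0
   s(1) = x(1)
   do k = 2, n
      s(k) = s(k - 1) + x(k)
   end do
   print *, s

   ! Rewriting the loop above as DO CONCURRENT would (falsely) assert
   ! that the iterations are independent; per the quoted docs, the
   ! compiler would parallelize it anyway, producing race conditions
   ! and likely garbage results on GPU.
end program cumsum_dependency
```

JAX, by contrast, ships parallel primitives for exactly these patterns (e.g. jnp.cumsum, which XLA lowers to a parallel scan on GPU), so the user never has to reason about the dependency themselves.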

We had four anonymous reviewers on the paper, and I had the impression that they were well versed in traditional (Fortran) modelling, but even they had to concede that GPU programming is currently a blind spot of the Fortran ecosystem. If you need more evidence of this: there are hundreds of people working on Earth system models in Fortran; surely those models would run on GPUs by now if it were easy to do :)

I'll chime in on the Discourse thread, happy to discuss this more over there.