ecmwf / atlas

A library for numerical weather prediction and climate modelling
https://sites.ecmwf.int/docs/atlas
Apache License 2.0
112 stars 42 forks source link

Adjoint of a halo-exchange #13

Closed MarekWlasak closed 4 years ago

MarekWlasak commented 4 years ago

Hi. I think that this is something that I could tackle, initially for structured data. Is that o.k @wdeconinck @ytremolet ? Please tell me either way. I won't be offended if you say no. If the answer is yes, then tell me how long I have got.

ytremolet commented 4 years ago

That's fine with me if you want to have a go at this.

ytremolet commented 4 years ago

Oh, and you have one hour! :-)

ytremolet commented 4 years ago

Really, I don't have a good idea yet of how long that could take.

wdeconinck commented 4 years ago

HI Marek, You're welcome to give that a go. There's no concept of structured data in the HaloExchange class though. The basic procedure for current halo-exchange is:

It may be a little confusing because the packing and unpacking of the send- and recv- buffers has 2 implementations depending on availability of CUDA. You can ignore the CUDA implementation, and we can worry about that later.

MarekWlasak commented 4 years ago

Thanks @wdeconinck @ytremolet. Realistically, I will be able to start on Monday as there a few thing other things I need to tie up first.

MarekWlasak commented 4 years ago

Hi. I think it would be easier for me to work initially from the JCSDA forked version of atlas. @wdeconinck - is that o.k?

wdeconinck commented 4 years ago

Yes. It is not allowed to create branches in the ecmwf/atlas repo, only create branches in a fork and submit a PR. This fork could be any fork, such as the jcsda fork.

MarekWlasak commented 4 years ago

Thanks!

MarekWlasak commented 4 years ago

@ytremolet @wdeconinck - Just an update ... I have got it to work (without CUDA) and am halfway through building the tests ... I should have a PR ready sometime tomorrow. What are the governance rules for PRs? I presume that someone from ECMWF will need to be requested to review for each PR. Or is there going to be a two step process, with an internal JCSDA review followed by a ECMWF review (when JCSDA atlas repo gets updated / merged / realigned with the main ECMWF repo?

wdeconinck commented 4 years ago

Any PR should be submitted to the ecmwf repo, and branching off the atlas develop branch. I will then assign reviewers. Currently that would be myself. Make sure to have a unit-test that tests your functionality. I will do an internal test to see if the code works across all platforms we use, including Cray XC40.

MarekWlasak commented 4 years ago

I have a bit a problem ... I worked off the develop-jcsda branch of the JCSDA atlas fork, because it was significantly more up to date than the forked atlas develop. Comparing my change with the atlas develop shows up Benjamin's changes to PointCloud class. I don't think the JCSDA's forked version of atlas develop can be updated to the latest version without updates to other repos, @benjaminmenetrier am I right? Do I need to wait until JCSDA switches to using develop-jcsda? I am not sure whether I can run the ecmwf's develop on JCSDA's estate.

Also I don't know where to get the appropriate version of gridtools so that I can check whether I have coded that part correctly.

ytremolet commented 4 years ago

@MarekWlasak At this point you have two options:

It's up to you which route you prefer (unless Willem has other comments), but in any case, the PR should be on the ecmwf repo.

@wdeconinck This is an example where I could help with the review if we could add me as a reviewer. This is really for DA applications so I like to keep on top of what is available to us and I have been writing adjoints for many years which can be useful (although Marek has that experience as well).

wdeconinck commented 4 years ago

Hi @MarekWlasak If you create the PR, I can try myself to get the gridtools-backend working, as long as you have a unit-test I can use. When you do create the PR, you can click a checkbox that allows me to edit your branch directly. As @ytremolet said, the develop-jcsda branch is of temporary nature, and is currently broken. It should not be considered.

MarekWlasak commented 4 years ago

Hi @wdeconinck. Thanks for the offer on running the gridtools-backend. I have almost finished the unit-tests. There will be twice the number that you have for the normal halo exchange and they are taking longer to write than the original code.

I started my code branching from JCSDA's develop-jcsda branch. Once I finish these tests, I plan to copy the code into a new branch that is based around the latest ecmwf's atlas develop. I am not sure whether I will be able to run my tests with it though. Do I need the latest versions of eckit and fckit?

MarekWlasak commented 4 years ago

I have completed the adjoint tests, barring the ones involving gridtools. I could create a temporary PR with the develop-jcsda branch so that you can see what I have been up to. Would that be of use while I look and see while I can get a branch based on ecmwf atlas develop version to run?

MarekWlasak commented 4 years ago

Wow! That was not as hard as I expected it to be. #16 is the associated PR.