flatironinstitute / cufinufft

Nonuniform fast Fourier transforms of types 1 and 2, in 1D, 2D, and 3D, on the GPU
Other
83 stars 18 forks source link

Run make check in Jenkins #125

Closed elliottslaughter closed 2 years ago

elliottslaughter commented 2 years ago

I noticed that make check is not being run in Jenkins. I'm adding it because I suspect there are failures in the test suite that aren't being caught at the moment.

ahbarnett commented 2 years ago

@janden @garrettwrong Dear Joakim & Garrett, do either of you have a few minutes to look into what's happening here? Since we're starting to get a bunch of users, it would be great to have CI? (of course, various devs like Melody, Johannes, and occasionally me, are regularly checking it locally..) It fails at the Py import stage, ie it's an env issue not a code issue:

    from pycuda._driver import *  # noqa
E   SystemError: initialization of _driver raised unreported exception

I know nothing about the jenkins GPU setup... Thanks, Alex

janden commented 2 years ago

So it looks like there are two issues here: – CI is failing currently due to NumPy version mismatches. This should be handled by #129. – We are currently only running the Python tests, not the C++ tests.

Regarding the issue of the C++ tests, I think that our idea was that the Python tests could cover most of the code, so any bugs would trip them. That being said, I don't know that that is achieved, so it might make sense to include them. @ahbarnett, what do you think?

If we do want to include them, we have to see how that might best be achieved. It could be here in the Jenkinsfile or it could be in the Dockerfile where the library is being built (and the other calls to make are performed).

ahbarnett commented 2 years ago

Actually I have a question about this for @janden: does the Jenkins test (now using python3 -m pip install -e) actually compile libcufinufft.so locally, or are this precompiled and downloaded from somewhere else?

It the .so is locally compiled, then I think the py tests are good enough for CI. The tests in make check are more extensive, less supportable, and let's keep them for devs only.

But @elliottslaughter, do you suspect particular make check items of not working on the Jenkins system? Maybe now since commit 46a8e8a fixes the environment, you can upstream fetch that to your branch anyway.

elliottslaughter commented 2 years ago

I think I was confused about the make check because I expected an issue like https://github.com/flatironinstitute/cufinufft/pull/116#issuecomment-1030468068 to break the build, but it doesn't. (I suspect it may be a GCC vs Clang issue, so the default CUDA setup with GCC doesn't catch it.)

Anyway, this is your project so feel free to do as you see fit, but my own two cents would be that the more tests, the better. Even if the Python and C++ tests overlap, make check is at least testing a different language interface. The CI is the perfect place to put this since you don't need to remember to run it, and it helps sanity check that everything stays working.

I rebased and the run passes, but maybe I messed the Jenkins syntax because I can't actually see the make check running anywhere in the job output.

ahbarnett commented 2 years ago

Thanks Elliott for finding the MAX_NF bug, and we will aim to add make check to CI as you suggest. Best, Alex

On Sat, Feb 5, 2022 at 1:51 AM Elliott Slaughter @.***> wrote:

I think I was confused about the make check because I expected an issue like #116 (comment) https://github.com/flatironinstitute/cufinufft/pull/116#issuecomment-1030468068 to break the build, but it doesn't. (I suspect it may be a GCC vs Clang issue, so the default CUDA setup with GCC doesn't catch it.)

Anyway, this is your project so feel free to do as you see fit, but my own two cents would be that the more tests, the better. Even if the Python and C++ tests overlap, make check is at least testing a different language interface. The CI is the perfect place to put this since you don't need to remember to run it, and it helps sanity check that everything stays working.

I rebased and the run passes, but maybe I messed the Jenkins syntax because I can't actually see the make check running anywhere in the job output.

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/cufinufft/pull/125#issuecomment-1030563064, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSUCHPHRQ4ADXSNKHQLUZTCILANCNFSM5NB3KEXA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

-- *---------------------------------------------------------------------~^`^~._.~' |\ Alex H. Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

elliottslaughter commented 2 years ago

Is it possible that you're running a centrally-controlled Jenkinsfile, rather than the one in the repo? I tried another version of my change, and it literally isn't taking the script I'm putting in. I'm starting to suspect it's not actually running the Jenkinsfile out of the repository at all but using a centrally managed version, or something along those lines.

ahbarnett commented 2 years ago

I'm pinging @janden on this, since he set it up. Best, Alex

On Wed, Feb 9, 2022 at 1:41 AM Elliott Slaughter @.***> wrote:

Is it possible that you're running a centrally-controlled Jenkinsfile, rather than the one in the repo? I tried another version of my change, and it literally isn't taking the script I'm putting in. I'm starting to suspect it's not actually running the Jenkinsfile out of the repository at all but using a centrally managed version, or something along those lines.

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/cufinufft/pull/125#issuecomment-1033403171, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQXZHZ2SSQN7QQRRP3U2IEANANCNFSM5NB3KEXA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

-- *---------------------------------------------------------------------~^`^~._.~' |\ Alex H. Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

janden commented 2 years ago

Actually I have a question about this for @janden: does the Jenkins test (now using python3 -m pip install -e) actually compile libcufinufft.so locally, or are this precompiled and downloaded from somewhere else?

It's compiled as part of the Dockerfile each time the CI is run.

Anyway, this is your project so feel free to do as you see fit, but my own two cents would be that the more tests, the better. Even if the Python and C++ tests overlap, make check is at least testing a different language interface. The CI is the perfect place to put this since you don't need to remember to run it, and it helps sanity check that everything stays working.

I agree.

Is it possible that you're running a centrally-controlled Jenkinsfile, rather than the one in the repo?

I'm not sure. It could be that it runs the one in the master branch (for safety reasons). That being said, I'm fairly certain that the Dockerfile is being updated from the branch, so you may have better luck putting in the make check line there. I also think it makes more sense to put it there since that's where the other calls to make are.

elliottslaughter commented 2 years ago

Good news: running make check in the Dockerfile does indeed cause it to run.

Bad news: it is now hitting:

../bin/spread1d_test: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

I'm not really familiar with how these containers are set up, so I'm not sure where this library would be located?

janden commented 2 years ago

I'm not really familiar with how these containers are set up, so I'm not sure where this library would be located?

This line https://github.com/flatironinstitute/cufinufft/blob/bb1453dfe9dc12159e8e346eae79ad4d71fd566f/ci/docker/cuda10.1/Dockerfile-x86_64#L72 sets the necessary environment variable. Try invoking make check after it.

elliottslaughter commented 2 years ago

@janden I believe the most recent change I made matches what you suggested, but the Jenkins build still fails.

janden commented 2 years ago

Looks like the LD_LIBRARY_PATH is still incorrect. Let me try something.

janden commented 2 years ago

So I think I know what's happening. The way the docker image is set up, it doesn't actually have access to the GPU during the build phase. At this point, all that's happening is that the library is being compiled, so it doesn't actually need the GPU. Later, when we're testing, we do use the GPU, which is why Jenkins runs the docker image with the --gpus 1 flag set (otherwise we'd see the same crash during pytest).

It may be possible to give the docker image access to the GPU during the build phase, but this seems complicated. I therefore suggest you move the make check invocation back into the Jenkinsfile. Then we get back to the original problem that Jenkins won't run it until it's in master, but we'll just have to merge it in and see what happens.

elliottslaughter commented 2 years ago

Ok, I have reset this PR to its original form, adding the make check line to Jenkinsfile.

However, I really think you should check the Jenkins configuration first. As you can see at the link below, one of the options to configure the contents of a Jenkinsfile is through the admin UI. If this is how the instance is currently set up, it literally won't matter what is in the master branch (or anything else, aside from the UI).

https://www.jenkins.io/doc/book/pipeline/getting-started/#through-the-classic-ui

You'll want to double check that you've followed these instructions to make sure you're pulling the Jenkinsfile from the repository itself:

https://www.jenkins.io/doc/book/pipeline/getting-started/#defining-a-pipeline-in-scm

I don't see anything in there that would suggest that it only pays attention to certain branches (which is why I'm suggesting double checking that it's not configured via the UI).

janden commented 2 years ago

I don't have admin access to Jenkins, so there is no good way for me to verify or modify these settings.

It looks like Jenkins doesn't trust the Jenkinsfile in this PR (see https://jenkins.flatironinstitute.org/job/cufinufft/indexing/events), but it does trust the one from master. It might trust it if the PR comes from this repository. Will give it a try. Otherwise it won't hurt to just merge it.

janden commented 2 years ago

Ok. Looks like my hypothesis was correct.

I suggest we close this PR, undraft #134, and continue there. Since it's a branch on the main repository, Jenkins trusts the Jenkinsfile, so we can make any changes we want there and it should show up in the CI.

elliottslaughter commented 2 years ago

Looks like make check ran in master. Thanks all!