ECP-copa / Cabana

Performance-portable library for particle-based simulations

Add CI for cuda build #74

Closed junghans closed 5 years ago

junghans commented 5 years ago

Thanks to @jgalarowicz

Fix #67

To Do:

codecov-io commented 5 years ago

Codecov Report

Merging #74 into master will not change coverage. The diff coverage is n/a.


@@          Coverage Diff           @@
##           master     #74   +/-   ##
======================================
  Coverage    69.1%   69.1%           
======================================
  Files          26      26           
  Lines        1831    1831           
======================================
  Hits         1267    1267           
  Misses        564     564
Flag       Coverage Δ
#clang     84.7% <ø> (ø)
#doxygen   19.8% <ø> (ø)
#gcc       97.4% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4cf3902...ac48bcd.

junghans commented 5 years ago

Ready to merge. @dalg24, please review!

junghans commented 5 years ago

@jgalarowicz, it works, but sometimes the tests fail with:

24/30 Test #24: Core_tutorial_04 .................***Exception: Child aborted  1.61 sec
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorDevicesUnavailable): all CUDA-capable devices are busy or unavailable /root/hpc-gitlab-runner/ecpcitest/ecp-copa/cabana/builds/users/junghans/2e8de492/1/ecpcitest/ecp-copa/cabana/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:119
Traceback functionality not available

which means we should run the tests as a batch job instead.

What tags: do we need to use to submit this to the queue?

jgalarowicz commented 5 years ago

@junghans I believe the tag to submit a batch job to run on the compute nodes is "batch".
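
As a rough illustration of what that would look like in .gitlab-ci.yml, here is a minimal sketch. Only the "batch" tag comes from this thread; the job name, stage name, build directory, and script lines are assumptions made up for the example, not the configuration actually used in this PR.

```yaml
# Hypothetical job: name, stage, and script are illustrative assumptions.
test-cuda:
  stage: test
  tags:
    - batch                        # route the job to a runner that submits to the compute nodes
  script:
    - cd build                     # assumed build directory from an earlier stage
    - ctest --output-on-failure    # print the output of failing tests in the job log
```

Running the tests on a compute node through the queue, rather than on a shared login node, should avoid the cudaErrorDevicesUnavailable failure above when the GPU on the shared node is busy.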

junghans commented 5 years ago

@jgalarowicz can you have a look at why the last stage (test) is failing?

jgalarowicz commented 5 years ago

@junghans Yes, I will take a look!

junghans commented 5 years ago

Thanks, there is just no error message, which confuses me!

jgalarowicz commented 5 years ago

@junghans For some reason I can't log into ORNL. I opened a ticket. I will try this when I can log in again.

jgalarowicz commented 5 years ago

@junghans My account at ORNL has been disabled. I think because my INCITE PEAC allocation was not renewed. I'm asking ORNL representatives for a sponsor.

junghans commented 5 years ago

I totally forgot about this PR!

@jgalarowicz it seems the permission issue is back (https://code.ornl.gov/ecpcitest/ecp-copa/cabana/pipelines/42620). Can you have a look?

jgalarowicz commented 5 years ago

@junghans It seems like this might be the problem where each of the tests needs to be a separate stage? I see that the code from 1ab6b95 was the initial try at this, but I don't see that code in the repository now. I remember you saying it wouldn't scale because of all the different variations that are required.

jgalarowicz commented 5 years ago

@junghans - I see the code now in the ci-cuda branch. So, maybe a different issue. Consulting with NMC - Paul and others.

sslattery commented 5 years ago

What is the status of this PR? Are we still seeing issues?

junghans commented 5 years ago

@jgalarowicz is doing final tweaks!

junghans commented 5 years ago

@jgalarowicz I added a workaround for the serialization bug.

junghans commented 5 years ago

This works now: https://code.ornl.gov/ecpcitest/ecp-copa/cabana/pipelines/45674

@sslattery @dalg24 please review and merge.

junghans commented 5 years ago

@sslattery squashed and rebased.