GEOS-DEV / thirdPartyLibs

Repository to build the GEOSX third party libraries

Action with Pangea-3 installation reproduction and ppc64le emulation #257

Closed: Algiane closed this 2 months ago

Algiane commented 10 months ago

New job that:

Linked to EPIC TTE Builds and Geos PR 3159

Algiane commented 9 months ago

Preliminary Remarks

Job Failure

Evaluation of emulation layer slowdown

  1. qemu layer: slowdown by a factor of about 14. The slowdown introduced by the qemu-user-static emulation layer is evaluated by comparing the compilation time of the finiteElement library of the Geos repository on 4 cores with 32 GB of memory (a reproduction sketch follows this list):

    • without qemu: real 2m11.743s - user 4m22.582s
    • with qemu: real 27m58.993s - user 73m23.212s
  2. uraimo/run-on-arch-action: slowdown by a factor of about 15 (no particular degradation compared to the raw qemu layer). The slowdown introduced by the run-on-arch-action has been evaluated on an external code without any dependencies (for the sake of simplicity, since it removes the need to build a suitable docker image for the target architecture). Test results are available here:

    • without run-on-arch emu: real 1m20 - user 1m13
    • with run-on-arch emu: real 20m32.581s - user 20m3.864s
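
For reference, a minimal sketch of how the qemu timing comparison can be reproduced with docker and qemu-user-static; the `tpl-builder` image name and the cmake target are placeholders rather than the actual CI recipe, and the emulated run assumes a ppc64le variant of the image exists:

```sh
# Register the qemu-user-static binfmt handlers on the x86_64 host
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Sanity check: prints "ppc64le" although the host is x86_64
docker run --rm --platform linux/ppc64le ubuntu:22.04 uname -m

# Time the same build step natively and through the emulation layer
# ("tpl-builder" and the cmake target are placeholders; the second run
# needs a ppc64le build of the image)
time docker run --rm tpl-builder \
  cmake --build /opt/GEOS/build --target finiteElement -j 4
time docker run --rm --platform linux/ppc64le tpl-builder-ppc64le \
  cmake --build /opt/GEOS/build --target finiteElement -j 4
```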

Perspectives

Even if we succeed in building the TPLs in an acceptable time, the GEOS Cuda build is much slower (~73m on 4 cores in Debug mode and ~100m in Release), so it will not be possible to use the emulation layer as is.

Nevertheless we can list some improvement paths for the current PR and the TPL build:

Algiane commented 9 months ago

@sframba @TotoGaz : you can read the PR comments if you are interested in feedback on this work, which you initiated with @XL64.

TotoGaz commented 9 months ago

Hello @Algiane, thank you for your comments.

The timing issue is surely something to keep in mind, but before getting to this, I'd like to get a little more information about the process.

Algiane commented 8 months ago

Hi @TotoGaz ,

On pangea III, I can run the acous3D_abc_smoke.xml test case with the geos binary I built. I don't know how to check that it really uses the GPUs, but running the same test on the P3_USERS_CPU queue fails with a "no CUDA-capable device" error.

The geos TPLs and geos binary have been built:

For now, the test of the executable on P3 is manually tweaked. I:

Please let me know if you need more tests.

Best

TotoGaz commented 8 months ago

> On pangea III, I can run the acous3D_abc_smoke.xml test case with the geos binary I built. I don't know how to check that it really uses the GPUs, but running the same test on the P3_USERS_CPU queue fails with a "no CUDA-capable device" error.

For that specific purpose, you can run geos with the --trace-data-migration (Trace host-device data migration) command line option. You'll be able to see data moving to and from the device.
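
For example, something along these lines (a minimal sketch; the -i input-file flag is my assumption about the geos command line, only --trace-data-migration is confirmed above):

```sh
# Run the smoke test with host-device data-migration tracing enabled
# (-i as the input-file flag is an assumption; --trace-data-migration
# is the option quoted above)
./bin/geos -i acous3D_abc_smoke.xml --trace-data-migration
```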

TotoGaz commented 8 months ago

@Algiane Is it fair to state that the issue is now really a timing issue? That if we had a very, very powerful machine, it would work OK?

Cross compiling is something that can be very challenging. Furthermore, cross compiling the TPLs means cross compiling ~20 libs with their sometimes clunky build systems, and you add CUDA on top of that. I do not know how to manage that; it would require a lot of dedication, to say the least.

Algiane commented 8 months ago

Thanks for the --trace-data-migration tip: it confirms that some LvArrays are moved to/from the GPUs.

For me, with this method we have 2 issues:

  1. the compilation time;
  2. the size of the docker images: the image with the pre-built TPLs is very close to the 10 GB limit and I think that the base image (the image with the copy of the pangea modules that are needed to build the TPLs, but without the TPLs built) is not far from it. The cuda module alone is already more than 4 GB. Finally, it was not possible to work directly inside the containers and I had to mount my home to avoid the "no space left on device" error.

For now, as the emulation seems to be a dead end but we still don't have a solution to test the P3 configuration, I will leave this PR as a draft and try to see if we can connect a ppc64le runner to github-actions as a self-hosted runner: this could be an alternative if we can buy a small ppc64le machine.

Best

TotoGaz commented 8 months ago

> 1. the compilation time;

We have a powerful self-hosted machine. Do you think that could do it?

> 2. the size of the docker images: the image with the pre-built TPLs is very close to the 10 GB limit and I think that the base image (the image with the copy of the pangea modules that are needed to build the TPLs, but without the TPLs built) is not far from it. The cuda module alone is already more than 4 GB. Finally, it was not possible to work directly inside the containers and I had to mount my home to avoid the "no space left on device" error.

I'm surprised that this gets so big. E.g. https://hub.docker.com/r/geosx/pecan-gpu-gcc8.2.0-openmpi4.0.1-mkl2019.5-cuda11.5.119/tags is ~4.4 GB (still very big, but half). Do you know what makes it so big? We make heavy use of the multi-stage approach to remove the temporaries. Are you doing the same?

Also, if we manage to run it on a comfortable self-hosted machine, would the size issue become secondary?
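
One way to see where the bytes go is to look at per-layer sizes with standard docker commands; a sketch, using the pecan image above as an example:

```sh
# Per-layer sizes: large layers usually map to a module copy or a build step
docker history --no-trunc geosx/pecan-gpu-gcc8.2.0-openmpi4.0.1-mkl2019.5-cuda11.5.119

# On-disk size after pull (Docker Hub reports the compressed size)
docker image ls geosx/pecan-gpu-gcc8.2.0-openmpi4.0.1-mkl2019.5-cuda11.5.119
```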

Algiane commented 8 months ago

> 1. the compilation time;

> We have a powerful self-hosted machine. Do you think that could do it?

Maybe: it depends on the time needed to build the TPLs and Geos on this machine. We can multiply these times by 15 to get an order-of-magnitude estimate of the times needed with the emulation layer.
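
As a back-of-the-envelope check with the numbers quoted earlier: the ~73 min native Debug CUDA build would become roughly 73 × 15 ≈ 1100 min (about 18 h) under emulation, and the ~100 min Release build about 25 h.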

> 2. the size of the docker images: the image with the pre-built TPLs is very close to the 10 GB limit and I think that the base image (the image with the copy of the pangea modules that are needed to build the TPLs, but without the TPLs built) is not far from it. The cuda module alone is already more than 4 GB. Finally, it was not possible to work directly inside the containers and I had to mount my home to avoid the "no space left on device" error.

> I'm surprised that this gets so big. E.g. https://hub.docker.com/r/geosx/pecan-gpu-gcc8.2.0-openmpi4.0.1-mkl2019.5-cuda11.5.119/tags is ~4.4 GB (still very big, but half). Do you know what makes it so big? We make heavy use of the multi-stage approach to remove the temporaries. Are you doing the same?

> Also, if we manage to run it on a comfortable self-hosted machine, would the size issue become secondary?

I see about the same size for our image on DockerHub, but that is the compressed size. Once pulled, the pecan-gpu image, for example, is about 10.8 GB, and I quickly run out of disk space. This is less annoying than the time issue (as it is possible to work in an external volume).
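
For reference, a minimal sketch of the external-volume workaround (paths and the image name are illustrative):

```sh
# Bind-mount a host directory so the build tree lives outside the container's
# writable layer and does not hit the "no space left on device" limit
docker run --rm -it \
  -v "$HOME/tpl-work:/work" -w /work \
  tpl-builder bash
```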

Algiane commented 8 months ago

@sframba : I have tested connecting a ppc64le self-hosted runner to github-actions using an unofficial runner (https://github.com/ChristopherHX/github-act-runner). It worked smoothly for a simple script execution.
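
For the record, a sketch of the registration flow; the subcommands and flags below are assumptions patterned on the official runner and the project's README, so check the README for the exact syntax:

```sh
# Build the unofficial runner on the ppc64le box (needs a Go toolchain;
# prebuilt binaries may also be available on the project's releases page)
git clone https://github.com/ChristopherHX/github-act-runner
cd github-act-runner
go build .

# Register against a repository using a registration token from
# Settings > Actions > Runners, then start listening for jobs
# (flag names are assumptions based on the README)
./github-act-runner configure \
  --url https://github.com/GEOS-DEV/thirdPartyLibs \
  --token <registration-token> \
  --labels self-hosted,ppc64le
./github-act-runner run
```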