Open jkohnert opened 2 weeks ago
Here is a full build log of my pipeline. The build container is based on the current Arch Docker image, with some enhancements for also building AUR-related stuff.
@jkohnert interesting, this shows a difficulty using shmap when in kubernetes (file /tmp/ompi.runner-9kk4tsne-project-16-concurrent-0-51i5q2wa.1000/jf.0/1038286848/shared_mem_cuda_pool.runner-9kk4tsne-project-16-concurrent-0-51i5q2wa not found). Plus the fact that some .pal files are not found either. Maybe a filename max length problem?
'Normally' the dependency on the plplot library has been removed in gdl 1.1, and it rebuilds on the github CI environment without it AFAIK. ... but it looks like you need its .pal files. I mean, gdl comes with its own .pal files, so 'normally' again it should find them. At least there are no more references to 'plplot' in any of our build and install files. This is very unsatisfactory, but you may try to install plplot and retry. This will not solve the strange 'shmap' problem though.
Hey @GillesDuvert, since I get the same error on my local machine (even with plplot still being available due to version 1.0.6 being installed), this seems to be a bit more complicated, at least to me.
I'm not really sure on how to debug it, though. I'll try running the built application in a debugger calling one of the failing .pro-files as soon as I have some spare time available.
Best, Jan
Short update: I just built without OpenMPI locally, but the tests still fail. The warning regarding shmem (expectedly) disappears, but the *.pal files are still not found.
Debugging reveals that the files are searched for in "DATA_DIR", which is set to "/usr/share/gnudatalanguage" (due to the install prefix given). The other options in plLibOpenPdfstrm() fail. But since I'm testing an uninstalled version, the open call to the file obviously fails and I get the error above. So I'd probably have to set an env variable telling the local plplot version where to look for those files. I'll do some more analysis.
Thanks again for the input. :)
Next update: Running PLPLOT_LIB=/home/jankoh/projects/gdl/src/plplot/data/ CTEST_OUTPUT_ON_FAILURE=1 make test works just fine (given that the source on my local machine is in /home/jankoh/projects/gdl/). I can live with that for the moment and make the Arch build work to be able to issue the update.
Anyway, we should probably make this work without the need for such an ugly hack. I'll try to make a PR.
Having said that, there's plInBuildTree() in plplot's code, probably just to make things work when running inside the build tree. However, the lines
char currdir[PLPLOT_MAX_PATH], *pcurrdir = currdir;
char builddir[PLPLOT_MAX_PATH], *pbuilddir = builddir;
in there really look suspicious to me. If I understand them correctly, they define the char arrays currdir and builddir, and additionally the pointers pcurrdir and pbuilddir, which point at those arrays while the arrays' contents are still undefined. They could be rewritten to make it a bit clearer:
char currdir[PLPLOT_MAX_PATH];
char *pcurrdir = currdir;
This doesn't segfault, since the array is defined first (and doesn't need extra allocation). But we cannot say what the content of the variable will be, since it depends on what was in the array's memory beforehand. It's just random garbage. I currently highly doubt this code has ever worked at all. But since my C is a bit rusty, I might be wrong on that.
Best, Jan
Dear Jan, I'm sorry I cannot be of more help right now, but I will of course come back to you soon. The .pal files are indeed to be found in "DATA_DIR", which is apparently set to "/usr/share/gnudatalanguage" for you. If no full installation has been performed, this directory may not contain the .pal files. If not, there may be a (subtle?) cmake problem (we are by far not cmake experts, as a quick glance at our cmake files will immediately show!). This would be the case for other needed files, too (especially the .pro file in the "/usr/share/gnudatalanguage/lib" directory, which must be up to date).
Hi,
I'm currently trying to update the package for Arch Linux (as I'm the maintainer in the AUR).
However, there are failing tests:
All failing tests mention this error, and as far as I could find, it could be related to OpenMPI, but I'm not yet sure.
I checked, there is enough space in /tmp (16GiB); so space-related problems do not seem to be the culprit. However, Arch currently has OpenMPI 5.0.5.
The failing tests are:
Does anyone have an idea how to track down the problem?
Best, Jan