Closed j-w-jones closed 2 years ago
This is a bit weird, but starting a make in exports while the static build is still running looks like the only explanation. (Wonder why that has not come up before, unless everybody is silently limiting their make to just a handful of parallel jobs.)
Could you check if adding shared to the .NOTPARALLEL in line 40 of the toplevel Makefile fixes this?
Hi Martin
I have tried that, but it didn’t make a difference I’m afraid.
I have run ‘nm’ on libopenblas_skylakexp-r0.3.12.so from a sequential build and a parallel build. They’re attached.
You can see that some symbols are undefined in the parallel one, but defined in the sequential one, for example: “slagge_”.
Regards,
Jason
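For anyone wanting to repeat the comparison described above, here is a minimal sketch of diffing two `nm` listings. It assumes the default `nm` text format (`<address> <type> <name>` for defined symbols, `U <name>` for undefined ones); the function names are hypothetical and not part of OpenBLAS.

```python
def parse_nm(text):
    """Map symbol name -> nm type letter from the text output of `nm`."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            # Defined symbol: "<address> <type> <name>"
            _, kind, name = parts
        elif len(parts) == 2:
            # Undefined symbol: "U <name>" (no address column)
            kind, name = parts
        else:
            # Blank lines, "file.o:" headers and anything unexpected
            continue
        symbols[name] = kind
    return symbols


def missing_in_parallel(sequential_nm, parallel_nm):
    """Symbols defined in the sequential build but undefined ('U') in the parallel one."""
    seq = parse_nm(sequential_nm)
    par = parse_nm(parallel_nm)
    return sorted(
        name
        for name, kind in par.items()
        if kind == "U" and seq.get(name) not in (None, "U")
    )
```

Feeding it the saved output of `nm` for each of the two shared libraries should list symbols such as `slagge_` if the parallel build dropped them.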
Hi Jason, unfortunately the nm output was not attached. The symbols you mentioned above, slagge, slagsy et al., all belong to TESTING/MATGEN. I'm wondering if we need .NOTPARALLEL for the entire lapack-netlib hierarchy (or at least in the Makefile in lapack-netlib/TESTING/MATGEN). Still strange that this has not come up before - some of the CI jobs run on Epyc or ThunderX hardware, need to check if these use make -j though...
Not reproduced on a 96-core ARM server, nor on a 48-core AMD Epyc.
A longer build log would be necessary, especially the errors around the slag* functions.
Intel makes no 40-core CPU, so I assume it is a 20-core some-lake Platinum?
Hi Andrew
I redid the build in parallel but in /tmp, which is a normal SAS disk, and it compiled fine.
The original builds I was doing were on a large Lustre filesystem. I know Lustre tends to cache more and flush to the disk far less but that shouldn’t affect any files that have been closed, i.e. the process using them has finished.
It does mean that two processes writing to the same file, even if the actual writes are separated in time, are far less likely to succeed than on a normal disk.
I did try to see if I could get ‘make’ to print out which build steps were in which threads, or maybe print a timestamp for each build step, but I couldn’t find anything.
The server has 2 x 20 Intel Xeon Gold 6230 cpus.
Cheers,
Jason
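On the question of getting a timestamp for each build step: one option is to wrap the build and stamp every output line as it arrives. The following is only a sketch; `run_with_timestamps` is a hypothetical helper, not something make or OpenBLAS provides, and the stamps reflect when a line reached the wrapper rather than when the recipe actually ran. GNU make 4.0 and later also offer `--output-sync` and `--trace`, which may help attribute output to individual recipes.

```python
import subprocess
import time


def run_with_timestamps(command, emit=print):
    """Run `command` and pass each output line to `emit`,
    prefixed with the seconds elapsed since the command was started."""
    start = time.monotonic()
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep errors interleaved with normal output
        text=True,
    )
    for line in proc.stdout:
        elapsed = time.monotonic() - start
        emit(f"[{elapsed:9.3f}] {line.rstrip()}")
    # Return the exit status so callers can still detect a failed build
    return proc.wait()
```

A call such as `run_with_timestamps(["make", "-j40"])` would then show how close together in time the archive updates and the shared-library link happen.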
That would appear to be a consistency issue with your distributed filesystem then (or perhaps gmake's inability to handle such a case). I still have not seen this with the drone.io CI and whatever fs backend they use, where I believe any trivial cases of missing make dependencies should show up. Is it always the same (small?) set of missing functions you (don't) see?
In general make relies on very accurate timestamps. In the case of NFS, and seemingly Lustre, those are set by the backend, whose clock may be off by more than the timestamp resolution. You probably got some warnings about timestamps when building.
One solution is to ensure very accurate time synchronisation. /dev/shm is a kind of free filesystem with local timestamps, though mounting /tmp the same way looks more legit.
Hi Andrew,
The timestamps are fine – make never complains about this. I have had this in the past with NFS mounted filesystems where the client and server clocks were out of sync.
I am wondering if two threads are updating the archive library at similar times (for example within a second of each other) and a normal, local disk deals with it better.
Anyway, I guess you can close this issue. If I do have time to investigate further and find anything I will let you know.
Cheers,
Jason
Unfortunately I do not have access to a lustre-based DFS, so the best option seems to be to mention this topic in the wiki and/or README. (Need to look into Makefile debugging with something like https://github.com/rocky/remake)
make complains only if it sees source files timestamped in the future, e.g. when you do something CI-style on the cluster: expand a tarball on a node living in the future, then compile from the same place on another node, which sees the implausible future files. It is not only an NFS problem; sometimes a reboot or a misguided timezone setup triggers that.
Otherwise you will end up diagnosing this, unwillingly, when building the next software package. Lustre says to use NTP, please ask your admin to check. https://wiki.lustre.org/Operating_System_Configuration_Guidelines_For_Lustre#Date_and_Time_Synchronization_with_NTP
Hi Andrew
I am the system admin and the cluster does use NTP. Lustre would be throwing errors regularly if clocks were not synced.
Regards,
Jason
You can always provide a native OS package built on a better filesystem. OpenBLAS does not do anything specific with make to push it out of normal operation. NFS with bad times is known to fail; Lustre may or may not fail the same way.
A full log (e.g. captured with script) should tell more about the fate of the missing files.
Please delete the original message if replying to this via e-mail.
Err, @brada4, can you give this a rest please ?
I have been experiencing this issue for parallel builds on a dual-socket AMD Epyc system with 64 cores and a BeeGFS distributed, shared filesystem. It was easily reproducible in my setup. But I tried, as suggested in @martin-frbg's comment, to add a .NOTPARALLEL: line to the file lapack-netlib/LAPACKE/src/Makefile. This appears to have worked, and I no longer see the issue.
Maybe it's not a suitable fix, but at least it can serve as a workaround for now.
Thanks for that report. So you were seeing the problem with symbols from LAPACKE only, while the original poster was missing functions from MATGEN ?
If you set the NFS server date some 20 seconds into the future, you get that with anything involving any kind of make. Anyone can afford a 1 GB ramdisk for all their build needs.
Here is the relevant part of the build log, which shows the list of symbols that are missing. Let me know, and I can include the entire log, if needed.
make[2]: Entering directory '/global/D1/homes/james/ex3modules/defq/1.0.0/src/openblas-0.3.12/exports'
perl ./gensymbol linktest x86_64 _ 0 0 0 0 0 0 "" "" 1 0 1 1 1 1 > linktest.c
cc -O2 -DMAX_STACK_ALLOC=2048 -DUSE_LOCKING -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DNO_WARMUP -DMAX_CPU_NUMBER=256 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.12\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -shared -o ../libopenblas-r0.3.12.so \
-Wl,--whole-archive ../libopenblas-r0.3.12.a -Wl,--no-whole-archive \
-Wl,-soname,libopenblas.so.0 -lm -lgfortran -lm -lgfortran
cc -O2 -DMAX_STACK_ALLOC=2048 -DUSE_LOCKING -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DNO_WARMUP -DMAX_CPU_NUMBER=256 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.12\" -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -w -o linktest linktest.c ../libopenblas-r0.3.12.so -L/cm/shared/apps/slurm/20.02.7/lib/x86_64-linux-gnu -L/cm/shared/apps/slurm/20.02.7/lib/../lib -L/cm/shared/apps/slurm/20.02.7/lib64/../lib -L/cm/shared/apps/slurm/20.02.7/lib64/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../x86_64-linux-gnu/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/cm/shared/apps/slurm/20.02.7/lib -L/cm/shared/apps/slurm/20.02.7/lib64 -L/cm/shared/apps/slurm/20.02.7/lib64/slurm -L/cm/shared/apps/slurm/20.02.7/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../x86_64-linux-gnu/lib -L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. -lgfortran -lm -lquadmath -lm -lc && echo OK.
/tmp/cc3YEbzE.o: In function `main':
linktest.c:(.text.startup+0xf88): undefined reference to `slagge_'
linktest.c:(.text.startup+0xf8f): undefined reference to `slagsy_'
linktest.c:(.text.startup+0xf96): undefined reference to `slahilb_'
linktest.c:(.text.startup+0xf9d): undefined reference to `slakf2_'
linktest.c:(.text.startup+0xfa4): undefined reference to `slaran_'
linktest.c:(.text.startup+0xfab): undefined reference to `slarge_'
linktest.c:(.text.startup+0xfb2): undefined reference to `slarnd_'
linktest.c:(.text.startup+0xfb9): undefined reference to `slaror_'
linktest.c:(.text.startup+0xfc0): undefined reference to `slarot_'
linktest.c:(.text.startup+0xfc7): undefined reference to `slatm1_'
linktest.c:(.text.startup+0xfce): undefined reference to `slatm2_'
linktest.c:(.text.startup+0xfd5): undefined reference to `slatm3_'
linktest.c:(.text.startup+0xfdc): undefined reference to `slatm5_'
linktest.c:(.text.startup+0xfe3): undefined reference to `slatm6_'
linktest.c:(.text.startup+0xfea): undefined reference to `slatm7_'
linktest.c:(.text.startup+0xff1): undefined reference to `slatme_'
linktest.c:(.text.startup+0xff8): undefined reference to `slatmr_'
linktest.c:(.text.startup+0xfff): undefined reference to `slatms_'
linktest.c:(.text.startup+0x1006): undefined reference to `slatmt_'
linktest.c:(.text.startup+0x1b82): undefined reference to `dlagge_'
linktest.c:(.text.startup+0x1b89): undefined reference to `dlagsy_'
linktest.c:(.text.startup+0x1b90): undefined reference to `dlahilb_'
linktest.c:(.text.startup+0x1b97): undefined reference to `dlakf2_'
linktest.c:(.text.startup+0x1b9e): undefined reference to `dlaran_'
linktest.c:(.text.startup+0x1ba5): undefined reference to `dlarge_'
linktest.c:(.text.startup+0x1bac): undefined reference to `dlarnd_'
linktest.c:(.text.startup+0x1bb3): undefined reference to `dlaror_'
linktest.c:(.text.startup+0x1bba): undefined reference to `dlarot_'
linktest.c:(.text.startup+0x1bc1): undefined reference to `dlatm1_'
linktest.c:(.text.startup+0x1bc8): undefined reference to `dlatm2_'
linktest.c:(.text.startup+0x1bcf): undefined reference to `dlatm3_'
linktest.c:(.text.startup+0x1bd6): undefined reference to `dlatm5_'
linktest.c:(.text.startup+0x1bdd): undefined reference to `dlatm6_'
linktest.c:(.text.startup+0x1be4): undefined reference to `dlatm7_'
linktest.c:(.text.startup+0x1beb): undefined reference to `dlatme_'
linktest.c:(.text.startup+0x1bf2): undefined reference to `dlatmr_'
linktest.c:(.text.startup+0x1bf9): undefined reference to `dlatms_'
linktest.c:(.text.startup+0x1c00): undefined reference to `dlatmt_'
linktest.c:(.text.startup+0x289b): undefined reference to `clagge_'
linktest.c:(.text.startup+0x28a2): undefined reference to `claghe_'
linktest.c:(.text.startup+0x28a9): undefined reference to `clagsy_'
linktest.c:(.text.startup+0x28b0): undefined reference to `clahilb_'
linktest.c:(.text.startup+0x28b7): undefined reference to `clakf2_'
linktest.c:(.text.startup+0x28be): undefined reference to `clarge_'
linktest.c:(.text.startup+0x28c5): undefined reference to `clarnd_'
linktest.c:(.text.startup+0x28cc): undefined reference to `claror_'
linktest.c:(.text.startup+0x28d3): undefined reference to `clarot_'
linktest.c:(.text.startup+0x28da): undefined reference to `clatm1_'
linktest.c:(.text.startup+0x28e1): undefined reference to `clatm2_'
linktest.c:(.text.startup+0x28e8): undefined reference to `clatm3_'
linktest.c:(.text.startup+0x28ef): undefined reference to `clatm5_'
linktest.c:(.text.startup+0x28f6): undefined reference to `clatm6_'
linktest.c:(.text.startup+0x28fd): undefined reference to `clatme_'
linktest.c:(.text.startup+0x2904): undefined reference to `clatmr_'
linktest.c:(.text.startup+0x290b): undefined reference to `clatms_'
linktest.c:(.text.startup+0x2912): undefined reference to `clatmt_'
linktest.c:(.text.startup+0x33d1): undefined reference to `zlagge_'
linktest.c:(.text.startup+0x33d8): undefined reference to `zlaghe_'
linktest.c:(.text.startup+0x33df): undefined reference to `zlagsy_'
linktest.c:(.text.startup+0x33e6): undefined reference to `zlahilb_'
linktest.c:(.text.startup+0x33ed): undefined reference to `zlakf2_'
linktest.c:(.text.startup+0x33f4): undefined reference to `zlarge_'
linktest.c:(.text.startup+0x33fb): undefined reference to `zlarnd_'
linktest.c:(.text.startup+0x3402): undefined reference to `zlaror_'
linktest.c:(.text.startup+0x3409): undefined reference to `zlarot_'
linktest.c:(.text.startup+0x3410): undefined reference to `zlatm1_'
linktest.c:(.text.startup+0x3417): undefined reference to `zlatm2_'
linktest.c:(.text.startup+0x341e): undefined reference to `zlatm3_'
linktest.c:(.text.startup+0x3425): undefined reference to `zlatm5_'
linktest.c:(.text.startup+0x342c): undefined reference to `zlatm6_'
linktest.c:(.text.startup+0x3433): undefined reference to `zlatme_'
linktest.c:(.text.startup+0x343a): undefined reference to `zlatmr_'
linktest.c:(.text.startup+0x3441): undefined reference to `zlatms_'
linktest.c:(.text.startup+0x3448): undefined reference to `zlatmt_'
collect2: error: ld returned 1 exit status
Makefile:181: recipe for target '../libopenblas-r0.3.12.so' failed
make[2]: *** [../libopenblas-r0.3.12.so] Error 1
make[2]: Leaving directory '/global/D1/homes/james/ex3modules/defq/1.0.0/src/openblas-0.3.12/exports'
Makefile:116: recipe for target 'shared' failed
make[1]: *** [shared] Error 2
make[1]: Leaving directory '/global/D1/homes/james/ex3modules/defq/1.0.0/src/openblas-0.3.12'
makefiles/openblas-0.3.12.mk:53: recipe for target '/global/D1/homes/james/ex3modules/defq/1.0.0/pkgs/openblas-0.3.12/.pkgbuild' failed
make: *** [/global/D1/homes/james/ex3modules/defq/1.0.0/pkgs/openblas-0.3.12/.pkgbuild] Error 2
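To check whether two failing builds are missing the same functions, the symbol names can be pulled out of a log like the one above. This is a small sketch that assumes the GNU ld wording "undefined reference to" seen in this log; `undefined_symbols` is a hypothetical helper name.

```python
import re

# Matches the GNU ld message format seen in the log, e.g.
#   linktest.c:(.text.startup+0xf88): undefined reference to `slagge_'
UNDEFINED_REF = re.compile(r"undefined reference to [`'‘]?(\w+)")


def undefined_symbols(log_text):
    """Return the sorted, de-duplicated symbol names the linker could not resolve."""
    return sorted(set(UNDEFINED_REF.findall(log_text)))
```

Comparing `set(undefined_symbols(log_a))` with `set(undefined_symbols(log_b))` for two saved logs would show whether both reports involve the same MATGEN routines.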
Hm, that looks a lot like MATGEN (same as original post) so any addition to the LAPACKE Makefile may have been coincidental (or just enough to reduce pressure on the filesystem as a side effect)...
... (or was that just a Freudian slip, and you actually edited TESTING/MATGEN/Makefile but were thinking about LAPACKE when you wrote your comment ? MATGEN/Makefile was what I suggested back then, and "fixing" just that Makefile would have much less impact on build times on unaffected systems)
Please attach the entire log; there should be complaints from make about dates of files being in the future.
Just grep -i future in that output.
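The same check can be scripted for saved logs. The sketch below assumes the usual GNU make wording for such warnings ("... has modification time ... in the future" and "Clock skew detected"); other make implementations may phrase them differently, and `clock_warnings` is a hypothetical helper name.

```python
import re

# Fragments of the GNU make warnings about timestamps (assumed wording)
CLOCK_WARNING = re.compile(r"in the future|clock skew detected", re.IGNORECASE)


def clock_warnings(log_text):
    """Return every log line that looks like a timestamp or clock-skew warning."""
    return [line for line in log_text.splitlines() if CLOCK_WARNING.search(line)]
```

An empty result for a failed build log would support the view that clock skew is not the cause.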
Hm, that looks a lot like MATGEN (same as original post) so any addition to the LAPACKE Makefile may have been coincidental (or just enough to reduce pressure on the filesystem as a side effect)...
I see. I did in fact edit lapack-netlib/LAPACKE/src/Makefile. Maybe you are right that it was only a coincidental fix.
I also tried again with your suggestion of adding .NOTPARALLEL: to lapack-netlib/TESTING/MATGEN/Makefile instead, and so far it appears to have done the trick. That is, I have built a couple of times without observing any issues. (I made sure there was a fairly heavy load on the distributed file system during the builds.)
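For packaging scripts that unpack a fresh source tree on every build, the workaround can be applied automatically. This is a minimal sketch, not an official OpenBLAS patch; `add_notparallel` is a hypothetical helper, and the path passed to it is the one named in this comment.

```python
from pathlib import Path


def add_notparallel(makefile_path):
    """Append a `.NOTPARALLEL:` line to the Makefile unless one is already present.

    Returns True if the file was modified, False if it was left untouched.
    """
    path = Path(makefile_path)
    text = path.read_text()
    if any(line.strip().startswith(".NOTPARALLEL") for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    # The special target may appear anywhere in the makefile, so the end is fine
    path.write_text(text + ".NOTPARALLEL:\n")
    return True
```

Calling `add_notparallel("lapack-netlib/TESTING/MATGEN/Makefile")` from the top of the source tree before running make serialises only that directory, which keeps the impact on build time small.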
I have attached the standard output and standard error streams from the failed attempt mentioned in my previous comment. The build command is:
$ make FC=gfortran DYNAMIC_ARCH=1 TARGET=HASWELL USE_THREAD=0 USE_LOCKING=1 USE_OPENMP=0 NUM_THREADS=256 NO_AFFINITY=1
There are no messages about future dates or timestamps.
What is in pkgs/Makefile ? Like patches? Parameter filters? Any env variables set?
I'm trying to compile OpenBLAS 0.3.12 using GCC 10.2.0.
When it gets to the linktest part it fails with unresolved symbols. And when I look in the .so file using 'nm' I can see a load of symbols are undefined.
perl ./gensymbol linktest x86_64 _ 0 0 0 0 0 0 "" "" 1 0 1 1 1 1 > linktest.c
gcc -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=80 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.12\" -msse3 -mssse3 -msse4.1 -march=skylake-avx512 -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -shared -o ../libopenblas_skylakexp-r0.3.12.so \
-Wl,--whole-archive ../libopenblas_skylakexp-r0.3.12.a -Wl,--no-whole-archive \
-Wl,-soname,libopenblas.so.0 -lm -lpthread -lgfortran -lm -lpthread -lgfortran
gcc -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=80 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.12\" -msse3 -mssse3 -msse4.1 -march=skylake-avx512 -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -w -o linktest linktest.c ../libopenblas_skylakexp-r0.3.12.so -L/opt/software/base/gcc/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0 -L/opt/software/base/gcc/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/opt/software/base/gcc/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../.. -lgfortran -lm -lquadmath -lm -lc && echo OK.
/tmp/ccaHbXUr.o: In function `main':
linktest.c:(.text.startup+0xf88): undefined reference to `slagge_'
linktest.c:(.text.startup+0xf8f): undefined reference to `slagsy_'
linktest.c:(.text.startup+0xf96): undefined reference to `slahilb_'
linktest.c:(.text.startup+0xf9d): undefined reference to `slakf2_'
linktest.c:(.text.startup+0xfa4): undefined reference to `slaran_'
linktest.c:(.text.startup+0xfab): undefined reference to `slarge_'
linktest.c:(.text.startup+0xfb2): undefined reference to `slarnd_'
However, if I turn off parallel make then it builds fine.
I have 40 cores in my server, so could there be an issue with the make dependencies causing the dynamic library to be built from the static library before the static library has finished being built?