conda-forge / gromacs-feedstock

A conda-smithy repository for gromacs.
BSD 3-Clause "New" or "Revised" License

RDTSCP #23

Closed · mattwthompson closed this 12 months ago

mattwthompson commented 1 year ago

Comment:

Some of my CI runs are crashing with this spooky error:

                 :-) GROMACS - gmx mdrun, 2023.2-conda_forge (-:

Executable:   /Users/runner/micromamba/envs/interchange-examples-env/bin.AVX2_256/gmx
Data prefix:  /Users/runner/micromamba/envs/interchange-examples-env
Working dir:  /private/var/folders/3s/vfzpb5r51gs6y328rmlgzm7c0000gn/T/tmpxj1jcgnp
Command line:
  gmx mdrun -s out.tpr -e out.edr -ntomp 1

-------------------------------------------------------
Program:     gmx mdrun, version 2023.2-conda_forge
Source file: src/gromacs/hardware/printhardware.cpp (line 108)

Fatal error:
The gmx mdrun executable was compiled to use the rdtscp CPU instruction.
However, this is not supported by the current hardware and continuing would
lead to a crash. Please rebuild GROMACS with the GMX_USE_RDTSCP=OFF CMake
option.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

As far as I can tell, RDTSCP is a CPU instruction that very old x86 hardware lacks (see the "very old machines" note below). I have not seen this on Linux, and it doesn't crash on all macOS builds. I think GitHub may be using old hardware for some macOS runners, but this seems exceedingly difficult to verify. It has been happening for the past two weeks or so, and I had never seen it before; searching the web does not bring up much, otherwise I wouldn't open an issue here.
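One way to at least check a given machine (a sketch of my own, not something GROMACS provides) is to ask the CPU whether it advertises the rdtscp feature flag. The Linux location is standard; the macOS sysctl key is an assumption on my part and may differ by OS version:

    # Linux: the flag appears in the per-core feature flags
    grep -qw rdtscp /proc/cpuinfo && echo "rdtscp available" || echo "rdtscp missing"

    # Intel macOS: RDTSCP is reported among the extended CPU features
    # (the machdep.cpu.extfeatures key is an assumption here)
    sysctl -n machdep.cpu.extfeatures | grep -qw RDTSCP && echo "rdtscp available" || echo "rdtscp missing"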

I can find only a brief mention of it, in the 2021 release notes:

RDTSCP usage and reporting

GROMACS now defaults always on x86 to use the RDTSCP machine instruction for lower latency timing. Very old machines might need to configure with GMX_USE_RDTSCP=off. Non-x86 platforms are unaffected, except that they will no longer report that RDTSCP is disabled (because that is self-evident).

https://manual.gromacs.org/documentation/2023/release-notes/2021/major/portability.html#rdtscp-usage-and-reporting

I have no idea whether "lower latency timing" relates to performance in production runs or to something else entirely, like debugging or some other low-level operation. I'd be happy to submit a PR that toggles this flag (a sketch of the rebuild it implies is below), but I am clueless as to whether or not that's a good idea.
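For reference, the rebuild the error message asks for would look roughly like this. Only the GMX_USE_RDTSCP flag comes from the error text above; the rest is a generic from-source CMake build with illustrative paths:

    # from a GROMACS source checkout; directory names are illustrative
    mkdir build && cd build
    cmake .. -DGMX_USE_RDTSCP=OFF
    make -j4
    make install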

mabraham commented 1 year ago

GROMACS dev here. I think your deduction that the x86 hardware for these runners may be old is a sound one (but if so, the AVX2_256 probably won't work either). Perhaps more likely is that the runner is running in a VM and the virtualization layer does not make RDTSCP available (even though it might make AVX stuff available). Either way, the impact of lower-latency timing on people prepared to use a conda build of GROMACS is almost zero, so I think toggling GMX_USE_RDTSCP=off is simple and good enough for the purpose.

mattwthompson commented 1 year ago

Wonderful - thanks Mark!

I seem to have forgotten the context here: these builds are mostly about getting the gmx executables installed at all, not about squeezing performance out of production runs. I'll open the PR to see what happens; a sketch of the likely change is below.
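The change I have in mind would look something like this, assuming the feedstock's recipe/build.sh configures with CMake the way most conda-forge feedstocks do. Everything here other than -DGMX_USE_RDTSCP=OFF is a placeholder, not the feedstock's actual script:

    # recipe/build.sh (hypothetical excerpt): disable RDTSCP so the
    # binaries also run on hardware/VMs that hide the instruction
    cmake ${CMAKE_ARGS} -S . -B build \
        -DCMAKE_INSTALL_PREFIX="${PREFIX}" \
        -DGMX_USE_RDTSCP=OFF
    cmake --build build --parallel "${CPU_COUNT}"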

mabraham commented 1 year ago

Yes, CI builds make the binaries that others install. I have no idea where the gmx mdrun -s out.tpr -e out.edr -ntomp 1 call in the OP comes from, however. It's possible that the runners that built the gmx binary were fine (if any tests were run there), and that some runner for some downstream CI hit a different runtime context, one that didn't have RDTSCP available.

mattwthompson commented 12 months ago

Hopefully #24 fixes this!