eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

CRIU: portability with different versions of glibc #14253

Open tajila opened 2 years ago

tajila commented 2 years ago

A CRIU image may not be portable because glibc may use newer CPU features even if the JVM restricts itself to an older set of features.

Creating an image with an older glibc might help if that glibc has no optimized variants for those features (e.g. AVX). However, there are techniques that intercept the cpuid instruction to control the selected feature set, which should cover x86 software more broadly: https://github.com/ddcc/libcpuidoverride.

tajila commented 2 years ago

@ashu-mehra Can you please describe the sample application you ran to demonstrate this issue?

ashu-mehra commented 2 years ago

This issue happens when the application uses library functions that have more than one implementation in order to take advantage of hardware features, such as memcpy and memset. The dynamic loader (ld.so) decides at runtime which implementation to use based on the available hardware feature flags, and patches the GOT entry with the address of the chosen implementation. If this resolution and patching happens before the checkpoint is taken, it can cause a problem when the restore is done on a machine that does not support those hardware features. I am currently experimenting with a C program that uses memset:

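```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define USER_SIZE 512

char *ptr = NULL;

char *get_user_ptr(void) {
    return ptr;
}

int get_user_size(void) {
    return USER_SIZE;
}

/* Calls memset, whose glibc implementation is selected at runtime
 * from the CPU feature flags (e.g. an AVX2 variant when available). */
void set_user_bytes(char ch) {
    memset(get_user_ptr(), ch, get_user_size());
}

/* Naive trial division; only used to keep the CPU busy. */
int is_prime(int num) {
    int j;
    int divisors = 0;
    if (num < 2) {
        return 0;
    }
    for (j = 2; j < num - 1; j++) {
        if (num % j == 0) {
            divisors += 1;
        }
    }
    return divisors == 0;
}

/* Runs long enough to take a checkpoint; count is volatile so an
 * optimizing compiler cannot remove the loop as dead code. */
void busyloop(void) {
    int i;
    volatile int count = 0;
    for (i = 0; i < 99999 * 2; i++) {
        if (is_prime(i)) {
            count += 1;
        }
    }
}

int main(void) {
    ptr = malloc(USER_SIZE);
    if (ptr == NULL) {
        perror("malloc");
        return 1;
    }
    printf("pid %d\n", (int)getpid()); /* PID of the process to checkpoint */
    fflush(stdout);

    set_user_bytes('a');  /* memset is resolved before the checkpoint */
    busyloop();           /* the checkpoint is taken during this loop */
    set_user_bytes('b');  /* runs after restore through the already-patched GOT entry */

    printf("Finished successfully\n");
    free(ptr);
    return 0;
}
```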

In this program, set_user_bytes() calls memset. The only purpose of busyloop() is to give me enough time to take a checkpoint, so the first call to set_user_bytes() happens before the checkpoint and the second call happens after the restore. If we take the checkpoint on a system with the AVX2 feature and restore on a system without it, we would hit SIGILL.

ashu-mehra commented 2 years ago

An update on this issue: I have been able to use the glibc tunable glibc.cpu.hwcaps [0] to disable certain CPU features when starting the application that is going to be checkpointed. This has worked well to overcome the glibc/ld.so issue described above.

My tests were done on a Fedora 30 system, which has glibc 2.29. GLIBC_TUNABLES was set as:

export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC_Usable,-XSAVE_Usable,-AVX2_Usable,-ERMS,-AVX_Usable,-AVX_Fast_Unaligned_Load

The set of flags passed to tunable glibc.cpu.hwcaps would obviously depend on the cpu features available on the systems involved in the experiment.

Note that glibc 2.29 has a bug that prevented disabling the XSAVE feature using GLIBC_TUNABLES and also prevented correct selection of the dl_runtime_resolve_* function. This issue [1] has been fixed in the latest release, 2.34. For 2.29 I had to make a couple of changes to address these issues and recompile glibc.

The flags accepted by glibc.cpu.hwcaps have also changed in newer glibc releases. The flags recognized by the tunable can be found in cpu-tunables.c [2] in the glibc sources.

[0] https://www.gnu.org/software/libc/manual/html_node/Hardware-Capability-Tunables.html
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=27605
[2] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/cpu-tunables.c;h=58f7a7f2509bedb967a13e0af2cd434c33079f18;hb=HEAD

tajila commented 2 years ago

@ashu-mehra thanks for the update. Just so I'm clear: GLIBC_TUNABLES=... needs to be exported before the application starts? And there is no way to do this after the application has started (but before the checkpoint)?

ashu-mehra commented 2 years ago

Right, it needs to be set before launching the application.

tajila commented 2 years ago

https://www.gnu.org/software/libc/manual/html_node/Tunables.html suggests that there may be other ways to enable this:

It is possible to implement multiple ‘frontends’ for the tunables allowing distributions to choose their preferred method at build time

I asked the question above because the JVM already has a notion of portable/non-portable restore mode, but the JVM needs to be launched before it knows which mode it is in. Without this information we need to be pessimistic.

tajila commented 2 years ago

Just for completeness, I'll post the solution I was thinking of exploring:

Reload glibc on restore:

I never got around to trying this.

ashu-mehra commented 2 years ago

@tajila I am not sure I understand how reloading glibc would solve this problem. Once the GOT entry for a library function is patched with an implementation, ld.so does not attempt to resolve it again, so loading glibc again may not help. It may require editing the checkpoint image to "unresolve" the GOT entries. This could actually be achieved with the environment variable LD_BIND_NOT [1], which keeps GOT entries unresolved, but I am not sure whether this variable has any role in selecting the dl_runtime_resolve_* function.

[1] https://man7.org/linux/man-pages/man8/ld.so.8.html