HotswapProjects / HotswapAgent

Java unlimited redefinition of classes at runtime.
GNU General Public License v2.0
2.36k stars 493 forks source link

Conflict with asm event executor? #312

Open ninja- opened 5 years ago

ninja- commented 5 years ago

I am having an issue with ASM generated event handler methods after hotswap which looks something like this. Not sure if it's related to ASM or it's other logical bug? These classes were not even changed during hotswap. They were hotswapped, but these were the same class file.

java.lang.ClassCastException: class MyListener cannot be cast to class MyListener (MyListener and MyListener are in unnamed module of loader blabla.PluginClassloader @54e22bdd)
    at MyListener$GeneratedEventExecutor9$onSomeEvent.invoke(Unknown Source)
ninja- commented 5 years ago

It's some kind of race. Doesn't always break.

ninja- commented 5 years ago

I disabled asm generator and used reflection fallback. Now it's much harder to race, but it still can race after a number of hotswaps.

2019/08/22 00:21:27 java.lang.Error: Method rejected target/argument: MyCoolEvent
2019/08/22 00:21:27     at EventBus.post(EventBus.java:58)
2019/08/22 00:22:55 Caused by: java.lang.IllegalArgumentException: java.lang.ClassCastException@9f618cc
2019/08/22 00:22:55     at jdk.internal.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
2019/08/22 00:22:55     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2019/08/22 00:22:55     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
ninja- commented 5 years ago

I have most framework plugin disabled and I tried disabling init and anonymous patch plugin to check if it helps, but it doesn't. The only idea I have is regarding this entry

 Reloading classes [MyListener, MyListener] (autoHotswap)

The race is getting harder to trigger lately, but so far I saw this pattern 3 times when it raced and never when it kept working properly after many hotswaps. for some reason it can catch the same class twice within one hotswap, and it would result in a different Class object, because Class is the map key, so otherwise it have been overriden?

ninja- commented 5 years ago

@skybber I'd love to have your ideas on this. So far I tried moving class loading within synchronized block but that didn't help, can still produce more than 1 class in the same hotswap.

https://github.com/HotswapProjects/HotswapAgent/blob/master/hotswap-agent-core/src/main/java/org/hotswap/agent/plugin/hotswapper/HotswapperPlugin.java#L72

For each updated file there is EVENT_MODIFY & EVENT_CREATE so idk, maybe some 200ms debounce based on filename would do, instead of basing this on class. The alternative would be to write my own plugin without FileWatcher that calls hotswap when files are 100% ready with my own list of classess...but I am hoping to just fix this race in HotswapperPlugin...

EDIT:

Eh, I tried to disable adding same class to reloadMap and it did work, but still didn't fix the problem that can break event executors :/

for (Map.Entry<Class<?>, byte[]> entry : reloadMap.entrySet()) {
  if(entry.getKey().getName().equals(clazz.getName())) {

    LOGGER.warning("ACHTUNG! Hotswapper prevented a race for " + clazz.getName());
    return;
  }
}
            reloadMap.put(clazz, ctClass.toBytecode());
2019/08/22 02:11:38 04:11:32 [INFO] HOTSWAP AGENT: 04:11:32.711 WARNING (org.hotswap.agent.plugin.hotswapper.HotswapperPlugin) - ACHTUNG! Hotswapper prevented a race for MyListener
skybber commented 5 years ago

I don't see all the background of your problem properly but generally:

ninja- commented 5 years ago

If it needs to be discarded then why it's working most of the time and can break after random amount of identical swaps?

skybber commented 5 years ago

I don't know how your framework works :) If there is reflection cache, clear it. Many HA plugins do it.

ninja- commented 5 years ago

I don't think it's related to reflection cache. It bugs out the same way for both ASM event executor and reflection executor. It crashes on CHECKCAST and there should be no new class object...they should stay same especially since there were no class schema changes.

AsmClassLoader which defines these classes is a ClassLoader(not URLClassLoader) with parent of app classloader which has a hotswap-agent.properties

For me it sounds more like it's some race with loading the class file half-flushed to disk... Maybe java fallbacks to loading class from parent if it's corrupted?

skybber commented 5 years ago

There could be race condition if a group of classes (e.g. A,B,C) are being to be redefined, but only part of them are synced on extraclasspath (e.g. A,C) within hotswapper timeout. Btw, the problem with multi event for redefinition (EVENT_CREATE, EVENT_MODIFY) looks like bug. Btw AsmClassLoader generates .class to extraclasspath?

ninja- commented 5 years ago

Btw, the problem with multi event for redefinition (EVENT_CREATE, EVENT_MODIFY) looks like bug

Yeah, kind of, I know you mostly tested it with Maven or IntelliJ but what I am doing is unzip plugin.jar into extraClassPath.

AsmClassLoader is just this:

final class AsmClassLoader extends ClassLoader {
    private static final ConcurrentMap<ClassLoader, AsmClassLoader> loaders = new ConcurrentHashMap<>();
    public static AsmClassLoader getLoader(ClassLoader parent) {
        return loaders.computeIfAbsent(Objects.requireNonNull(parent, "Null parent class-loader"), AsmClassLoader::new);
    }

    public Class<?> defineClass(String name, byte[] data) {
        name = Objects.requireNonNull(name, "Null name").replace('/', '.');
        synchronized (getClassLoadingLock(name)) {
            if (hasClass(name)) throw new IllegalStateException(name + " already defined");
            Class<?> c = this.define(name, Objects.requireNonNull(data, "Null data"));
            if (!c.getName().equals(name)) throw new IllegalArgumentException("class name " + c.getName() + " != requested name " + name);
            return c;
        }
    }

    protected AsmClassLoader(ClassLoader parent) {
        super(parent);
    }

    private Class<?> define(String name, byte[] data) {
        synchronized (getClassLoadingLock(name)) {
            if (hasClass(name)) throw new IllegalStateException("Already has class: " + name);
            Class<?> c;
            try {
                c = defineClass(name, data, 0, data.length);
            } catch (ClassFormatError e) {
                throw new IllegalArgumentException("Illegal class data", e);
            }
            resolveClass(c);
            return c;
        }
    }

    @Override
    public Object getClassLoadingLock(String name) {
        return super.getClassLoadingLock(name);
    }

    public boolean hasClass(String name) {
        synchronized (getClassLoadingLock(name)) {
            try {
                Class.forName(name);
                return true;
            } catch (ClassNotFoundException e) {
                return false;
            }
        }
    }
}

I think I fixed this with such implementation:

    /**
     * For each changed class create a reload command.
     */
    @OnClassFileEvent(classNameRegexp = ".*", events = {FileEvent.MODIFY})
    public void watchReload(CtClass ctClass, ClassLoader appClassLoader, URL url) throws IOException, CannotCompileException, Exception {
      if (!ClassLoaderHelper.isClassLoaded(appClassLoader, ctClass.getName())) {
            LOGGER.trace("Class {} not loaded yet, no need for autoHotswap, skipped URL {}", ctClass.getName(), url);
            return;
        }

        LOGGER.debug("Class {} will be reloaded from URL {}", ctClass.getName(), url);

java.lang.reflect.Method m = ClassLoader.class.getDeclaredMethod("findLoadedClass", String.class);
 m.setAccessible(true);
        // search for a class to reload
        Class clazz;
synchronized(reloadMap) {
        try {
            clazz  = (Class<?>) m.invoke(appClassLoader, ctClass.getName()); // appClassLoader.findClass(ctClass.getName());
if (clazz == null) throw new ClassNotFoundException("Null findClass");
        } catch (ClassNotFoundException e) {
            LOGGER.warning("Hotswapper tries to reload class {}, which is not known to application classLoader {}.",
                    ctClass.getName(), appClassLoader);
            return;
        }

for(java.util.Iterator<Map.Entry<Class<?>, byte[]>> it = reloadMap.entrySet().iterator(); it.hasNext(); ) {
    Map.Entry<Class<?>, byte[]> entry = it.next();

  if(entry.getKey().getName().equals(clazz.getName())) {
    Class<?> dwa = entry.getKey();
    LOGGER.warning("ACHTUNG! Hotswapper prevented a race for " + clazz.getName() + " dwa=" + dwa);

     it.remove(); // newer is better here I guess
 //   break;
    //return;
  }
}
            reloadMap.put(clazz, ctClass.toBytecode());

        scheduler.scheduleCommand(hotswapCommand, 1000, Scheduler.DuplicateSheduleBehaviour.SKIP);
}
    }

With this it never breaks, yay! And I spammed the hotswaps hard. Not sure yet if the fix comes from getting rid of loadClass(using it sounds like terrible idea) or increasing time to 1000ms or removing older entry in reloadMap...

ninja- commented 5 years ago

Eh, and it still raced after more tests and broke the event executors :( I am out of ideas

ninja- commented 5 years ago

But I can still find the same pattern when it breaks... If after my unzip it splits the reload into two parts like this based on events...then it breaks.

2019/08/22 14:13:15 16:13:15 [INFO] HOTSWAP AGENT: 16:13:15.810 RELOAD (org.hotswap.agent.config.PluginManager) - Reloading classes [MyListener] (autoHotswap)
2019/08/22 14:13:15 16:13:15 [INFO] HOTSWAP AGENT: 16:13:15.911 RELOAD (org.hotswap.agent.config.PluginManager) - Reloading classes [LotsOfMyOtherClasses, MyListener]
ninja- commented 5 years ago

Watch can get called up to 4 times with MODIFY for the same class

2019/08/22 14:44:29 16:44:29 [INFO] HOTSWAP AGENT: 16:44:29.541 DEBUG (org.hotswap.agent.annotation.handler.WatchEventCommand) - Executing resource changed method watchReload on class org.hotswap.agent.plugin.hotswapper.HotswapperPlugin for event WatchFileEvent on path MyListener.class for event ENTRY_MODIFY
2019/08/22 14:44:29 16:44:29 [INFO] HOTSWAP AGENT: 16:44:29.615 DEBUG (org.hotswap.agent.annotation.handler.WatchEventCommand) - Executing resource changed method watchReload on class org.hotswap.agent.plugin.hotswapper.HotswapperPlugin for event WatchFileEvent on path MyListener.class for event ENTRY_MODIFY
2019/08/22 14:44:29 16:44:29 [INFO] HOTSWAP AGENT: 16:44:29.671 DEBUG (org.hotswap.agent.annotation.handler.WatchEventCommand) - Executing resource changed method watchReload on class org.hotswap.agent.plugin.hotswapper.HotswapperPlugin for event WatchFileEvent on path MyListener.class for event ENTRY_MODIFY
2019/08/22 14:44:29 16:44:29 [INFO] HOTSWAP AGENT: 16:44:29.682 DEBUG (org.hotswap.agent.annotation.handler.WatchEventCommand) - Executing resource changed method watchReload on class org.hotswap.agent.plugin.hotswapper.HotswapperPlugin for event WatchFileEvent on path MyListener.class for event ENTRY_MODIFY

It's one of the biggest class files in project so that's what could make it to "special". Unzip could modify it a few times because of bufffering. But how could it really affect the hotswapper? Like, are we loading garbage data and silently falling back to something without any logs? I guess one way to check if it's loading garbage is to debug the buffer size.

EDIT: I checked and even if there are 4 modify events the buffer is always the same size with proper data. So that's not it.

skybber commented 5 years ago

May be you can unzip jar somewhere and sync (rsync) it with extraclasspath. It should be faster then unzip. Even - if some class is untouched, then it will not be redefined. However the redefinition in hotswapper is simple one and not ready for huge and frequent redefinition. Otherwise I'm using redefinition from IDE, not hotswapper & extraclasspath.

ninja- commented 5 years ago

Rsync doesn't help much as I checked. I think what could be added is for example: add SIGUSR2 handler that goes through each extraClasspath file and add it's to reloadMap and then reloads at once. Signal meaning it's easier to integrate from outside world or some scripting.

There could be race condition if a group of classes (e.g. A,B,C) are being to be redefined, but only part of them are synced on extraclasspath (e.g. A,C) within hotswapper timeout.

Can you explain how that could result in two instances of the same class with same name within one classloader? If other class was to "pull" my class from bad classloade, I think I'd notice on -verbose:class, and all entries related to reloading MyListener are coming from VM_RedefineClass :/ except the one and only first entry where it's loaded from jar (even though I increased timeout to 1000ms which should be plenty, probably it still 'groups' the classes because of some values hardcoded elsewhere within watcher...)

ninja- commented 5 years ago

I implemented the idea above that reloads all classes on a signal, and it still can result in ClassCastException :/ what's even more weird is that some event executors from the same class can break while others can still work. It looks like a DCEVM problem in instrumentation.redefineClasses

It's unpredictable, it can even break when redefining all the same classess non-changed and on first hotswap (but it's rare). Has to be a race within DCEVM. Nothing interesting in -verbose logs

[75.708s][info   ][redefine,class,load] redefined name=SomeClass, count=0 (avail_mem=1829032K)
[75.708s][info   ][redefine,class,load] redefined name=SomeOtherClass, count=0 (avail_mem=1829032K)
(repeat for all of my classess including the listener)
[75.849s][info   ][redefine,class,timer] vm_op: all=77  prologue=77  doit=0
[75.849s][info   ][redefine,class,timer] redefine_single_class: phase1=0  phase2=0
ninja- commented 5 years ago

I did run with the fullest debug now and there is one DCEVM entry that triggers only when it bugs out

2019/08/23 00:29:10 [153.881s][debug  ][redefine,class,nmethod            ] Found evol dependency of nmethod MyListener.onMySomeEvent((Lmy/event/class) compile_id=5666 on method java.lang.Object.<init>(()V)

not sure if that could help you find the bug :( Also running DCEVM with full java debug makes the race harder to trigger because JVM has more work to do, I guess.

https://github.com/HotswapProjects/openjdk-jdk11u-dcevm/blob/addcc7e43bbdf372e62f406bc79ea8c1daa71abe/src/hotspot/share/code/nmethod.cpp#L1928

The race may have something to do with the code cache? When the race triggers, it may not break all event executors but a single one and other will continue to work. I thought I'd confirm that by looking if always the bugged event executor is in compiled method list at jcmd pid Compiler.codelist, but it's not always the case...it may break and not even be in code cache. Hm. Also the bug can trigger the same if I run with JIT disabled (-Xint)

Also tried changing GC from Serial to CMS. Still can bug out.

Another pattern I found in debug logs is that there's a call to class resolve just before the first ClassCastException for given broken executor.

[debug  ][class,resolve           ] MyListener$GeneratedEventExecutor11$onCMD MyListener
[info   ][exceptions              ] Exception <a 'java/lang/ClassCastException'{0x00000000f3e5f348}: class MyListener cannot be cast to class MyListener  *MyListener and MyListener are in unnamed module of loader PluginClassloader @189aa67a)> (0x00000000f3e5f348) 
2019/08/23 01:37:26 thrown [../../src/hotspot/share/interpreter/interpreterRuntime.cpp, line 474]
skybber commented 5 years ago

Code cache is invalidated after redefinition.

skybber commented 5 years ago

GeneratedEventExecutor is syntetic class generated by ASM? If yes:

Otherwise, how many times is single class redefined when you update classes on disk? Probably you know thi -XX:+TraceRedefineClasses JVM arg.

ninja- commented 5 years ago

@skybber

If I run the same hotswaps with HotswapAgent on pure JVM (12) it's rock solid never bugs out. There's only one redefine per class hotswap. Even if it's not a bug in your code, you might be missing on a number of fixes from latest Hotspot versions, and I saw at least few commits related to redefine logic.

skybber commented 5 years ago

We did most of work on j8->j9 patch, therefore there is less work on j9->j11, since there are less changes between plain j9->j11. Would be great if you make some tests for you problem. I would look at it at next dcevm/hotswap iteration which we plan on autumn.

skybber commented 5 years ago

Could you make test case for this issue please ?