Kong / unirest-java

Unirest in Java: Simplified, lightweight HTTP client library.
http://kong.github.io/unirest-java/
MIT License
2.59k stars 593 forks source link

Make primary instance initialized lazily #407

Closed JarvisCraft closed 3 years ago

JarvisCraft commented 3 years ago

Description

This makes the primary instance initialization lazy so that it gets thread-safely initialized on first reqeuest.

CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.

ryber commented 3 years ago

It's unclear to me when this would be necessary, unless you are running multiple class-loaders, static field initialization is already thread-safe (AFAIK). and if that was a case I'm not sure your AtomicBoolean would be in any better? can you point me to some documentation on when this would be needed?

JarvisCraft commented 3 years ago

It's unclear to me when this would be necessary, unless you are running multiple class-loaders, static field initialization is already thread-safe (AFAIK). and if that was a case I'm not sure your AtomicBoolean would be in any better? can you point me to some documentation on when this would be needed?

The current approach is thread-safe but it is not lazy. Thus even if a user wants to use instance-based approach (i.e. without using primary instance) the primary instance will still be initialized on first call to spawnInstance() (once Unirest class gets loaded).

In order to solve this, I've moved the primary instance into a separate class so that it gets lazily initialized on actual first request to it.

As for the AtomicBoolean -- its main purpose is simple: guard callers of shutDown from accidentally initializing the primary instance to just have it shut down.

In other words

I didn't mean to say that current implementation is not thread-safe (there may be some thread-unsafe spots in UnirestInstance itself but it is out of this PR's target), it even is with multiple class-loaders as I know.

I did mean to say that the reason for using a separate holder-class was to provide thread-safety with lazy initialization.

Real-life example

A server which uses plugin system to load extensions at runtime: there is an API-plugin (let's call it RestApi Plugin for simplicity) which shades Unirest in so that dependent plugins can use it without relocating it to their own packages. Yet each plugin is required to spawn their own UnirestInstance not to mess with other ones.

As a maintainer of RestApi Plugin I take the responsibility of providing the shaded Unirest classes with my plugin (so that these are available for dependent plugins). However, at RestApi plugin level I cannot guarantee that some developer won't accidentally use primary-instance. Thus I am required to call Unirest.shutDown() on shutdown but want it to be no-op if primary instance was not used by abyone. Yet, if only lazy-init holder was used without AtmomicBoolean guard then attempt to just call primaryInstance().shutDown(clearOptions) would first initialize it (if it has not been initialized yet) and only then shut it down which obviously is redundant.

ryber commented 3 years ago

Ah! Yes, thank you for the explanation. this is a good idea. A couple of things:

  1. There seems to be some checkstyle violation on ci
  2. Is there any way we could test this? I realize this might be tricky to test but maybe not impossible. Did you give any thought to that?
JarvisCraft commented 3 years ago
  1. There seems to be some checkstyle violation on ci

I've pushed c586a00 to resolve it;

  1. Is there any way we could test this? I realize this might be tricky to test but maybe not impossible. Did you give any thought to that?

As of this, I am not that sure. There are tools like JCStress (Maven Central) but these seem to fit more for low-level scenarios as it's hard to attach any monitors to the internal singleton to check its state.

As I see, the only possible spot in which I cannot be sure about correct ordering is scenario:

Thread A Thre ad B
Atomic CAS of flag Initialization of Primary.INSTANCE
Call to shutDown() Volatile write to flag

Thus there are twe following possible invariants (considering the fact that all threads are guaranteed to observe the INSTANCE in the correct state according to JMM):

  1. CAS failed even before INSTANCE was initialized and no shutdown happened
  2. CAS failed between INSTANCE initialization and volatile write and no shutdown happened
  3. CAS succeeded after volatile write thus the INSTANCE was shut down just after it

IMO, first two scenarios in which (for some reason) shut down was called before initialization do not lead to weird behaviour (or it is the caller's fault that he first managed to issues shut down and then use the instance from which he is not guarded with current implementation too). The last scenario is just correct and does not seem to produce any logical problems.

I actually am not that sure about correctness of all my ideas but at least I tried, haha

ryber commented 3 years ago

The change is failing the tests. This is because now, once the instance is shut down it's shut down forever. In the old code, while it's true that the UnirestInstance was created. It's configuration was lazy and the core Apache classes (and their various threadpools) were lazy. Calling "shutdown" on an instance shuts everything off and then puts the configuration back into a dormant state which then can be re-initilized.

The tests take advantage of this and shut down the instance after each test, so now only the first shutdown in all the suites works. I'd like to say this is just a weird thing in the tests but I've spoken with multiple folk who have taken advantage of this in real life. If we change the instance holder to allow for a null state I think we can avoid that. This would require some synchronized methods which we might want to avoid in a heavy traffic situation so theres a little dance here. I'm not saying this is the best answer, I'm open to improvements, but this does what we want I think AND passes all the tests.

We would then have Unirest delegate to the methods rather than looking at the instance itself.

/**
     * Internal holder class which allows lazy initialization of primary {@link UnirestInstance}.
     */
    private static final class Primary {

        /**
         * Instance of the Unirest runtime lazily initialized on startup.
         */
        private static UnirestInstance instance;

        private static UnirestInstance getInstance() {
            if(!exists()) {
                init();
            }
            return instance;
        }

        public static boolean exists() {
            return instance != null;
        }

        private static synchronized void init() {
            if(!exists()) {
                instance = new UnirestInstance(new Config());
            }
        }

        public static synchronized void shutDown(boolean clearOptions) {
            if(exists()){
                instance.shutDown(clearOptions);
                instance = null;
            }
        }

        public static boolean isRunning() {
            return exists() && instance.isRunning();
        }

       private Primary() {
            throw new AssertionError("Primary should never be constructed");
        }
    }
rnc commented 3 years ago
2. Is there any way we could test this? I realize this might be tricky to test but maybe not impossible. Did you give any thought to that?

Have you considered using https://byteman.jboss.org/ ?

ryber commented 3 years ago

One nice thing about using the null pattern would be it's easily tested. because we can call the exists() method.

JarvisCraft commented 3 years ago

I'm not saying this is the best answer, I'm open to improvements, but this does what we want I think AND passes all the tests.

This one may requires some changes (I will ltry to imlpement these) as because there are no means of synchronization between method calls, one thread can actually observe null even after the init().

I will try to implement a safer variant with support for switching between initialized and shut down states so that the tests pass.


Considering the problems which appear now while doing this, may I suggest moving the primary instance in 4.0 version into a separate Maven module such as unirest-java-global (or some different name)? I.e. keep UnirestInstance and all implementation classes in unirest-java module (also moving Unirest#spawnInstance(..) to UnirestInstance#spawn(..) static factory method) while having unirest-java-global contain the current Unirest with its singleton and stuff.

This would allow (as for the example below) not to rely on primary instance existence (if you don't use unirest-java-global -- then you don't have to worry about it) and yet provide the primary instance approach for those who need it (by providing it as a separate module).

This is OK from point of semver as the major version gets changes allowing backwards-incompatible changes and still it allows more implementation freedom from point of regular unirest-java module.

ryber commented 3 years ago

one thread can actually observe null even after the init().

Wouldn't the synchronized on the method stop that? Unless you had threads fighting to shut down and re-init the instance at the exact moment in which case 🤷 .

JarvisCraft commented 3 years ago

Wouldn't the synchronized on the method stop that?

It may but JMM is quite tricky thus I think abot a bit more specific approach.

Unless you had threads fighting to shut down and re-init the instance at the exact moment in which case .

Yes! That's the exact problem with trying to proof the correctness of the approach)

ryber commented 3 years ago

Just to be clear about one part of all this:

then attempt to just call primaryInstance().shutDown(clearOptions) would first initialize it (if it has not been initialized yet)

this is not true, a configuration exists, but it's not initialized, It's a few classes in memory thats true, but the real engine, the thing that we need to be aware of on application shutdown, are the Apache clients, in an uninitialized configuration those are two empty optionals. So there are no threadpools or anything else going on.

Shutting down only shuts them down if they exist, otherwise it's a noop. see https://github.com/Kong/unirest-java/blob/main/unirest/src/main/java/kong/unirest/Config.java#L715-L731

JarvisCraft commented 3 years ago

Just to be clear about one part of all this:

then attempt to just call primaryInstance().shutDown(clearOptions) would first initialize it (if it has not been initialized yet)

this is not true, a configuration exists, but it's not initialized, It's a few classes in memory thats true, but the real engine, the thing that we need to be aware of on application shutdown, are the Apache clients, in an uninitialized configuration those are two empty optionals. So there are no threadpools or anything else going on.

Shutting down only shuts them down if they exist, otherwise it's a noop. see https://github.com/Kong/unirest-java/blob/main/unirest/src/main/java/kong/unirest/Config.java#L715-L731

Hm, that's interesting. My bad that I haven't considered this initially.

Funnily, looking at the implementation details this means that the problems possibly appearing in singleton design above (the one with separate synchronized methods) also appear on Config implementation like: https://github.com/Kong/unirest-java/blob/785d6cc8fc69339ed557cddcdde5fb775ac155bf/unirest/src/main/java/kong/unirest/Config.java#L766-L771 first calling buildAsyncClient writing to asyncClient() and then invoking getFinalAsyncClient() which relies on this field which yet may have been updated to Optional.EMPTY concurrently.


From this perspective this PR may really provide more problems than profit (as you are right that current implementation is already lazy enough).

In this case, is there any chance that the suggestion from https://github.com/Kong/unirest-java/pull/407#issuecomment-863387971 makes its way into 4.0.0? Having this split performed, the UnirestInstance itself may be enhanced later to be enabled once created and shut forever once it's close()d.

ryber commented 3 years ago

4.0 Unirest doesn't use Apache, and there are no background monitors or anything. At least as of right now .... so it's even less of a concern there at the moment. Because there isn't anything to "start" or "shut down".

ryber commented 3 years ago

closing as stale