ai-for-java / openai4j

Java client library for OpenAI API
Apache License 2.0

Ability to switch off making live calls #13

Closed edeandrea closed 10 months ago

edeandrea commented 10 months ago

Since integrating with OpenAI costs real money, this introduces a way to "turn off" the integration. Maybe in some environments, instead of making a "real" call, I serve a default response.

This capability will certainly be extended further downstream (see https://github.com/quarkiverse/quarkus-langchain4j/issues/130), but the capability needs to be in the builder methods here, so this at least needs to be here so that it can be extended.

The design I took here is to try to keep the client from having to know anything about what to do in the case it is turned off. The case is treated simply like any client exception, which bubbles up as an OpenAiHttpException with a message indicating that the request was disabled.
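The caller-side handling described above can be sketched as follows. This is a hypothetical illustration: `OpenAiHttpException` is the exception type named in the description, but the stub client, its `disabled` flag, and the helper method are invented for the sketch and are not the PR's actual API.

```java
// Hypothetical sketch of the caller-side handling of a disabled client.
class OpenAiHttpException extends RuntimeException {
    OpenAiHttpException(String message) { super(message); }
}

class StubChatClient {
    private final boolean disabled;

    StubChatClient(boolean disabled) { this.disabled = disabled; }

    String chat(String prompt) {
        if (disabled) {
            // The disabled case bubbles up like any other client exception.
            throw new OpenAiHttpException("Request was disabled by configuration");
        }
        return "real response for: " + prompt; // would be a live HTTP call
    }
}

class DisabledClientDemo {
    // The caller, not the client, decides what a sensible fallback is.
    static String chatOrFallback(StubChatClient client, String prompt, String fallback) {
        try {
            return client.chat(prompt);
        } catch (OpenAiHttpException e) {
            return fallback;
        }
    }
}
```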

I also added a bunch of test cases/classes which mirror the existing tests but exercise the disabled-client behavior. I may have gone overboard with the tests, but I thought it better to be safe.

I'm happy to discuss and make any changes or alter naming conventions.

edeandrea commented 10 months ago

The tests fail because the OPENAI_API_KEY secret is not passed into builds performed by pull requests.

from https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions#using-secrets-in-a-workflow:

With the exception of GITHUB_TOKEN, secrets are not passed to the runner when a workflow is triggered from a forked repository.

langchain4j commented 10 months ago

Hi @edeandrea thanks a lot! I like the idea of mocking/stubbing the interaction with the LLM, but I am not entirely sure that this client should be aware of this functionality. LangChain4j has lots of integrations with various LLM providers, I guess eventually users will want this feature for every model provider and it would be cumbersome to implement the same logic for each and every existing and new integration.

I am wondering (just thinking out loud): since all models (e.g. OpenAiChatModel, OllamaChatModel, etc) extend the same interface (ChatLanguageModel) and are interchangeable, we could create in LangChain4j something like MockedChatModel, which a user could use instead of a real one depending on the environment or their custom logic. We could even support a mockito-like config style:

ChatLanguageModel model = MockedChatModel.builder()
    .when(userMessage -> userMessage.contains("hello")).thenReturn("Hello, how can I help you today?")
    .when(userMessage -> userMessage.contains("bye")).thenReturn("See you later, have a great day!")
    .otherwiseReturn("This is a default answer")
    .build();

This way it could be used instead of any model provider, not only OpenAI. A user could have MockedChatModel as a bean in their app configuration for a dev/test profile and a real OpenAiChatModel for a prod profile.
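The mockito-like builder proposed above could be implemented roughly along these lines. This is a minimal sketch: the `ChatLanguageModel` interface here is a simplified stand-in for the langchain4j type, and `MockedChatModel` with its builder methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified stand-in for the langchain4j ChatLanguageModel interface.
interface ChatLanguageModel {
    String generate(String userMessage);
}

// Hypothetical mock model: matches the first rule whose predicate accepts
// the user message, otherwise returns the configured default response.
class MockedChatModel implements ChatLanguageModel {
    private record Rule(Predicate<String> condition, String response) {}

    private final List<Rule> rules;
    private final String defaultResponse;

    private MockedChatModel(List<Rule> rules, String defaultResponse) {
        this.rules = rules;
        this.defaultResponse = defaultResponse;
    }

    @Override
    public String generate(String userMessage) {
        for (Rule rule : rules) {
            if (rule.condition().test(userMessage)) {
                return rule.response();
            }
        }
        return defaultResponse;
    }

    static Builder builder() { return new Builder(); }

    static class Builder {
        private final List<Rule> rules = new ArrayList<>();
        private String defaultResponse = "";
        private Predicate<String> pending;

        Builder when(Predicate<String> condition) {
            this.pending = condition;
            return this;
        }

        Builder thenReturn(String response) {
            rules.add(new Rule(pending, response));
            pending = null;
            return this;
        }

        Builder otherwiseReturn(String response) {
            this.defaultResponse = response;
            return this;
        }

        MockedChatModel build() {
            return new MockedChatModel(rules, defaultResponse);
        }
    }
}
```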

Would love to hear your thoughts on this.

edeandrea commented 10 months ago

Hi there! Thanks for the note!

So, I did have the same thought as you with regards to

LangChain4j has lots of integrations with various LLM providers, I guess eventually users will want this feature for every model provider and it would be cumbersome to implement the same logic for each and every existing and new integration.

But once I started thinking down that route my thought is that this "functionality" or capability isn't a property of the model. It's really a property of the communication mechanism.

Because of that, I was also trying to shield Langchain from having to know anything about "what to do" in the situation. Leave it up to the caller to handle it. I think it would be hard (if not impossible) for the library (or even extensions of the library - like spring boot or quarkus) to be able to predict all the potential use cases. These libraries create the models as beans and inject them into the application. At that point in time the libraries wouldn't know how the library is being used. Having a blanket "don't make the actual request and let the caller decide what to do in that moment" I thought was a better approach.

In Quarkus/Spring Boot, for example, the application would couple it with fault tolerance (fallbacks/etc) which caught the exception and did something with it. Other consumers could try/catch or something else around the point in which they are calling it.

Now, for testing purposes, a fully implemented mocking framework might be really valuable. Currently we use wiremock to stub the interactions, but that unfortunately isn't a trick you can use when deploying an application into a "real" environment where perhaps you don't want to make real calls to a provider.

Just my thought process.

edeandrea commented 10 months ago

I'll also add that I'm still new to this library, as well as langchain4j, so I may not fully understand all the implementation details (yet - I'm getting there!). I've just learned that each model class actually contains its own client instance (thanks @geoand !).

So my previous comment

But once I started thinking down that route my thought is that this "functionality" or capability isn't a property of the model. It's really a property of the communication mechanism.

Still somewhat holds true from a theoretical perspective, but because each model contains its own client, maybe this property does belong on the model itself.

Maybe adding a layer of inheritance that all of the models extend for things that might be common?

There are also things like EmbeddingModel, ImageModel, LanguageModel, and ModerationModel (as well as their streaming equivalents, where they exist).

langchain4j commented 10 months ago

Hi there! Thanks for the note!

So, I did have the same thought as you with regards to

LangChain4j has lots of integrations with various LLM providers, I guess eventually users will want this feature for every model provider and it would be cumbersome to implement the same logic for each and every existing and new integration.

But once I started thinking down that route my thought is that this "functionality" or capability isn't a property of the model. It's really a property of the communication mechanism.

It is not a property of the model (e.g. OpenAiChatModel); what I meant is that it is a completely new and separate type of model (a mock). I am not sure I understand why this is a property of the communication mechanism, could you please explain? (I am not saying it is wrong, I am genuinely trying to understand your point)

Because of that, I was also trying to shield Langchain from having to know anything about "what to do" in the situation. Leave it up to the caller to handle it. I think it would be hard (if not impossible) for the library (or even extensions of the library - like spring boot or quarkus) to be able to predict all the potential use cases. These libraries create the models as beans and inject them into the application. At that point in time the libraries wouldn't know how the library is being used. Having a blanket "don't make the actual request and let the caller decide what to do in that moment" I thought was a better approach.

LangChain4j does not need to know about this "disable live requests" functionality; there is a ChatLanguageModel interface in the langchain4j-core module which anyone can implement and plug into LangChain4j chains/AI services. One such implementation could be a ChatModelMock which we could provide out of the box, but this would not in any way change LangChain4j or make it "aware" of this functionality.

In Quarkus/Spring Boot, for example, the application would couple it with fault tolerance (fallbacks/etc) which caught the exception and did something with it. Other consumers could try/catch or something else around the point in which they are calling it.

Hm, is it common to use fault tolerance mechanisms for such cases?

Now, for testing purposes, a fully implemented mocking framework might be really valuable. Currently we use wiremock to stub the interactions, but that unfortunately isn't a trick you can use when deploying an application into a "real" environment where perhaps you don't want to make real calls to a provider.

Ah, so you want to use it not only during integration testing, but in "real" envs as well? I did not get this from the beginning, sorry for the confusion.

Now I am wondering: if one wants to be able to enable/disable live interaction via config properties, we could provide a proxy class like SwitchableChatModel that would wrap a delegate ChatLanguageModel instance and accept a boolean enabled and possibly a String defaultResponse property. This way you could enable/disable any model, not only OpenAI, and the OpenAI client stays agnostic of this. I am not sure about this though; the user would need to manage these properties themselves... just random thoughts.
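The `SwitchableChatModel` proxy floated here might look roughly like this. Again a sketch: the `ChatLanguageModel` interface is a simplified stand-in for the langchain4j type, and how `enabled`/`defaultResponse` get bound to config properties is left to the integration.

```java
// Simplified stand-in for the langchain4j ChatLanguageModel interface.
interface ChatLanguageModel {
    String generate(String userMessage);
}

// Hypothetical proxy that wraps any delegate model and can be switched off.
class SwitchableChatModel implements ChatLanguageModel {
    private final ChatLanguageModel delegate;
    private final boolean enabled;          // e.g. bound to a config property
    private final String defaultResponse;   // served when live calls are off

    SwitchableChatModel(ChatLanguageModel delegate, boolean enabled, String defaultResponse) {
        this.delegate = delegate;
        this.enabled = enabled;
        this.defaultResponse = defaultResponse;
    }

    @Override
    public String generate(String userMessage) {
        return enabled ? delegate.generate(userMessage) : defaultResponse;
    }
}
```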

edeandrea commented 10 months ago

I appreciate the conversation!

It is not a property of the model (e.g. OpenAiChatModel); what I meant is that it is a completely new and separate type of model (a mock). I am not sure I understand why this is a property of the communication mechanism, could you please explain? (I am not saying it is wrong, I am genuinely trying to understand your point)

I don't think it can be a completely separate type of model. It needs to be the same model that is already being used and/or extended by other downstream libraries/frameworks (like Quarkus or Spring).

This goes to the other comment about not just using this during testing. There may be times where this is used in "real" envs. I'm already doing this today in an application and I've invented my own application-specific way of dealing with it. We are looking to enhance this application though and I'll have to repeat the same pattern. I feel like it makes sense for the behavior to be part of the library itself. We're also planning on extending its usage to other applications and will have to apply the same pattern.

https://github.com/quarkusio/quarkus-super-heroes/blob/06169835747731afdd2fea27c5aba878897e68cd/rest-narration/src/main/java/io/quarkus/sample/superheroes/narration/service/NarrationProcessor.java#L26-L33

Hm, is it common to use fault tolerance mechanisms for such cases?

I'm not sure how common it is for this particular use case, but for fault tolerance in general it is. The Quarkus extension recommends it (https://docs.quarkiverse.io/quarkus-langchain4j/dev/fault-tolerance.html).

With the approach I took (throwing an exception) in this case, fault tolerance (or try/catch in the caller) would be the way you would deal with it, if you wanted to use this functionality.

With an approach like

ChatLanguageModel model = MockedChatModel.builder()
    .when(userMessage -> userMessage.contains("hello")).thenReturn("Hello, how can I help you today?")
    .when(userMessage -> userMessage.contains("bye")).thenReturn("See you later, have a great day!")
    .otherwiseReturn("This is a default answer")
    .build();

I'm not sure how you would define that in externalized configuration (like a properties file or environment variable) passed into an application. Something like that would require code modification if it were to be used at application runtime (not just in testing).

Ah, so you want to use it not only during integration testing, but in "real" envs as well? I did not get this from the beginning, sorry for the confusion.

Yes, that's where I am going with this.

Now I am wondering: if one wants to be able to enable/disable live interaction via config properties, we could provide a proxy class like SwitchableChatModel that would wrap a delegate ChatLanguageModel instance and accept a boolean enabled and possibly a String defaultResponse property. This way you could enable/disable any model, not only OpenAI, and the OpenAI client stays agnostic of this. I am not sure about this though; the user would need to manage these properties themselves... just random thoughts.

Wouldn't downstream libraries then have to make changes to extend this model? Also, having a String defaultResponse property wouldn't really work for the other kinds of models (images/etc). That's part of the reason I was trying to keep this library (or Langchain4j, or any other library which derives from it, now or in the future) from having to manage what the "rules" are (i.e. under this condition return that, or always return this response). Leave that to the application implementing the capability, as it is specific to their needs, potentially on a deployment-by-deployment basis in different environments.

geoand commented 10 months ago

I have a slightly different idea which I would like to share.

Essentially what I see coming up a lot from our side (i.e. a framework consuming the library and trying to provide additional functionality on top of it) is that if one needs to configure something on any LLM client (be it OpenAI, Azure or anything else), then the necessary properties need to be added to the builder of a model (like the ChatLanguageModel).

So what if we did something like:

public abstract static class Builder<T extends OpenAiClient, B extends Builder<T, B>> { // this is the builder inside OpenAiClient

    // add this field
    public Function<OpenAiClient.Builder<T, B>, OpenAiClient.Builder<T, B>> customizer; // generics are terrible here, but it's only meant for advanced integrations anyway

    // add these methods
    public B customizer(Function<OpenAiClient.Builder<T, B>, OpenAiClient.Builder<T, B>> customizer) {
        this.customizer = customizer;
        return (B) this;
    }

    protected B getEffectiveBuilder() {
        OpenAiClient.Builder<T, B> effectiveBuilder = this;
        if (customizer != null) {
            effectiveBuilder = customizer.apply(effectiveBuilder);
        }
        return (B) effectiveBuilder;
    }
}

and then the implementations of the builder (in DefaultOpenAiClient and the Quarkus-specific one) would simply do:

public DefaultOpenAiClient build() {
    return new DefaultOpenAiClient(getEffectiveBuilder());
}

The plus side here is that this allows advanced integrations to do whatever they want with the builder without affecting normal integrations at all, meaning that what @edeandrea is trying to do could easily be accomplished on the Quarkus side by providing the proper customizer.
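Stripped of the generics, the customizer idea might work roughly as follows. `OpenAiClientSketch`, its builder, and the `baseUrl` property are simplified stand-ins for the real `OpenAiClient` types, invented for this sketch.

```java
import java.util.function.Function;

// Simplified, non-generic sketch of the customizer proposal; the real
// version is parameterized over client/builder types, which is elided here.
class OpenAiClientSketch {
    final String baseUrl;

    OpenAiClientSketch(Builder builder) { this.baseUrl = builder.baseUrl; }

    static class Builder {
        String baseUrl = "https://api.openai.com/v1";
        Function<Builder, Builder> customizer;

        Builder baseUrl(String baseUrl) {
            this.baseUrl = baseUrl;
            return this;
        }

        Builder customizer(Function<Builder, Builder> customizer) {
            this.customizer = customizer;
            return this;
        }

        // Apply the customizer (if any) just before building, so an advanced
        // integration can tweak, or even entirely replace, the builder.
        Builder getEffectiveBuilder() {
            return customizer != null ? customizer.apply(this) : this;
        }

        OpenAiClientSketch build() {
            return new OpenAiClientSketch(getEffectiveBuilder());
        }
    }
}
```

A framework like Quarkus could then install a customizer that, for instance, points the client at a local stub server without the normal builder API changing at all.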

What do you think?

edeandrea commented 10 months ago

I like the idea @geoand. Maybe a slightly different approach would be to allow a consumer to register a callback (like a Consumer<OpenAiClient.Builder<T, B>>) which would be called in the builder's build method (if set). It would eliminate the need to add a getEffectiveBuilder method.

The problem I'm starting to see now, though, after the several conversations in this thread is that doing this here in openai4j only adds this capability to things that use this OpenAI client.

What about the other kinds of models (EmbeddingModel, ImageModel, LanguageModel, and ModerationModel) that are in Langchain4J?

If it would be helpful I could try out a small proof of concept in Langchain4j.

geoand commented 10 months ago

(like a Consumer<OpenAiClient.Builder<T, B>>) which would be called in the builder's build method (if set). It would eliminate the need to add a getEffectiveBuilder method.

The reason I proposed the function, is that theoretically an advanced integration might want to change the builder entirely

geoand commented 10 months ago

The problem I'm starting to see now, though, after the several conversations in this thread is that doing this here in openai4j only adds this capability to things that use this OpenAI client

That's a serious shortcoming indeed.

Maybe we could have each model have a way to customize (in a general way like I proposed above) the underlying client

edeandrea commented 10 months ago

The reason I proposed the function, is that theoretically an advanced integration might want to change the builder entirely

That makes sense! Return a completely different builder, so long as it is still an extension of it. Maybe even create a base-level builder interface?

That's a serious shortcoming indeed.

Maybe we could have each model have a way to customize (in a general way like I proposed above) the underlying client

That's what I'm thinking too the more I think about this.

geoand commented 10 months ago

builder so long as it is still an extension of it

Mocking should be easy with this

edeandrea commented 10 months ago

Should I whip up a small prototype within Langchain4J to prove this out? Or do you want to run with it?

geoand commented 10 months ago

Go ahead 😃

edeandrea commented 10 months ago

How did I know that was going to be your answer :)

geoand commented 10 months ago

Because my TZ is +7 of yours 😄

edeandrea commented 10 months ago

So are you saying you're calling it quits for the day? Slacker.... :D

edeandrea commented 10 months ago

@geoand / @langchain4j I put together a prototype here: https://github.com/langchain4j/langchain4j/pull/531

edeandrea commented 10 months ago

I'm going to close this in favor of what's going on in https://github.com/langchain4j/langchain4j/pull/531