Together-Java / TJ-Bot

TJ-Bot is a Discord bot used on the Together Java server. It is maintained by the community; anyone can contribute.
https://togetherjava.org
GNU General Public License v3.0

Bug/gpt response compact #994

Closed. ankitsmt211 closed this 6 months ago.

ankitsmt211 commented 8 months ago

resolves #920

Note: There's no way to generate much shorter responses. I could get them really short by using BRIEF as a keyword, but that's very, very short. IMO the character limit shouldn't be the priority; we can always paginate the response in embeds.

If we cross the 2k-character limit at the moment, the AIResponseParser class will automatically cut the response into multiple shorter messages, as mentioned in #928.
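
For illustration, here's a minimal sketch of that kind of chunking (this is not the actual AIResponseParser code, just the general idea; Discord caps plain messages at 2,000 characters):

import java.util.ArrayList;
import java.util.List;

final class NaiveResponseSplitter {
    private static final int MESSAGE_LIMIT = 2_000;

    /** Cuts a long AI response into chunks that fit Discord's message limit. */
    static List<String> split(String response) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < response.length(); start += MESSAGE_LIMIT) {
            int end = Math.min(start + MESSAGE_LIMIT, response.length());
            chunks.add(response.substring(start, end));
        }
        return chunks;
    }
}

The real parser would likely prefer splitting on line or code-fence boundaries so each chunk stays readable.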

Reducing MAX_TOKENS would just lead to cut-off (lost) responses at times.

Bottom line: when implementing embeds for the rare responses that go over the 4k limit, we can either paginate or run GPT again on the generated response to drop more filler. Otherwise, most responses should fall well under the 4k limit.
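
If we do go the embed route, the pagination side could be sketched like this (assuming JDA, which the bot is built on; Discord caps an embed description at 4,096 characters, and the helper name here is made up):

import java.util.ArrayList;
import java.util.List;

import net.dv8tion.jda.api.EmbedBuilder;
import net.dv8tion.jda.api.entities.MessageEmbed;

final class EmbedPaginator {
    private static final int DESCRIPTION_LIMIT = 4_096;

    /** Wraps an over-long response into a series of embeds, one page per chunk. */
    static List<MessageEmbed> paginate(String response) {
        List<MessageEmbed> pages = new ArrayList<>();
        for (int start = 0; start < response.length(); start += DESCRIPTION_LIMIT) {
            int end = Math.min(start + DESCRIPTION_LIMIT, response.length());
            pages.add(new EmbedBuilder()
                .setDescription(response.substring(start, end))
                .build());
        }
        return pages;
    }
}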

marko-radosavljevic commented 8 months ago

Yeah, we don't want to restrict and limit the model from every side, rendering it useless; to starve GPT of oxygen until it coughs up a few sentences for us and dies. We want to just gently steer it, at its full power.

If it produces a perfect long guide that explains every step perfectly, with code examples... that's awesome!

We should obviously optimize it; some of the simplest answers don't require those bloated responses. But quality should be our priority, and then optimizing for UI/UX.

These were the tests I used to benchmark and optimize responses.

import java.util.Optional;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

class ChatGptServiceTest {
    private static final Logger logger = LoggerFactory.getLogger(ChatGptServiceTest.class);
    private Config config;
    private ChatGptService chatGptService;

    @BeforeEach
    void setUp() {
        // Mock the config so the service can be constructed; swap in a real
        // key for "your-api-key" to actually hit the API.
        config = mock(Config.class);
        when(config.getOpenaiApiKey()).thenReturn("your-api-key");
        chatGptService = new ChatGptService(config);
    }

    @Test
    void askToGenerateLongPoem() {
        Optional<String> response = chatGptService.ask("generate a very long poem");
        response.ifPresent(logger::warn);
    }

    @Test
    void askHowToSetupJacksonLibraryWithExamples() {
        Optional<String> response = chatGptService.ask("How to setup Jackson library with examples");
        response.ifPresent(logger::warn);
    }

    @Test
    void askDockerReverseProxyWithNginxGuide() {
        Optional<String> response = chatGptService.ask("Docker reverse proxy with nginx guide");
        response.ifPresent(logger::warn);
    }

    @Test
    void askWhyDoesItTakeYouMoreThan10SecondsToAnswer() {
        Optional<String> response = chatGptService.ask(
                "Working example of Command pattern in java, with all the classes required, explained in detail. Bonus points for UML diagrams.");
        response.ifPresent(logger::warn);
    }
}

Can you run these and post how long they took, along with the results? Just curious how it would all look with the current UI/UX. (Since this is testing the service directly, it's best to just ask the bot these questions.) Also curious whether the user would have to wait 2 minutes for an answer, and whether that would look unintuitive/unfriendly for the user, because it's not properly communicated what is happening.

marko-radosavljevic commented 8 months ago

Regarding the added context based on the #questions channel, so GPT knows it's Java: I'm curious if it would backfire in other categories (for whatever reason).

Because of that 'on a Java Q&A discord server', what happens if someone asks a question and writes 'answer in python'? Or what if the question is obviously Python, because there is Python code attached, and GPT tries to rewrite it as Java or bastardizes it? What if the mentioned libraries and frameworks are clearly from the Python ecosystem, would it answer within that context, or will it try to Javthon it?

Make sure to test some edge cases in different categories, and use some previous real-world failures from #questions in your test suite. Also include some successful answers by GPT, to check whether you notice any regressions. Just to be sure that this new prompt is objectively better, and that it won't accidentally make some other aspects worse. :relaxed:
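
To make those edge cases concrete: the context injection being discussed presumably amounts to something like the following (a hypothetical sketch; the actual prompt text in the PR may differ):

final class PromptContext {
    /**
     * Hypothetical illustration of the channel-based context: prefix the
     * user's question so the model knows where it is being asked.
     */
    static String withJavaContext(String userQuestion) {
        return "You are answering a question on a Java Q&A Discord server. "
                + "Prefer Java unless the question is clearly about another language.\n\n"
                + userQuestion;
    }
}

The failure modes above all boil down to whether a prefix like this overrides explicit signals in the question itself.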

ankitsmt211 commented 8 months ago

Length can be improved with a good prompt, but it's not super consistent; will get back to this.

ankitsmt211 commented 8 months ago

The response is not really compact; it needs a bit of playing with different prompt lengths, and I don't really feel like doing that atm. I'm going to undo the length-related changes and only keep the context-related changes, because the earlier length seems to be better than what I did here.

ankitsmt211 commented 8 months ago

These tests haven't been run more than a couple of times each, but the results seem considerably better than the original ones.

Character counts based on the tests given by marko:

With the new prompt (3k token limit):

  1. 1375 chars (poem about Java)
  2. 1782 chars
  3. 1595 chars
  4. 2032 chars

With the new prompt plus the temperature changes from @surajkumar (2k token limit):

  1. 1425 chars
  2. 1891 chars
  3. 1658 chars
  4. 1924 chars

With the new prompt plus the temperature changes from @surajkumar (3k token limit):

  1. 1323 chars
  2. 1622 chars
  3. 1841 chars
  4. 2134 chars

Shorter responses, and the context is pretty solid.

With the earlier prompt (3k token limit):

  1. 1687 chars (random poem)
  2. kept throwing an error (response greater than 2k, which the parser will try to split)
  3. 1772 chars
  4. kept throwing an error (response greater than 2k, which the parser will try to split)

Relatively longer responses, and the context depends entirely on the user's question.

surajkumar commented 8 months ago

Can you add this to your PR, please:

    /** The maximum number of tokens allowed for the generated answer */
    private static final int MAX_TOKENS = 2_000;

    /**
     * This parameter reduces the likelihood of the AI repeating itself. A higher frequency penalty
     * makes the model less likely to repeat the same lines verbatim. It helps in generating more
     * diverse and varied responses.
     */
    private static final double FREQUENCY_PENALTY = 0.5;

    /**
     * This parameter controls the randomness of the AI's responses. A higher temperature results in
     * more varied, unpredictable, and creative responses. Conversely, a lower temperature makes the
     * model's responses more deterministic and conservative.
     */
    private static final double TEMPERATURE = 0.8;

    /**
     * n: This parameter specifies the number of responses to generate for each prompt. If n is more
     * than 1, the AI will generate multiple different responses to the same prompt, each one being
     * a separate iteration based on the input.
     */
    private static final int MAX_NUMBER_OF_RESPONSES = 1;

Those with keen eyes will notice some changes to the values.
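
For context, these constants would presumably be wired into the completion request roughly like this (a sketch assuming the TheoKanning openai-java client; the model name and surrounding code are assumptions, not the PR's actual code):

    import java.util.List;

    import com.theokanning.openai.completion.chat.ChatCompletionRequest;
    import com.theokanning.openai.completion.chat.ChatMessage;

    ChatCompletionRequest buildRequest(List<ChatMessage> messages) {
        return ChatCompletionRequest.builder()
            .model("gpt-3.5-turbo") // assumed model name
            .messages(messages)
            .maxTokens(MAX_TOKENS)
            .frequencyPenalty(FREQUENCY_PENALTY)
            .temperature(TEMPERATURE)
            .n(MAX_NUMBER_OF_RESPONSES)
            .build();
    }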

ankitsmt211 commented 8 months ago

Token, freq, and temperature are already set in the code. Do you want me to give them separate var names?

surajkumar commented 8 months ago

> Token, freq, and temperature are already set in the code. Do you want me to give them separate var names?

Yeah, only because there are no Javadocs on the openai lib and looking them up is a bother imo. Also, doing this removes the whole "magic number" aspect, but more so it's for the docs. I was gonna do it in another PR, but since you're already here...

I also upped the TEMPERATURE; I think that might be interesting.

marko-radosavljevic commented 6 months ago

Merging on the basis of one approving review and more than 7 days of inactivity afterwards. Thanks :heart: