florentine-doemges / KogniSwarm

KogniSwarm: A Kotlin-based open-source project for developing autonomous AI applications using GPT-4. Contribute and shape the future of AI interaction.
Apache License 2.0

As a user, I want to generate high-quality text and code with GPT-4 #1

Closed florentine-doemges closed 1 year ago

florentine-doemges commented 1 year ago

KGS-1 As a user, I want to generate high-quality text and code with GPT-4

Description: The application should provide an easy-to-use interface for users to input their text or code prompts and get generated outputs using the GPT-4 model. The users should be able to specify the output length, temperature, and other relevant parameters to control the creativity and quality of the generated text or code. The system should ensure that the GPT-4 model is utilized efficiently and safely, with appropriate API usage and error handling.

To accommodate both GPT-4 and GPT-3.5, a modular architecture with a common class for language models should be implemented. This will enable the application to easily switch between the two models or support additional models in the future, minimizing code duplication.
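
A minimal sketch of what such a common abstraction could look like. All names here (`GenerationParams`, `LanguageModel`, `FakeGpt4`, `FakeGpt35`, `ModelRegistry`) are hypothetical and not taken from the KogniSwarm code base; the two implementations only echo the prompt so the example runs without network access.

```kotlin
// Hypothetical names throughout: none of these types are confirmed to exist in KogniSwarm.
data class GenerationParams(val maxTokens: Int = 256, val temperature: Double = 0.7)

interface LanguageModel {
    val name: String
    fun generate(prompt: String, params: GenerationParams = GenerationParams()): String
}

// Stand-ins for real API clients; they only echo the prompt so the sketch runs offline.
class FakeGpt4 : LanguageModel {
    override val name = "gpt-4"
    override fun generate(prompt: String, params: GenerationParams) = "[$name] $prompt"
}

class FakeGpt35 : LanguageModel {
    override val name = "gpt-3.5-turbo"
    override fun generate(prompt: String, params: GenerationParams) = "[$name] $prompt"
}

// Callers depend only on the interface, so switching models is a lookup by name.
class ModelRegistry(models: List<LanguageModel>) {
    private val byName = models.associateBy { it.name }

    fun get(name: String): LanguageModel =
        byName[name] ?: throw IllegalArgumentException("Unknown model: $name")
}

fun main() {
    val registry = ModelRegistry(listOf(FakeGpt4(), FakeGpt35()))
    println(registry.get("gpt-4").generate("tell me a joke"))
    println(registry.get("gpt-3.5-turbo").generate("tell me a joke"))
}
```

Supporting a further model would then mean adding one more `LanguageModel` implementation and registering it, without touching the calling code.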

Acceptance Criteria:

  1. User can input text or code prompts and receive generated outputs from the GPT-4 model.
  2. User can specify output length, temperature, and other relevant parameters.
  3. The system utilizes the GPT-4 model efficiently and safely.
  4. Appropriate error handling and API usage are implemented.
  5. Generated outputs are of high quality and meet user expectations.
  6. The application's architecture supports both GPT-4 and GPT-3.5, allowing for easy switching between models and extensibility to other models.

Key Classes:

  1. LanguageModel: A class that handles the interactions with both GPT-4 and GPT-3.5 APIs, including methods for setting parameters, generating text or code, and managing authentication.
  2. LanguageModelHandler: A class responsible for handling user interactions, receiving input from the user, invoking the LanguageModel to generate text or code, and displaying generated outputs. This class will manage any error messages or status updates.
  3. LanguageModelConfiguration: A class managing the global configuration settings for language models, such as API keys, endpoint URLs, and other settings.
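
The three classes could fit together roughly as sketched here. This is an illustration under assumptions, not the project's implementation: the field names, the placeholder endpoint, the 0.0 to 2.0 temperature range, and the error messages are all invented for the example, and the model is a local stub rather than a real API client.

```kotlin
// Hypothetical shapes for the three key classes; field names and the endpoint are assumptions.
data class LanguageModelConfiguration(
    val apiKey: String,
    val endpointUrl: String,
    val defaultTemperature: Double = 0.7,
    val defaultMaxTokens: Int = 256,
)

fun interface LanguageModel {
    fun generate(prompt: String, temperature: Double, maxTokens: Int): String
}

class LanguageModelHandler(
    private val model: LanguageModel,
    private val config: LanguageModelConfiguration,
) {
    // Validates input, calls the model, and turns any failure into a user-facing message.
    fun handle(
        prompt: String,
        temperature: Double = config.defaultTemperature,
        maxTokens: Int = config.defaultMaxTokens,
    ): String {
        if (prompt.isBlank()) return "Error: prompt must not be blank"
        if (temperature !in 0.0..2.0) return "Error: temperature must be between 0.0 and 2.0"
        if (maxTokens <= 0) return "Error: output length must be positive"
        return try {
            model.generate(prompt, temperature, maxTokens)
        } catch (e: Exception) {
            "Error: generation failed (${e.message})"
        }
    }
}

fun main() {
    val config = LanguageModelConfiguration(apiKey = "test-key", endpointUrl = "https://example.invalid/v1")
    // Stub model that echoes the prompt; a real client would call the API here.
    val handler = LanguageModelHandler(LanguageModel { prompt, _, _ -> "echo: $prompt" }, config)
    println(handler.handle("tell me a joke"))
    println(handler.handle("tell me a joke", temperature = 3.0))
}
```

Keeping validation and error translation in the handler leaves the `LanguageModel` implementations free to throw, which keeps each API client small.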

By implementing these classes and fulfilling the acceptance criteria, KGS-1 can be successfully completed, allowing users to generate high-quality text and code with GPT-4 and providing support for GPT-3.5 as well.

florentine-doemges commented 1 year ago

How should the application handle API rate limits or quota restrictions for the GPT-4 and GPT-3.5 models? Should there be any built-in mechanisms to manage these limitations?
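
One candidate for such a built-in mechanism is retrying with exponential backoff. The sketch here assumes the API client signals a rate limit by throwing a hypothetical `RateLimitException`; the helper name `withBackoff`, the default attempt count, and the delays are illustrative choices, not project decisions.

```kotlin
// Hypothetical exception a client could throw on an HTTP 429 response.
class RateLimitException(message: String) : RuntimeException(message)

// Retries `call` on RateLimitException, doubling the wait after each failed attempt.
// `sleep` is injectable so tests can record the delays instead of actually waiting.
fun <T> withBackoff(
    maxAttempts: Int = 5,
    initialDelayMs: Long = 500,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    call: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    repeat(maxAttempts - 1) {
        try {
            return call()
        } catch (e: RateLimitException) {
            sleep(delayMs)
            delayMs *= 2
        }
    }
    return call() // last attempt: let any exception propagate to the caller
}

fun main() {
    var attempts = 0
    val text = withBackoff(initialDelayMs = 10) {
        attempts++
        if (attempts < 3) throw RateLimitException("429 Too Many Requests")
        "generated text"
    }
    println("$text after $attempts attempts")
}
```

Quota exhaustion is a different case: retrying does not help there, so it would more likely be reported to the user as an error.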

florentine-doemges commented 1 year ago

Are there any requirements for integrating KGS-1 with other parts of the application, such as the Agent and Task Management or File and Data Handling features?

There might be some requirements for integrating KGS-1 with other parts of the application, including the Agent and Task Management or File and Data Handling features. While the specific requirements will depend on your overall application design and goals, here are some potential integration points to consider:

Agent and Task Management:

  1. The generated text or code from GPT-4 might need to be processed or acted upon by autonomous agents.
  2. The task manager might need to prioritize or schedule GPT-4 generation tasks based on user-defined goals or system resource availability.

File and Data Handling:

  1. Users might want to save the generated text or code to a file, either locally or remotely (e.g., cloud storage).
  2. Users might want to append to or overwrite existing files with the generated content.
  3. The application might need to read from files and use the content as input for the GPT-4 generation process.
  4. GPT-4 generation tasks might require input from various data sources, such as databases or APIs.

To ensure a smooth integration of KGS-1 with other parts of the application, it's essential to have a clear understanding of the desired user flows and interactions between the components. This information will help inform the design of the application's architecture, interfaces, and data models. Additionally, it's crucial to establish a robust testing strategy to validate the integration points and ensure that the application functions as expected.
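
The local file handling points (saving, appending, overwriting, and reading a prompt from a file) can be sketched with the JDK alone. The helper names `saveOutput` and `readPrompt` are hypothetical, and the example assumes Java 11 or newer for `Files.writeString` and `Files.readString`.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Hypothetical helper: writes generated output to a file, either replacing or extending it.
fun saveOutput(path: Path, text: String, append: Boolean = false) {
    val mode = if (append) StandardOpenOption.APPEND else StandardOpenOption.TRUNCATE_EXISTING
    Files.writeString(path, text, StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)
}

// Hypothetical helper: loads a prompt that was stored in a file.
fun readPrompt(path: Path): String = Files.readString(path).trim()

fun main() {
    val file = Files.createTempFile("kogniswarm", ".txt")
    saveOutput(file, "first output\n")                 // overwrite (the default)
    saveOutput(file, "second output\n", append = true) // append to the existing content
    print(Files.readString(file))
    Files.delete(file)
}
```

Remote targets such as cloud storage would need their own client and are left out of this sketch.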

florentine-doemges commented 1 year ago

Write me a list of tests that prove that the acceptance criteria are fulfilled.

  1. Basic Text Generation Test: Input a simple text prompt and verify that the application generates a relevant and coherent output.
  2. Basic Code Generation Test: Input a simple code prompt and verify that the application generates syntactically correct and functional code.
  3. Parameter Customization Test: Modify output length, temperature, and other parameters, and verify that the generated output reflects the specified parameter changes.
  4. API Efficiency Test: Monitor API calls and response times to ensure that the system utilizes the GPT-4 model efficiently and does not make unnecessary or redundant API calls.
  5. Error Handling Test: Introduce various error scenarios, such as invalid prompts or API failures, and verify that the application handles these errors gracefully and provides meaningful feedback to the user.
  6. Model Switching Test: Switch between the GPT-4 and GPT-3.5 models and verify that the application can generate text or code using both models without issues.
  7. Long Input/Output Test: Test the application with very long input prompts and verify that the generated outputs are still coherent and adhere to the specified output length.
  8. Unusual Input Test: Input a series of unusual or unexpected prompts and verify that the application can handle such inputs without crashing or generating nonsensical outputs.
  9. Integration Test with Agent and Task Management: Test the interaction between the text or code generation feature and the Agent and Task Management components to ensure seamless integration.
  10. Integration Test with File and Data Handling: Test the interaction between the text or code generation feature and the File and Data Handling components, such as saving generated outputs to a file or reading input data from files.

florentine-doemges commented 1 year ago

Scenario: Generate simple text
  Given a text prompt "tell me a joke"
  When the user generates text with GPT-4
  Then the output should be a coherent and relevant joke

florentine-doemges commented 1 year ago

Scenario: Generate simple code
  Given a code prompt "create a function that adds two numbers in Kotlin"
  When the user generates code with GPT-4
  Then the output should be syntactically correct and functional Kotlin code

florentine-doemges commented 1 year ago

Scenario: Customize parameters
  Given a text prompt "tell me a story"
  And the output length is set to 100 words
  And the temperature is set to 0.8
  When the user generates text with GPT-4
  Then the output should be a 100-word story with a creativity level reflecting the temperature setting

florentine-doemges commented 1 year ago

Scenario: Ensure API efficiency
  Given a text prompt "tell me a fact about space"
  When the user generates text with GPT-4
  Then the number of API calls and response times should be within acceptable limits

florentine-doemges commented 1 year ago

Scenario: Switch between models
  Given a text prompt "tell me a fact about history"
  When the user generates text with GPT-4
  And the user switches to GPT-3.5
  Then the application should generate text with both models without issues

florentine-doemges commented 1 year ago

Scenario: Generate simple text
  Given a text prompt "tell me a joke"
  When the user generates text with GPT-4
  Then the output should be a coherent and relevant joke

import assertk.assertThat
import assertk.assertions.isNotEmpty
import assertk.assertions.isTrue
import io.mockk.coEvery
import io.mockk.mockk
import kotlinx.coroutines.runBlocking
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.jupiter.api.extension.RegisterExtension
import org.junit.jupiter.api.extension.TestWatcher

class GenerateSimpleTextTest {

    data class Gpt4Request(val prompt: String)
    data class Gpt4Response(val generatedText: String)

    interface Gpt4Service {
        suspend fun generateText(request: Gpt4Request): Gpt4Response
    }

    private lateinit var gpt4Service: Gpt4Service
    private lateinit var prompt: String
    private lateinit var generatedText: String

    @BeforeEach
    fun setUp() {
        gpt4Service = mockk()
    }

    @Given("a text prompt \"tell me a joke\"")
    fun givenTextPrompt() {
        prompt = "tell me a joke"
    }

    @When("the user generates text with GPT-4")
    fun whenGenerateTextWithGpt4() {
        // generateText is a suspend function, so it has to be stubbed with coEvery.
        // A fixed reply is used because a random fixture string could never pass
        // the relevance check in the Then step.
        coEvery { gpt4Service.generateText(Gpt4Request(prompt)) } returns
            Gpt4Response("Here is a joke: why did the developer go broke? He used up all his cache.")
        generatedText = runBlocking { gpt4Service.generateText(Gpt4Request(prompt)).generatedText }
    }

    @Then("the output should be a coherent and relevant joke")
    fun thenOutputShouldBeCoherentAndRelevantJoke() {
        assertThat(generatedText).isNotEmpty()

        // Rough heuristics only: at least five words, and a mention of "joke" or "laugh".
        // Adjust the criteria according to your requirements.
        val coherent = generatedText.split(" ").size >= 5
        val relevant = generatedText.contains("joke", ignoreCase = true) ||
            generatedText.contains("laugh", ignoreCase = true)
        assertThat(coherent && relevant).isTrue()
    }

    companion object {
        // Logs the outcome of every test in this class.
        @JvmField
        @RegisterExtension
        val testListener: TestWatcher = object : TestWatcher {
            override fun testFailed(context: ExtensionContext?, cause: Throwable?) {
                println("Test ${context?.displayName} failed with ${cause?.message}")
            }

            override fun testSuccessful(context: ExtensionContext?) {
                println("Test ${context?.displayName} succeeded")
            }
        }
    }

    // The JBehave annotations above only document the steps; JUnit 5 runs them in order here.
    @Test
    fun `Generate simple text`() {
        givenTextPrompt()
        whenGenerateTextWithGpt4()
        thenOutputShouldBeCoherentAndRelevantJoke()
    }
}

florentine-doemges commented 1 year ago

Scenario: Generate simple code
  Given a code prompt "create a function that adds two numbers in Kotlin"
  When the user generates code with GPT-4
  Then the output should be syntactically correct and functional Kotlin code

import assertk.assertThat
import assertk.assertions.isFalse
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.junit.jupiter.api.Test
import kotlin.script.experimental.api.ResultValue
import kotlin.script.experimental.api.ResultWithDiagnostics
import kotlin.script.experimental.api.ScriptCompilationConfiguration
import kotlin.script.experimental.host.toScriptSource
import kotlin.script.experimental.jvm.dependenciesFromClassContext
import kotlin.script.experimental.jvm.jvm
import kotlin.script.experimental.jvmhost.BasicJvmScriptingHost

interface GPT4 {
    fun generateCode(prompt: String): String
}

// Requires the Kotlin scripting host artifacts (kotlin-scripting-jvm-host) on the test classpath.
class GPT4CodeGeneratorTest {

    private val gpt4 = mockk<GPT4>()
    private lateinit var codePrompt: String
    private lateinit var generatedCode: String

    @Given("a code prompt \"\$codePrompt\"")
    fun givenCodePrompt(codePrompt: String) {
        this.codePrompt = codePrompt
    }

    @When("the user generates code with GPT-4")
    fun whenUserGeneratesCode() {
        // The mock returns a fixed snippet that stands in for the model's answer.
        every { gpt4.generateCode(codePrompt) } returns """
            fun add(a: Int, b: Int): Int {
                return a + b
            }
        """.trimIndent()
        generatedCode = gpt4.generateCode(codePrompt)
    }

    @Then("the output should be syntactically correct and functional Kotlin code")
    fun thenOutputShouldBeCorrectKotlinCode() {
        val compilationConfiguration = ScriptCompilationConfiguration {
            jvm {
                dependenciesFromClassContext(
                    GPT4CodeGeneratorTest::class,
                    wholeClasspath = true
                )
            }
        }

        // Append a call so the script also exercises the generated function.
        val script = "$generatedCode\ncheck(add(2, 3) == 5)"
        val result = BasicJvmScriptingHost().eval(
            script.toScriptSource(),
            compilationConfiguration,
            null
        )

        // A null value means the script did not compile;
        // ResultValue.Error means it compiled but threw at runtime.
        val returnValue = (result as? ResultWithDiagnostics.Success)?.value?.returnValue
        assertThat(returnValue).isNotNull()
        assertThat(returnValue is ResultValue.Error).isFalse()
    }

    // The JBehave annotations above only document the steps; JUnit 5 runs them in order here.
    @Test
    fun `Generate simple code`() {
        givenCodePrompt("create a function that adds two numbers in Kotlin")
        whenUserGeneratesCode()
        thenOutputShouldBeCorrectKotlinCode()
    }
}

florentine-doemges commented 1 year ago

Scenario: Customize parameters
  Given a text prompt "tell me a story"
  And the output length is set to 100 words
  And the temperature is set to 0.8
  When the user generates text with GPT-4
  Then the output should be a 100-word story with a creativity level reflecting the temperature setting

import assertk.assertThat
import assertk.assertions.isEqualTo
import io.mockk.coEvery
import io.mockk.mockk
import kotlinx.coroutines.runBlocking
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.jbehave.core.junit.JUnitStories
import org.jbehave.core.steps.InjectableStepsFactory
import org.jbehave.core.steps.InstanceStepsFactory
import org.junit.Test

class CustomizeParametersTest : JUnitStories() {

    private lateinit var gpt4: GPT4
    private lateinit var prompt: String
    // lateinit is not allowed on primitive types, so these use plain defaults.
    private var outputLength: Int = 0
    private var temperature: Double = 0.0
    private lateinit var result: String

    override fun stepsFactory(): InjectableStepsFactory {
        return InstanceStepsFactory(configuration(), this)
    }

    // Assumed location of the story file on the test classpath; adjust to the project layout.
    override fun storyPaths(): List<String> = listOf("stories/customize_parameters.story")

    @Given("a text prompt \"\$prompt\"")
    fun givenATextPrompt(prompt: String) {
        this.prompt = prompt
    }

    @Given("the output length is set to \$length words")
    fun givenOutputLengthIsSetTo(length: Int) {
        this.outputLength = length
    }

    @Given("the temperature is set to \$temperature")
    fun givenTemperatureIsSetTo(temperature: Double) {
        this.temperature = temperature
    }

    @When("the user generates text with GPT-4")
    fun whenUserGeneratesTextWithGPT4() {
        gpt4 = mockk<GPT4>()

        // generateStory is a suspend function, so it is stubbed with coEvery.
        // The stub returns exactly `outputLength` placeholder words.
        coEvery { gpt4.generateStory(prompt, outputLength, temperature) } returns
            List(outputLength) { "word" }.joinToString(" ")

        result = runBlocking { gpt4.generateStory(prompt, outputLength, temperature) }
    }

    @Then("the output should be a \$length-word story with a creativity level reflecting the temperature setting")
    fun thenTheOutputShouldBeAStoryWithCreativityLevel(length: Int) {
        assertThat(result.split(" ").size).isEqualTo(length)
        // The creativity level cannot be checked against a mock;
        // add further checks here once a real model is wired in.
    }

    @Test
    override fun run() {
        super.run()
    }

    // Placeholder for the real GPT-4 client
    class GPT4 {
        suspend fun generateStory(prompt: String, length: Int, temperature: Double): String {
            // Implementation here
            return ""
        }
    }
}

florentine-doemges commented 1 year ago

Scenario: Ensure API efficiency
  Given a text prompt "tell me a fact about space"
  When the user generates text with GPT-4
  Then the number of API calls and response times should be within acceptable limits

import assertk.assertThat
import assertk.assertions.isLessThanOrEqualTo
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.jbehave.core.configuration.Configuration
import org.jbehave.core.configuration.MostUsefulConfiguration
import org.jbehave.core.io.CodeLocations
import org.jbehave.core.io.LoadFromClasspath
import org.jbehave.core.junit.JUnitStories
import org.jbehave.core.reporters.Format
import org.jbehave.core.reporters.StoryReporterBuilder
import org.jbehave.core.steps.InjectableStepsFactory
import org.jbehave.core.steps.InstanceStepsFactory
import java.time.Duration
import kotlin.system.measureTimeMillis

// Assumed minimal API surface; the original snippet referenced this type without defining it.
interface Gpt4ApiService {
    fun generateText(prompt: String): String
}

// JUnitStories.run() is the JUnit 4 entry point that executes the stories listed below.
class Gpt4ApiEfficiencyTest : JUnitStories() {

    override fun configuration(): Configuration =
        MostUsefulConfiguration()
            .useStoryLoader(LoadFromClasspath(this.javaClass))
            .useStoryReporterBuilder(
                StoryReporterBuilder()
                    .withCodeLocation(CodeLocations.codeLocationFromClass(this.javaClass))
                    .withFormats(Format.CONSOLE, Format.TXT, Format.HTML)
            )

    override fun stepsFactory(): InjectableStepsFactory =
        InstanceStepsFactory(configuration(), Gpt4ApiEfficiencySteps())

    // Assumed location of the story file on the test classpath; adjust to the project layout.
    override fun storyPaths(): List<String> = listOf("stories/ensure_api_efficiency.story")

    class Gpt4ApiEfficiencySteps {
        private val apiService = mockk<Gpt4ApiService>()
        private val acceptableApiCalls = 5
        private val acceptableResponseTime = Duration.ofSeconds(3)
        private lateinit var prompt: String
        private var responseTime: Duration = Duration.ZERO

        @Given("a text prompt \"\$prompt\"")
        fun givenATextPrompt(prompt: String) {
            this.prompt = prompt
        }

        @When("the user generates text with GPT-4")
        fun whenUserGeneratesText() {
            every { apiService.generateText(any()) } answers {
                Thread.sleep(50) // simulated network latency
                "Generated text response"
            }
            responseTime = Duration.ofMillis(measureTimeMillis { apiService.generateText(prompt) })
        }

        @Then("the number of API calls and response times should be within acceptable limits")
        fun thenApiUsageIsWithinLimits() {
            // At least one call was made, and no more than the allowed maximum.
            verify(atLeast = 1, atMost = acceptableApiCalls) { apiService.generateText(prompt) }
            assertThat(responseTime).isLessThanOrEqualTo(acceptableResponseTime)
        }
    }
}

florentine-doemges commented 1 year ago

Scenario: Switch between models
  Given a text prompt "tell me a fact about history"
  When the user generates text with GPT-4
  And the user switches to GPT-3.5
  Then the application should generate text with both models without issues

import assertk.assertThat
import assertk.assertions.isEqualTo
import io.mockk.coEvery
import io.mockk.coVerify
import io.mockk.mockk
import kotlinx.coroutines.runBlocking
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.jbehave.core.junit.JUnitStory
import org.jbehave.core.steps.InjectableStepsFactory
import org.jbehave.core.steps.InstanceStepsFactory

// JUnitStory.run() is the JUnit 4 entry point. With JBehave's default path resolver the
// story file is expected next to this class as switch_between_models_test.story.
class SwitchBetweenModelsTest : JUnitStory() {

    private lateinit var textPrompt: String
    private lateinit var gpt4: TextGenerator
    private lateinit var gpt3_5: TextGenerator
    private lateinit var resultGpt4: String
    private lateinit var resultGpt3_5: String

    // Registers this class as the provider of the step methods below.
    override fun stepsFactory(): InjectableStepsFactory =
        InstanceStepsFactory(configuration(), this)

    @Given("a text prompt \"\$prompt\"")
    fun givenTextPrompt(prompt: String) {
        textPrompt = prompt
        gpt4 = mockk()
        gpt3_5 = mockk()
    }

    @When("the user generates text with GPT-4")
    fun whenUserGeneratesTextWithGpt4() {
        // generateText is a suspend function, so it is stubbed with coEvery.
        coEvery { gpt4.generateText(textPrompt) } returns "GPT-4 generated fact about history."
        resultGpt4 = runBlocking { gpt4.generateText(textPrompt) }
    }

    @When("the user switches to GPT-3.5")
    fun whenUserSwitchesToGpt3_5() {
        coEvery { gpt3_5.generateText(textPrompt) } returns "GPT-3.5 generated fact about history."
        resultGpt3_5 = runBlocking { gpt3_5.generateText(textPrompt) }
    }

    @Then("the application should generate text with both models without issues")
    fun thenTheApplicationShouldGenerateTextWithBothModelsWithoutIssues() {
        // Both models were called with the same prompt.
        coVerify { gpt4.generateText(textPrompt) }
        coVerify { gpt3_5.generateText(textPrompt) }
        assertThat(resultGpt4).isEqualTo("GPT-4 generated fact about history.")
        assertThat(resultGpt3_5).isEqualTo("GPT-3.5 generated fact about history.")
    }

    // Assuming a TextGenerator interface is implemented by both GPT-4 and GPT-3.5 models
    interface TextGenerator {
        suspend fun generateText(prompt: String): String
    }
}

florentine-doemges commented 1 year ago

Tests are green on my machine.