Closed florentine-doemges closed 1 year ago
How should the application handle API rate limits or quota restrictions for the GPT-4 and GPT-3.5 models? Should there be any built-in mechanisms to manage these limitations?
Are there any requirements for integrating KGS-1 with other parts of the application, such as the Agent and Task Management or File and Data Handling features?
There might be some requirements for integrating KGS-1 with other parts of the application, including Agent and Task Management or File and Data Handling features. While the specific requirements will depend on your overall application design and goals, here are some potential integration points to consider:
Agent and Task Management:
The generated text or code from GPT-4 might need to be processed or acted upon by autonomous agents. The task manager might need to prioritize or schedule GPT-4 generation tasks based on user-defined goals or system resource availability. File and Data Handling:
Users might want to save the generated text or code to a file, either locally or remotely (e.g., cloud storage). Users might want to append or overwrite existing files with the generated content. The application might need to read from files and use the content as input for the GPT-4 generation process. GPT-4 generation tasks might require input from various data sources, such as databases or APIs. To ensure a smooth integration of KGS-1 with other parts of the application, it's essential to have a clear understanding of the desired user flows and interactions between the components. This information will help inform the design of the application's architecture, interfaces, and data models. Additionally, it's crucial to establish a robust testing strategy to validate the integration points and ensure that the application functions as expected.
Write me a list of tests that proof that the acceptance criteria are fulfilled?
Basic Text Generation Test:
Input a simple text prompt and verify that the application generates a relevant and coherent output. Basic Code Generation Test:
Input a simple code prompt and verify that the application generates syntactically correct and functional code. Parameter Customization Test:
Modify output length, temperature, and other parameters, and verify that the generated output reflects the specified parameter changes. API Efficiency Test:
Monitor API calls and response times to ensure that the system utilizes the GPT-4 model efficiently and does not make unnecessary or redundant API calls. Error Handling Test:
Introduce various error scenarios, such as invalid prompts or API failures, and verify that the application handles these errors gracefully and provides meaningful feedback to the user. Model Switching Test:
Switch between the GPT-4 and GPT-3.5 models and verify that the application can generate text or code using both models without issues. Long Input/Output Test:
Test the application with very long input prompts and verify that the generated outputs are still coherent and adhere to the specified output length. Unusual Input Test:
Input a series of unusual or unexpected prompts and verify that the application can handle such inputs without crashing or generating nonsensical outputs. Integration Test with Agent and Task Management:
Test the interaction between the text or code generation feature and the Agent and Task Management components to ensure seamless integration. Integration Test with File and Data Handling:
Test the interaction between the text or code generation feature and the File and Data Handling components, such as saving generated outputs to a file or reading input data from files.
Scenario: Generate simple text Given a text prompt "tell me a joke" When the user generates text with GPT-4 Then the output should be a coherent and relevant joke
Scenario: Generate simple code Given a code prompt "create a function that adds two numbers in Kotlin" When the user generates code with GPT-4 Then the output should be syntactically correct and functional Kotlin code
Scenario: Customize parameters Given a text prompt "tell me a story" And the output length is set to 100 words And the temperature is set to 0.8 When the user generates text with GPT-4 Then the output should be a 100-word story with a creativity level reflecting the temperature setting
Scenario: Ensure API efficiency Given a text prompt "tell me a fact about space" When the user generates text with GPT-4 Then the number of API calls and response times should be within acceptable limits
Scenario: Switch between models Given a text prompt "tell me a fact about history" When the user generates text with GPT-4 And the user switches to GPT-3.5 Then the application should generate text with both models without issues
Scenario: Generate simple text Given a text prompt "tell me a joke" When the user generates text with GPT-4 Then the output should be a coherent and relevant joke
import assertk.assertThat
import assertk.assertions.isEqualTo
import io.mockk.every
import io.mockk.mockk
import kotlinx.coroutines.runBlocking
import org.awaitility.Awaitility.await
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtendWith
import org.junit.jupiter.api.extension.RegisterExtension
import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.jupiter.api.extension.TestWatcher
import java.util.concurrent.TimeUnit
import kotlin.random.Random
import kotlin.test.assertNotNull
import com.appmattus.kotlinfixture.KotlinFixture
class GenerateSimpleTextTest {
data class Gpt4Request(val prompt: String)
data class Gpt4Response(val generatedText: String)
interface Gpt4Service {
suspend fun generateText(request: Gpt4Request): Gpt4Response
}
lateinit var gpt4Service: Gpt4Service
lateinit var prompt: String
lateinit var generatedText: String
lateinit var fixture: KotlinFixture
@BeforeEach
fun setUp() {
gpt4Service = mockk()
fixture = KotlinFixture()
}
@Given("a text prompt \"tell me a joke\"")
fun givenTextPrompt() {
prompt = "tell me a joke"
}
@When("the user generates text with GPT-4")
fun whenGenerateTextWithGpt4() {
runBlocking {
every { gpt4Service.generateText(Gpt4Request(prompt)) } returns Gpt4Response(fixture())
val response = gpt4Service.generateText(Gpt4Request(prompt))
generatedText = response.generatedText
}
}
@Then("the output should be a coherent and relevant joke")
fun thenOutputShouldBeCoherentAndRelevantJoke() {
assertNotNull(generatedText)
assertThat(generatedText).isNotEmpty()
// Check if the generated text is coherent and relevant
// This is just an example, you may need to adjust the criteria according to your requirements
val coherent = generatedText.split(" ").size >= 5
val relevant = generatedText.contains("joke", true) || generatedText.contains("laugh", true)
assertThat(coherent && relevant).isEqualTo(true)
// You can use Awaitility if you need to wait for a certain condition to be true
await().atMost(5, TimeUnit.SECONDS).untilAsserted {
assertThat(generatedText).isNotEmpty()
assertThat(coherent && relevant).isEqualTo(true)
}
}
companion object {
@JvmField
@RegisterExtension
val testListener = object : TestWatcher {
override fun testFailed(context: ExtensionContext?, cause: Throwable?) {
println("Test ${context?.displayName} failed with ${cause?.message}")
}
override fun testSuccessful(context: ExtensionContext?) {
println("Test ${context?.displayName} succeeded")
}
}
}
@Test
fun `Generate simple text`() {
givenTextPrompt()
whenGenerateTextWithGpt4()
thenOutputShouldBeCoherentAndRelevantJoke()
}
}
Scenario: Generate simple code Given a code prompt "create a function that adds two numbers in Kotlin" When the user generates code with GPT-4 Then the output should be syntactically correct and functional Kotlin code
import assertk.assertThat
import assertk.assertions.isEqualTo
import io.mockk.every
import io.mockk.mockk
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import org.appmatus.fixture.kotlinfixture.Fixture
import org.awaitility.Awaitility.await
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.jbehave.core.steps.Steps
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtendWith
import org.junit.jupiter.api.extension.ExtensionContext
import org.junit.jupiter.api.extension.ParameterContext
import org.junit.jupiter.api.extension.ParameterResolver
import java.util.concurrent.TimeUnit
import kotlin.script.experimental.api.EvaluationResult
import kotlin.script.experimental.api.ScriptCompilationConfiguration
import kotlin.script.experimental.api.defaultJvmScriptingHostConfiguration
import kotlin.script.experimental.host.toScriptSource
import kotlin.script.experimental.jvm.evalWithTemplate
import kotlin.script.experimental.jvm.util.classpathFromClass
import kotlin.script.experimental.jvm.util.classpathFromClassloader
import kotlin.script.experimental.jvmhost.BasicJvmScriptingHost
@ExtendWith(JBehaveExtension::class)
class GPT4CodeGeneratorTest : Steps() {
private val gpt4 = mockk<GPT4>()
private lateinit var codePrompt: String
private lateinit var generatedCode: String
@Given("a code prompt \"\$codePrompt\"")
fun givenCodePrompt(codePrompt: String) {
this.codePrompt = codePrompt
}
@When("the user generates code with GPT-4")
fun whenUserGeneratesCode() {
every { gpt4.generateCode(codePrompt) } returns Fixture().apply {
"""
fun add(a: Int, b: Int): Int {
return a + b
}
""".trimIndent()
}
generatedCode = gpt4.generateCode(codePrompt)
}
@Then("the output should be syntactically correct and functional Kotlin code")
fun thenOutputShouldBeCorrectKotlinCode() {
val compilationConfiguration = ScriptCompilationConfiguration {
jvm {
dependenciesFromClassContext(
GPT4CodeGeneratorTest::class,
wholeClasspath = true
)
}
}
val evaluation = runBlocking {
BasicJvmScriptingHost().evalWithTemplate(
generatedCode.toScriptSource(),
compilationConfiguration,
defaultJvmScriptingHostConfiguration
)
}
await().atMost(10, TimeUnit.SECONDS).untilAsserted {
assertThat(evaluation).isInstanceOf(EvaluationResult::class.java)
assertThat(evaluation.returnValue).isNotNull()
}
}
}
interface GPT4 {
fun generateCode(prompt: String): String
}
class JBehaveExtension : ParameterResolver {
override fun supportsParameter(parameterContext: ParameterContext?, extensionContext: ExtensionContext?): Boolean {
return parameterContext?.parameter?.type == GPT4CodeGeneratorTest::class.java
}
override fun resolveParameter(parameterContext: ParameterContext?, extensionContext: ExtensionContext?): Any {
return GPT4CodeGeneratorTest()
}
}
@Test
fun run() {
val stepsFactory = JBehaveStepsFactory(listOf(GPT4CodeGeneratorTest()))
JBehaveRunner().apply {
stepsFactory = this@GPT4CodeGeneratorTest.stepsFactory
}.run(listOf("classpath:/features/generate_simple_code.feature"))
}
Scenario: Customize parameters Given a text prompt "tell me a story" And the output length is set to 100 words And the temperature is set to 0.8 When the user generates text with GPT-4 Then the output should be a 100-word story with a creativity level reflecting the temperature setting
import assertk.assertThat
import assertk.assertions.isEqualTo
import assertk.assertions.isNotNull
import io.mockk.every
import io.mockk.mockk
import kotlinx.coroutines.runBlocking
import org.awaitility.Awaitility.await
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.jbehave.core.junit.JUnitStories
import org.jbehave.core.steps.InjectableStepsFactory
import org.jbehave.core.steps.InstanceStepsFactory
import org.junit.Test
import pl.appmatus.fixture.Fixture
import pl.appmatus.fixture.kotlinFixture
import java.util.concurrent.TimeUnit
class CustomizeParametersTest : JUnitStories() {
private lateinit var gpt4: GPT4
private lateinit var prompt: String
private lateinit var outputLength: Int
private lateinit var temperature: Double
private lateinit var result: String
override fun stepsFactory(): InjectableStepsFactory {
return InstanceStepsFactory(configuration(), this)
}
@Given("a text prompt \"\$prompt\"")
fun givenATextPrompt(prompt: String) {
this.prompt = prompt
}
@Given("the output length is set to \$length words")
fun givenOutputLengthIsSetTo(length: Int) {
this.outputLength = length
}
@Given("the temperature is set to \$temperature")
fun givenTemperatureIsSetTo(temperature: Double) {
this.temperature = temperature
}
@When("the user generates text with GPT-4")
fun whenUserGeneratesTextWithGPT4() {
gpt4 = mockk<GPT4>()
val fixture: Fixture = kotlinFixture {
property("length", outputLength)
property("temperature", temperature)
}
every { runBlocking { gpt4.generateStory(prompt, outputLength, temperature) } } returns fixture.new<Story>().text
runBlocking {
result = gpt4.generateStory(prompt, outputLength, temperature)
}
}
@Then("the output should be a \$length-word story with a creativity level reflecting the temperature setting")
fun thenTheOutputShouldBeAStoryWithCreativityLevel(length: Int) {
await().atMost(10, TimeUnit.SECONDS).untilAsserted {
assertThat(result).isNotNull()
assertThat(result.split(" ").size).isEqualTo(length)
// Add any additional checks for creativity level if necessary
}
}
@Test
override fun run() {
super.run()
}
// Classes for GPT-4 and Story
class GPT4 {
suspend fun generateStory(prompt: String, length: Int, temperature: Double): String {
// Implementation here
return ""
}
}
data class Story(val text: String, val length: Int, val temperature: Double)
}
Scenario: Ensure API efficiency Given a text prompt "tell me a fact about space" When the user generates text with GPT-4 Then the number of API calls and response times should be within acceptable limits
import assertk.assertThat
import assertk.assertions.isEqualTo
import assertk.assertions.isLessThanOrEqualTo
import io.mockk.every
import io.mockk.mockk
import kotlinx.coroutines.delay
import org.appmattus.fixturefixture.Fixture
import org.appmattus.fixturefixture.kotlin.FixtureKotlin
import org.awaitility.Awaitility.await
import org.jbehave.core.Embeddable
import org.jbehave.core.configuration.Configuration
import org.jbehave.core.configuration.MostUsefulConfiguration
import org.jbehave.core.io.CodeLocations
import org.jbehave.core.io.LoadFromClasspath
import org.jbehave.core.junit.JUnitStories
import org.jbehave.core.reporters.Format
import org.jbehave.core.reporters.StoryReporterBuilder
import org.jbehave.core.steps.InjectableStepsFactory
import org.jbehave.core.steps.InstanceStepsFactory
import org.junit.Test
import java.time.Duration
import java.util.concurrent.TimeUnit
class Gpt4ApiEfficiencyTest : JUnitStories() {
override fun configuration(): Configuration =
MostUsefulConfiguration()
.useStoryLoader(LoadFromClasspath(this.javaClass))
.useStoryReporterBuilder(
StoryReporterBuilder()
.withCodeLocation(CodeLocations.codeLocationFromClass(this.javaClass))
.withFormats(Format.CONSOLE, Format.TXT, Format.HTML)
)
override fun stepsFactory(): InjectableStepsFactory =
InstanceStepsFactory(configuration(), Gpt4ApiEfficiencySteps())
class Gpt4ApiEfficiencySteps {
private val fixture: Fixture = FixtureKotlin()
private val apiService = mockk<Gpt4ApiService>()
private val acceptableApiCalls = 5
private val acceptableResponseTime = Duration.ofSeconds(3)
init {
every { apiService.generateText(any()) } answers {
delay(fixture<Long> { it.between(1, 3000) })
"Generated text response"
}
}
@Test
fun `ensure API efficiency`() {
val prompt = "tell me a fact about space"
var apiCalls = 0
var totalResponseTime = Duration.ZERO
repeat(acceptableApiCalls) {
val startTime = System.currentTimeMillis()
apiService.generateText(prompt)
val endTime = System.currentTimeMillis()
totalResponseTime += Duration.ofMillis(endTime - startTime)
apiCalls++
}
assertThat(apiCalls).isEqualTo(acceptableApiCalls)
assertThat(totalResponseTime).isLessThanOrEqualTo(acceptableResponseTime)
}
}
}
Scenario: Switch between models Given a text prompt "tell me a fact about history" When the user generates text with GPT-4 And the user switches to GPT-3.5 Then the application should generate text with both models without issues
import assertk.assertThat
import assertk.assertions.isEqualTo
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
import kotlinx.coroutines.runBlocking
import org.awaitility.Awaitility.await
import org.jbehave.core.annotations.Given
import org.jbehave.core.annotations.Then
import org.jbehave.core.annotations.When
import org.jbehave.core.junit.JUnitStory
import org.junit.jupiter.api.Test
import appmatus.com.kotlinfixture.fixture
import java.util.concurrent.TimeUnit
class SwitchBetweenModelsTest : JUnitStory() {
private val textPrompt = "tell me a fact about history"
private lateinit var gpt4: TextGenerator
private lateinit var gpt3_5: TextGenerator
private lateinit var resultGpt4: String
private lateinit var resultGpt3_5: String
@Given("a text prompt \"\$prompt\"")
fun givenTextPrompt(prompt: String) {
gpt4 = mockk()
gpt3_5 = mockk()
}
@When("the user generates text with GPT-4")
fun whenUserGeneratesTextWithGpt4() {
every { runBlocking { gpt4.generateText(textPrompt) } } returns "GPT-4 generated fact about history."
resultGpt4 = runBlocking { gpt4.generateText(textPrompt) }
}
@When("the user switches to GPT-3.5")
fun whenUserSwitchesToGpt3_5() {
every { runBlocking { gpt3_5.generateText(textPrompt) } } returns "GPT-3.5 generated fact about history."
resultGpt3_5 = runBlocking { gpt3_5.generateText(textPrompt) }
}
@Then("the application should generate text with both models without issues")
fun thenTheApplicationShouldGenerateTextWithBothModelsWithoutIssues() {
await().atMost(5, TimeUnit.SECONDS).untilAsserted {
verify {
runBlocking { gpt4.generateText(textPrompt) }
runBlocking { gpt3_5.generateText(textPrompt) }
}
assertThat(resultGpt4).isEqualTo("GPT-4 generated fact about history.")
assertThat(resultGpt3_5).isEqualTo("GPT-3.5 generated fact about history.")
}
}
// Assuming a TextGenerator interface is implemented by both GPT-4 and GPT-3.5 models
interface TextGenerator {
suspend fun generateText(prompt: String): String
}
@Test
fun runScenario() {
val steps = listOf(this)
val storyPath = storyPath().replace("\\", "/")
run(storyPath, steps)
}
}
Tests are green on my machine.
KGS-1 As a user, I want to generate high-quality text and code with GPT-4
Description: The application should provide an easy-to-use interface for users to input their text or code prompts and get generated outputs using the GPT-4 model. The users should be able to specify the output length, temperature, and other relevant parameters to control the creativity and quality of the generated text or code. The system should ensure that the GPT-4 model is utilized efficiently and safely, with appropriate API usage and error handling.
To accommodate both GPT-4 and GPT-3.5, a modular architecture with a common class for language models should be implemented. This will enable the application to easily switch between the two models or support additional models in the future, minimizing code duplication.
Acceptance Criteria:
Key Classes:
By implementing these classes and fulfilling the acceptance criteria, KGS-1 can be successfully completed, allowing users to generate high-quality text and code with GPT-4 and providing support for GPT-3.5 as well.