slavingia closed this 3 weeks ago
I think one step before this would be to provide context to AI on why the tests failed. For this, we would need to get the logs from GitHub and feed them to OpenAI for context. What do you think?
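Something like this rough sketch could work, assuming an Octokit client and the OpenAI Node SDK (the helper name, model, and prompt here are just illustrative, not an actual implementation):

```ts
import { Octokit } from "@octokit/rest";
import OpenAI from "openai";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hypothetical helper: pull the logs for a failed workflow job and
// ask the model to explain why the tests failed.
async function explainFailure(owner: string, repo: string, jobId: number) {
  // Download the raw log output for the failed job.
  const { data: logs } =
    await octokit.rest.actions.downloadJobLogsForWorkflowRun({
      owner,
      repo,
      job_id: jobId,
    });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    messages: [
      {
        role: "system",
        content: "You are a CI assistant. Explain why these tests failed.",
      },
      { role: "user", content: String(logs) },
    ],
  });

  return completion.choices[0].message.content;
}
```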
Yes that's a great idea.
Love this! I think we should have a TDD mode and a rush mode (the current mode) for testing new features. TDD mode would be prompt --> tests --> feature, whereas rush mode would be prompt --> feature --> tests.
In rush mode we'd only capture the final step, which is testing.
The reason I think this is worth pursuing is that, in theory, TDD mode should let the user go through fewer prompt iterations when building a new feature with AI, but I'm not sure how realistic that is. 😅
Regardless of whether we do TDD mode, I agree that we should have a self-healing process similar to what the blog post suggests.
Right now there is a manual step (Update tests to fix) but soon we'll make it more automated.
Hi @slavingia! This PR marks the second phase of my work on fixing failed test cases. Please let me know what you think of it.
One challenge I've encountered is that including the test logs is pushing us closer to the token limit, especially with larger code changes. Let me know if you'd like to have a conversation about token management.
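One simple mitigation could be to keep only the tail of each log, since failures usually appear near the end. A rough sketch (the ~4 characters-per-token heuristic is an assumption, not a real tokenizer):

```ts
// Hypothetical guard: keep only the last chunk of the log so the
// prompt stays under the model's context window.
function truncateLogs(logs: string, maxTokens = 8000): string {
  const maxChars = maxTokens * 4; // rough chars-per-token estimate
  if (logs.length <= maxChars) return logs;
  // Failures usually show up at the end of CI output, so keep the tail.
  return "...[log truncated]...\n" + logs.slice(-maxChars);
}
```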
I think it's fine for now. We may want to allow users to put in their own API key if it's too expensive, or start charging in another way.
https://x.com/tdinh_me/status/1847828606069989649?s=46
Eventually, long context will be solved in general and become cheap.
I was hoping that merging the PR would solve this issue, but it seems like the issue persists.
Probably because I haven't updated the env vars yet
I hope this fixes it. This thing is stressing me out.
This is great news for UI testing. Imagine giving AI a URL and asking it to test functionality across different scenarios.
Yes!!!
Worth building a new kind of test in plain English that is validated by computer use
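As a purely hypothetical sketch of the idea, using Anthropic's computer-use beta (the runner name and prompt are made up, and a real version would need an agent loop that actually executes the tool actions):

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY

// Hypothetical runner: hand the model a plain-English test and let it
// drive a virtual display via the computer-use tool until it can
// report pass/fail. The action-execution side is stubbed out here.
async function runPlainEnglishTest(test: string) {
  const response = await anthropic.beta.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    tools: [
      {
        type: "computer_20241022",
        name: "computer",
        display_width_px: 1280,
        display_height_px: 800,
      },
    ],
    messages: [
      { role: "user", content: `Run this test and report PASS or FAIL: ${test}` },
    ],
    betas: ["computer-use-2024-10-22"],
  });
  // A real runner would loop here: execute each tool_use action the
  // model requests (click, type, screenshot) and feed the results back
  // until the model reaches a verdict.
  return response;
}
```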
These could run on Shortest as a GitHub Action?
I agree! And we might not even need to rely on GitHub Workflows as much!
This is a very early version, but I was able to integrate Jenkins to run the tests for Shortest in the same way GitHub Actions does. I have to say, this is very promising.
I think this will enable us to develop the features we want, such as TDD, green builds before committing, UI testing, and many more.
I’ll work on getting this to a stable version and record a proper demo. For now, here’s how it looks - let me know what you think:
https://github.com/user-attachments/assets/ae23f9ce-d37f-4dd2-b03f-ef94eb4abb94
Awesome!!
I think doing Jenkins with the new text format, then using that to get to green builds, and then using that to do TDD seems like the right path.
It's much easier for AI to write code when reading natural-language tests, and humans are more likely to write tests if they can write them in natural language too (imagine pasting text from a PM tool and having it act as your test!).
Exactly! As you mentioned before, with Jenkins we can have a self-healing process where we hit AI with the failed tests and update the files in a loop until we get a passing build. We can set a threshold so that if the tests still don't pass, we let the user know. We might even provide the user with a prompt to use with Cursor to fix the issue. Just a thought!!
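Roughly, the loop could look like this (all helper names are hypothetical):

```ts
// Hypothetical building blocks for the self-healing loop.
declare function runTests(): Promise<{ passed: boolean; logs: string }>;
declare function askAiForFix(logs: string): Promise<string>; // returns a patch
declare function applyFix(patch: string): Promise<void>;

const MAX_ATTEMPTS = 3; // threshold before we hand off to the user

async function selfHeal(): Promise<void> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const { passed, logs } = await runTests();
    if (passed) return; // green build, we're done

    if (attempt < MAX_ATTEMPTS) {
      // Feed the failing logs to the model, apply its suggested patch,
      // then loop back and re-run the tests.
      await applyFix(await askAiForFix(logs));
    } else {
      // Threshold hit: let the user know and hand them a ready-made
      // prompt they can paste into Cursor (or any editor assistant).
      console.log(
        `Tests are still failing after ${MAX_ATTEMPTS} attempts.\n` +
          `Try this prompt in Cursor:\n` +
          `"Fix the failing tests given these CI logs:\n${logs}"`
      );
    }
  }
}
```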
Closing so we can first focus on getting a new kind of natural language AI test to work, and then we can scale it to solve test-driven development, as that's more complicated.
TDD: https://en.wikipedia.org/wiki/Test-driven_development
Green build before merging