Build feature/update codebase to get tests to pass (TDD!)

slavingia commented 1 month ago

TDD: https://en.wikipedia.org/wiki/Test-driven_development

Green build before merging

m2rads commented 1 month ago

I think one step before this would be to give the AI context on why the tests failed. For this we would need to get the logs from GitHub and feed them to OpenAI. What do you think?
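
A rough sketch of the prompt-assembly half of that idea, in Python for illustration only. Fetching the logs from GitHub and calling OpenAI are left out on purpose. The function name `build_failure_prompt`, its parameters, and the prompt wording are all hypothetical, not anything Shortest actually does today.

```python
def build_failure_prompt(test_logs: str, changed_files: dict[str, str]) -> str:
    """Combine failed-test logs and the changed files into one prompt string.

    Hypothetical helper: the section wording below is an assumption,
    not the prompt Shortest uses.
    """
    sections = ["These tests failed. CI logs:", test_logs.strip()]
    # Include each changed file so the model can relate log lines to code.
    for path, content in changed_files.items():
        sections.append(f"File: {path}\n{content.strip()}")
    sections.append("Explain why the tests failed and propose a fix.")
    return "\n\n".join(sections)
```

The returned string would then be sent as the user message to whichever model is in use.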

slavingia commented 1 month ago

Yes that's a great idea.

slavingia commented 1 month ago

Inspiration: https://x.com/codeinthehole/status/1845541873651274144?s=46

m2rads commented 1 month ago

Love this! I think we should have a TDD mode and a rush mode (the current mode) for testing new features. TDD mode would be prompt --> tests --> feature, whereas rush mode would be prompt --> feature --> tests.

In rush mode we'd only capture the final step which is testing.

I think this is worth pursuing because, in theory, TDD mode should let the user go through fewer prompt iterations when building a new feature with AI, but I'm not sure how realistic that is. 😅
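
To make the two orderings concrete, here is a tiny sketch in Python. The `Mode` enum and `pipeline_steps` function are hypothetical names used only for illustration; the step names come straight from the description above.

```python
from enum import Enum


class Mode(Enum):
    TDD = "tdd"    # tests are generated before the feature
    RUSH = "rush"  # current behaviour: feature first, tests last


def pipeline_steps(mode: Mode) -> list[str]:
    """Return the order of steps for the given mode."""
    if mode is Mode.TDD:
        return ["prompt", "tests", "feature"]
    return ["prompt", "feature", "tests"]
```

For example, `pipeline_steps(Mode.TDD)` gives `["prompt", "tests", "feature"]`.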

m2rads commented 1 month ago

Whether or not we do TDD mode, I agree that we should have a self-healing process similar to what the blog post suggests.

Right now there is a manual step (Update tests to fix) but soon we'll make it more automated.

m2rads commented 1 month ago

Hi @slavingia! This PR marks the second phase of my work on fixing failed test cases. Please let me know what you think of it.

One challenge I've encountered is that including the test logs is pushing us closer to the token limit, especially with larger code changes. Let me know if you'd like to have a conversation about token management.
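
One possible stopgap, sketched in Python: keep only the tail of the logs, on the assumption that the failure summary usually sits near the end. The `trim_logs` name, the truncation marker, and the 4-characters-per-token estimate are all assumptions for illustration; a real tokenizer would give a more accurate count.

```python
def trim_logs(logs: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Keep roughly the last `max_tokens` tokens of the logs.

    Token count is approximated as characters / chars_per_token,
    which is a crude assumption rather than a real tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    budget = max_tokens * chars_per_token
    if len(logs) <= budget:
        return logs
    # The marker itself is not counted against the budget.
    marker = "[... earlier log output truncated ...]\n"
    return marker + logs[-budget:]
```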

slavingia commented 1 month ago

I think it's fine for now. We may want to allow users to put in their own API key if it's too expensive, or start charging in another way.

https://x.com/tdinh_me/status/1847828606069989649?s=46

Eventually long context will be solved generally and be cheap

m2rads commented 1 month ago

I was hoping that merging the PR would solve this issue, but it seems the issue persists.

slavingia commented 1 month ago

Probably because I haven't updated the env vars yet

m2rads commented 1 month ago

I hope this fixes it. This thing is stressing me out.

m2rads commented 1 month ago

This is great news for UI testing. Imagine giving AI a URL and asking it to test a piece of functionality under different scenarios.

https://x.com/anthropicai/status/1848742740420341988?s=46

slavingia commented 1 month ago

Yes!!!

slavingia commented 1 month ago

Worth building a new kind of test in plain English that is validated by computer use

These could run on Shortest as a GitHub Action?

m2rads commented 1 month ago

I agree! And we might not even need to rely on GitHub Workflow as much!

This is a very early version, but I was able to integrate Jenkins to run the tests for Shortest in the same way GitHub Actions does. I have to say, this is very promising.

I think this will enable us to develop the features we want, such as TDD, green builds before committing, UI testing, and many more.

I’ll work on getting this to a stable version and record a proper demo. For now, here’s how it looks - let me know what you think:

https://github.com/user-attachments/assets/ae23f9ce-d37f-4dd2-b03f-ef94eb4abb94

slavingia commented 1 month ago

Awesome!!

slavingia commented 1 month ago

I think doing Jenkins with the new text format, then using that to get to green builds, and then using that to do TDD seems like the right path

Much easier for AI to write code if reading natural language tests, and more likely for humans to write code if they can write natural language too (imagine pasting text from a PM tool and it acts as your test!)

m2rads commented 4 weeks ago

Exactly! As you mentioned before, with Jenkins we can have a self-healing process where we hit AI with the failed tests and update the files in a loop until we get a passing build. We can set a threshold so that if the tests still don't pass, we let the user know. We might even provide the user with a prompt to use with Cursor to fix the issue. Just a thought!!
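
A minimal sketch of that loop in Python, assuming two hypothetical callbacks: `run_tests` returns a pass flag plus the logs, and `apply_fix` asks the AI for changes and writes them to disk. Neither exists in Shortest under these names, and `max_fixes` stands in for the threshold.

```python
from typing import Callable


def self_heal(
    run_tests: Callable[[], tuple[bool, str]],
    apply_fix: Callable[[str], None],
    max_fixes: int = 3,
) -> tuple[bool, int]:
    """Retry failing tests until they pass or the threshold is reached.

    Returns (passed, fixes_applied). Both callbacks are hypothetical
    placeholders for the CI run and the AI-driven file update.
    """
    passed, logs = run_tests()
    fixes = 0
    while not passed and fixes < max_fixes:
        apply_fix(logs)  # feed the failure logs back to the AI
        fixes += 1
        passed, logs = run_tests()
    return passed, fixes
```

If the first element of the result is `False`, the threshold was hit and the user should be notified, for example with a ready-made prompt to paste into Cursor.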

slavingia commented 3 weeks ago

Closing so we can first focus on getting a new kind of natural language AI test to work, and then we can scale it to solve test-driven development, as that's more complicated.

anti-work / shortest

Build feature/update codebase to get tests to pass (TDD!) #37