Open slavingia opened 1 month ago
Since this does end-to-end testing on URLs, it should wait until the Vercel preview branch is deployed, then use that URL + computer use to run specs.
Hi @slavingia! I am working on a test editor for writing UI tests to use with computer use. Let me know what you think of this design.
I took inspiration from Data Driven Testing approach introduced in this article: https://spockframework.org/spock/docs/1.0/data_driven_testing.html
A couple of benefits of this design: 1) It allows the user to provide the data the AI needs to interact with their web app. 2) The AI can tell the user what data it requires, and the user can provide it. (This is technical debt that can hopefully be fixed once we implement integrated testing with the DB.)
Preferably, the AI should populate some testing scenarios and the user should just provide test data. I will implement this in the next iteration.
This is my thought process:
I think with this approach we can develop a complete solution for Vercel. For example, imagine a PR opens:
Shortest runs the unit tests, once passed -> deploy to Vercel preview -> Shortest UI tests, once passed -> promote to prod
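The gating in that flow can be sketched as a list of stages where each one runs only if the previous one passed. This is a minimal illustration: the `Stage` type, `runPipeline`, and the placeholder stages are assumptions and do not call any real Shortest or Vercel API.

```typescript
// Hypothetical sketch: each stage reports success or failure, and later
// stages are skipped as soon as one fails. Real stages would be async.
type Stage = { name: string; run: () => boolean };

function runPipeline(stages: Stage[]): { passed: string[]; failedAt: string | null } {
  const passed: string[] = [];
  for (const stage of stages) {
    if (!stage.run()) {
      return { passed, failedAt: stage.name };
    }
    passed.push(stage.name);
  }
  return { passed, failedAt: null };
}

// Placeholder stages mirroring the flow described in the comment.
const result = runPipeline([
  { name: "unit tests", run: () => true },
  { name: "deploy to Vercel preview", run: () => true },
  { name: "Shortest UI tests", run: () => false }, // simulate a failing UI test
  { name: "promote to prod", run: () => true },
]);

console.log(result);
```

Because the simulated UI test fails, the promotion stage never runs, which is the behavior the flow depends on.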
Once we perfect this for Vercel, we can start building an ecosystem that supports different infrastructures and frameworks such as Replit, Heroku, and even AWS.
Let me know what you think of this approach!
I think the testing framework should be a node package and the files should be committed to the repo, so it'll be a competitor to vitest. Then the shortest web should be able to 1) help write new ones and display them nicely like you have 2) consume the ones already in the repo.
Hi @slavingia, I was playing around with computer use and I was able to implement this feature for this issue. I still need to clean up the UI design and implement some security features, but I was wondering what you think of this implementation. Please let me know.
https://github.com/user-attachments/assets/605eb3ae-d13b-4e0f-a16b-3357b8b0b4e2
With this one we set up a virtual environment that runs the computer tool to interact with the browser, and we stream the video to Shortest over VNC. For more info please refer to this repo: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo
Looks awesome! Still think the files should be within repo and pulled into the UI and shown as running either here and/or via GitHub actions
The only thing is that now Computer Use handles all the execution for UI tests. It eliminates the need for GitHub Actions.
We can just give it a scenario and the AI executes it. The nice thing is that UI test scenarios can be generated by AI as well. This is the only feature where AI handles both the test case generation and the execution. It almost feels like plug and play.
GitHub actions is still useful as it's what we use to deploy, so we'd want Shortest tests to go green before a deploy triggers.
Yes, of course, unit tests should run on GitHub Actions.
I think I am slowly understanding you. Would this be close to what you are trying to achieve? Each module could have a .shortest.ts file. We can have a shortest.config file at the root dir that our app can use to execute the tests. Let me know if this is close.
Yep exactly! I think each folder and/or route should have one that takes care of all e2e testing, and maybe we can remove any unit tests, as ideally e2e should cover all of that.
Awesome! The image above is taken from a testing framework called Spock. What interests me is its human-readable format and its support for data-driven tests, which fit what we are trying to achieve. If you agree to use this format, I can create an npm package that supports this syntax in TS/JS, roughly like this:
The good thing about it is that users can import a JSON file and dynamically run multiple iterations of these test cases with different data. What do you think about this approach?
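As a rough illustration of the data-driven idea, one scenario can be run once per data row. Everything here is hypothetical: `LoginRow`, `runForEach`, and `fakeLogin` are made-up names for the sketch, and the rows are inlined where a real project might load them from a JSON file.

```typescript
// Hypothetical data-driven helper: runs one scenario once per data row.
interface LoginRow {
  email: string;
  password: string;
  shouldSucceed: boolean;
}

// Inlined here; these rows could equally be imported from a JSON file.
const rows: LoginRow[] = [
  { email: "user@example.com", password: "correct-horse", shouldSucceed: true },
  { email: "user@example.com", password: "", shouldSucceed: false },
];

function runForEach<T>(
  name: string,
  data: T[],
  scenario: (row: T) => boolean
): { name: string; passed: boolean }[] {
  return data.map((row, i) => ({
    name: `${name} [row ${i + 1}]`,
    passed: scenario(row),
  }));
}

// Stand-in for the system under test: accepts any non-empty password.
const fakeLogin = (row: LoginRow): boolean => row.password.length > 0;

const results = runForEach("login", rows, (row) => fakeLogin(row) === row.shouldSucceed);
console.log(results);
```

Adding a new case then means adding a row of data rather than writing another test.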
Maybe we could integrate computer use inside the package so that the user could run the tests locally by executing a shortest test command (super early for this but eventually)?
We can also have code base profiling so that at runtime, shortest tells us how much code coverage we have. I already have a proof of concept for this for Spring Boot applications. I think I can do it for JS/TS frameworks too.
Code coverage is tough since what counts? IMO AI will allow much better coverage that wasn't possible before with automated tests. For example "all buttons are rounded" could be a test
IMO two parts of e2e testing: checking database changes, and visual changes. So we can keep our package quite simple and agree would be nice to run locally.
One more part that a full testing package would need to support: emails, this would also help test webhooks/background jobs
(Ideally all emails in testing go to "inbox@shortest.com" and we can validate their input to check that the right background jobs happened, as well as email text content and attachments)
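A minimal sketch of what checking a captured test inbox could look like, assuming emails are collected into an in-memory list. The `CapturedEmail` and `EmailExpectation` shapes and the `emailReceived` function are illustrative assumptions, not an existing Shortest API.

```typescript
// Hypothetical sketch: look for a captured test email matching an expectation.
interface CapturedEmail {
  to: string;
  subject: string;
  body: string;
}

interface EmailExpectation {
  to: string;
  subject?: string;  // exact subject match when provided
  contains?: string; // substring expected in the body when provided
}

function emailReceived(inbox: CapturedEmail[], expected: EmailExpectation): boolean {
  return inbox.some(
    (mail) =>
      mail.to === expected.to &&
      (expected.subject === undefined || mail.subject === expected.subject) &&
      (expected.contains === undefined || mail.body.indexOf(expected.contains) !== -1)
  );
}

// Sample inbox standing in for whatever the test mail catcher collected.
const inbox: CapturedEmail[] = [
  { to: "inbox@shortest.com", subject: "New Invoice", body: "Amount due: $100.00" },
];

const invoiceSent = emailReceived(inbox, {
  to: "inbox@shortest.com",
  subject: "New Invoice",
  contains: "$100.00",
});
console.log(invoiceSent);
```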
Basic example by Claude of what a shortest test file could look like...
// tests/github/pull-requests.test.ts
import { RouteTestBuilder } from '@shortest/core';
import { seedDatabase, cleanDatabase } from '@shortest/helpers';
import { PrismaClient } from '@prisma/client';
import bcrypt from 'bcrypt'; // needed for the password hash in beforeAll

// Type definitions for our routes
interface LoginParams {
  email: string;
  password: string;
}

interface GetPullRequestsParams {
  state?: 'open' | 'closed' | 'all';
  assignee?: string;
}

// Shared test data
const testUser = {
  email: 'developer@company.com',
  password: 'secure123',
  githubUsername: 'devuser'
};

const testPRs = [
  {
    id: 1,
    title: 'Feature: Add new API endpoint',
    state: 'open',
    assignee: 'devuser',
    repository: 'main-app'
  },
  {
    id: 2,
    title: 'Fix: Memory leak in worker',
    state: 'open',
    assignee: 'devuser',
    repository: 'workers'
  }
];

// Shared state, kept at module scope so the route tests and runTests() below can reach it
let authToken: string;
const prisma = new PrismaClient();

// Test Suite
describe('GitHub Pull Request Flow', () => {
  // Setup and teardown
  beforeAll(async () => {
    await cleanDatabase();
    await seedDatabase({
      users: [
        {
          ...testUser,
          hashedPassword: await bcrypt.hash(testUser.password, 10)
        }
      ],
      pullRequests: testPRs
    });
  });

  afterAll(async () => {
    await cleanDatabase();
    await prisma.$disconnect();
  });
});

// Login Test
const loginTest = new RouteTestBuilder<LoginParams>('/auth/login')
  .test('Developer can login with correct credentials')
  .when({
    email: testUser.email,
    password: testUser.password
  })
  .expect({
    status: 200,
    body: {
      success: true,
      user: {
        email: testUser.email,
        githubUsername: testUser.githubUsername
      }
    },
    dbState: [{
      collection: 'users',
      query: { email: testUser.email },
      data: {
        lastLoginAt: expect.any(Date),
        loginCount: expect.toBeGreaterThan(0)
      }
    }]
  })
  .after(async (response) => {
    // Store auth token for subsequent requests
    authToken = response.body.token;
  });

// Pull Requests Test
const pullRequestsTest = new RouteTestBuilder<GetPullRequestsParams>('/api/pull-requests')
  .test('Developer can view their assigned open pull requests')
  .before(async () => {
    // Ensure we're authenticated
    expect(authToken).toBeDefined();
  })
  .when({
    state: 'open',
    assignee: testUser.githubUsername
  })
  .expect({
    status: 200,
    body: {
      pullRequests: expect.arrayContaining([
        {
          id: testPRs[0].id,
          title: testPRs[0].title,
          state: 'open',
          repository: testPRs[0].repository
        },
        {
          id: testPRs[1].id,
          title: testPRs[1].title,
          state: 'open',
          repository: testPRs[1].repository
        }
      ]),
      totalCount: 2
    },
    // Verify analytics event was logged
    dbState: [{
      collection: 'analytics_events',
      query: {
        type: 'PR_LIST_VIEWED',
        userId: expect.any(String)
      },
      shouldExist: true
    }]
  });

// Additional test cases
const edgeCaseTests = new RouteTestBuilder<GetPullRequestsParams>('/api/pull-requests')
  .test('Returns empty array when developer has no assigned PRs')
  .before(async () => {
    // Remove all PRs for this test
    await prisma.pullRequest.deleteMany({
      where: { assignee: testUser.githubUsername }
    });
  })
  .when({
    assignee: testUser.githubUsername
  })
  .expect({
    status: 200,
    body: {
      pullRequests: [],
      totalCount: 0
    }
  })
  .test('Handles invalid authentication gracefully')
  .before(async () => {
    // Invalidate auth token
    authToken = 'invalid-token';
  })
  .when({
    state: 'open'
  })
  .expect({
    status: 401,
    body: {
      error: 'Authentication required'
    }
  })
  .test('Rate limiting prevents too many requests')
  .before(async () => {
    // Simulate hitting rate limit
    for (let i = 0; i < 100; i++) {
      await fetch('/api/pull-requests');
    }
  })
  .when({
    state: 'open'
  })
  .expect({
    status: 429,
    body: {
      error: 'Too many requests',
      retryAfter: expect.any(Number)
    }
  });

// Custom test helpers
async function expectWebhookCall(repository: string) {
  return {
    url: `https://api.github.com/repos/${repository}/pulls`,
    headers: {
      'Authorization': expect.stringContaining('token'),
      'User-Agent': 'YourApp/1.0'
    }
  };
}

// Run the full suite
export async function runTests() {
  try {
    await loginTest.run();
    await pullRequestsTest.run();
    await edgeCaseTests.run();
    console.log('All tests passed! 🎉');
  } catch (error) {
    console.error('Test suite failed:', error);
    process.exit(1);
  }
}
Sounds good. This is a good starting point for me.
Hi @slavingia. This is my first phase of implementation for the shortest lib. Inside the packages folder I have implemented the API and syntax support. Before I go any further, please let me know if I am heading in the right direction or if you'd like to add/remove something.
Note: Right now there are no test runners - to be implemented next
https://github.com/m2rads/shortest-lib.git
I implemented UITestBuilder instead of RouteTestBuilder because I think we can dedicate that module for API testing (support microservices). I think this is more intuitive for e2e testing. What do you think?
In the examples/basic-react-test folder you can see shortest.config.ts. This is where the user can set up the baseUrl, browser support, and their Anthropic API key. You can navigate to the test folder to see example test cases.
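For readers without access to the repo, a config along those lines might look roughly like the sketch below. The field names (`baseUrl`, `browsers`, `headless`, `anthropicKey`) and the `validateConfig` helper are guesses for illustration; the actual shortest.config.ts may be shaped differently.

```typescript
// Hypothetical shape for shortest.config.ts; the field names are assumptions.
interface ShortestConfig {
  baseUrl: string;
  browsers: Array<"chrome" | "firefox" | "safari">;
  headless: boolean;
  anthropicKey: string;
}

// Returns a list of problems so misconfiguration is reported before any test runs.
function validateConfig(config: ShortestConfig): string[] {
  const errors: string[] = [];
  if (!/^https?:\/\//.test(config.baseUrl)) {
    errors.push("baseUrl must start with http:// or https://");
  }
  if (config.browsers.length === 0) {
    errors.push("at least one browser is required");
  }
  if (config.anthropicKey.trim() === "") {
    errors.push("anthropicKey is missing");
  }
  return errors;
}

const config: ShortestConfig = {
  baseUrl: "http://localhost:3000",
  browsers: ["chrome"],
  headless: true,
  // Placeholder only: a real key would come from an environment variable,
  // never from a file committed to the repo.
  anthropicKey: "",
};

const errors = validateConfig(config);
console.log(errors);
```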
If you are happy with this implementation I would like to propose implementing these features next:
This design would allow developers to run tests in any CI/CD pipeline including Github actions.
Please let me know if this aligns with your vision.
Also would you like to create a new repo and include these changes there or do you prefer to keep everything in a mono repo?
Monorepo is good. I'll try running a shortest today!
Just fyi, shortest does not have any test runners yet. I just wanted to make sure I got the syntax right. An AI test runner is under development :)
I can't guess at the syntax being right without actually using it. See "now or never" value of antiwork, scope/design/build at the same time
Good point. I am finishing up development. I will roll it out soon.
Hi @slavingia . Just a quick update regarding the progress of shortest package. I have implemented a parser for test suites to extract AI instructions. I have implemented browser-use (custom tool to only enable AI to access the browser).
The last step is to bridge the parser and the browser tool via AI. I think I will be able to finish this by tomorrow and open up a PR for you to test. Please let me know if you have any questions.
Here's a quick demo of browser use:
https://github.com/user-attachments/assets/93cb68af-33ed-42d3-968a-ee95ae5d937f
I am also adding an option for executing the tests in headless mode so that we can run the tests in a GitHub workflow or Jenkins, as we discussed before.
Nice! Yeah I assume headless locally by default and then one could choose to run a test with visual feedback if any fail from that.
Hard to read the tests you wrote that the AI is parsing but lgtm. Excited!
Reading into computer use, it has access to bash (so db via rails c or what not) and browser (so email via service); so should be able to be truly end-to-end testing. hardest part will be integrations with 3rd party services such as Stripe or Wise, where the browser would need to be able to continue full oauth login as part of an e2e test.
https://sdk.vercel.ai/docs/guides/computer-use
some internal antiwork chatter if it's helpful to get some of the overall direction:
Yes! And I think there is a solution for that. We should allow developers to hard code some logic such as oauth for 3rd party services. And then we'll give control over to AI.
So I have a feature in my mind to extend shortest api to allow developers to write browser actions like selenium. Shortest will compile these and run them on the browser.
This will bypass all 3rd party OAuth or login issues. What do you think?
We could also delegate this login task to AI but then on the subsequent runs, we will cache the AI response instruction and use it for the next execution.
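The caching idea can be sketched as a small wrapper that asks the AI for steps once per task and replays them afterwards. The `Step` shape, `InstructionCache`, and `fakePlanner` are illustrative assumptions; a real planner would be an AI call and the cache would need to be persisted between runs.

```typescript
// Hypothetical sketch: plan once with the AI, replay the stored steps later.
type Step = { action: "navigate" | "click" | "type"; target: string; text?: string };

class InstructionCache {
  private store: Record<string, Step[]> = {};
  private planner: (task: string) => Step[];
  plannerCalls = 0;

  constructor(planner: (task: string) => Step[]) {
    this.planner = planner;
  }

  stepsFor(task: string): Step[] {
    if (Object.prototype.hasOwnProperty.call(this.store, task)) {
      return this.store[task]; // cache hit: no AI call needed
    }
    this.plannerCalls += 1;
    const steps = this.planner(task);
    this.store[task] = steps;
    return steps;
  }
}

// Stand-in for an AI call that turns a task into browser steps.
const fakePlanner = (task: string): Step[] => [
  { action: "navigate", target: "/login" },
  { action: "click", target: `button for: ${task}` },
];

const cache = new InstructionCache(fakePlanner);
cache.stepsFor("log in with GitHub");
cache.stepsFor("log in with GitHub");
console.log(cache.plannerCalls); // 1, the second call was served from the cache
```

The trade-off is that replayed steps no longer exercise the AI's judgment, so a cache like this would only suit setup steps such as login.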
This is where lifecycle hooks such as beforeAll and afterAll become very useful for us.
Or even something better!! We can allow developers to define their own tools and then shortest will use them with AI to execute the tests.
For example, right now internally shortest has these tools: move mouse, click, tab/window management, type, keys, screenshots.
We can create an api that allows developers to create tools that are very customized to their own setup.
And obviously we can foresee some of these developer needs; for example, everybody uses Stripe, so we can integrate these common tools internally in the lib.
But we can also allow them to define some hacky thing they have done in the app to fit their own use case.
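One way to picture a developer-defined tool API is a registry that exposes tool names and descriptions to the AI and dispatches calls by name. This is only a sketch: `Tool`, `ToolRegistry`, and the `stripe_test_checkout` tool are invented for illustration and are not part of shortest.

```typescript
// Hypothetical developer-defined tool API; none of these names come from shortest.
interface Tool {
  name: string;
  description: string; // given to the AI so it can decide when to call the tool
  run: (input: Record<string, string>) => string;
}

class ToolRegistry {
  private tools: Record<string, Tool> = {};

  register(tool: Tool): void {
    if (Object.prototype.hasOwnProperty.call(this.tools, tool.name)) {
      throw new Error(`Tool "${tool.name}" is already registered`);
    }
    this.tools[tool.name] = tool;
  }

  // What the AI would see: names and descriptions only, never the implementation.
  list(): { name: string; description: string }[] {
    return Object.keys(this.tools).map((name) => ({
      name,
      description: this.tools[name].description,
    }));
  }

  call(name: string, input: Record<string, string>): string {
    if (!Object.prototype.hasOwnProperty.call(this.tools, name)) {
      throw new Error(`Unknown tool "${name}"`);
    }
    return this.tools[name].run(input);
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "stripe_test_checkout",
  description: "Completes a checkout using a test card",
  run: (input) => `checkout completed for ${input["plan"]}`,
});

const output = registry.call("stripe_test_checkout", { plan: "pro" });
console.log(output);
```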
I think we can start with GitHub and get it working within shortest itself, then add resend and stripe and wise for Flexile, etc
Added an example of how simple this should be for devs; possibly even simpler. The AI is like a QA person testing it. Most things should not be cached though, since that defeats the purpose of e2e testing. If every test is truly end to end, you can run 100 in parallel and be done very quickly.
Even email support feels like overkill to me; you've tested the core functionality of the app and emails can be presumed to work.
QA can look at what's taken place dynamically after each step of computer use and end the test, then make a holistic determination of whether it passed or failed.
Sounds good. FYI I am resolving some errors. I should be done today.
Still working on this. One issue I am dealing with is that I quickly hit Anthropic's per-minute rate limit after 3-4 execution steps. This happens because of sending screenshots. I am going to try to find a more efficient feedback loop and see what works.
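One common way to shrink the payload in a screenshot-driven loop is to keep only the most recent screenshots in the conversation history and replace older ones with a short text placeholder. The sketch below shows that idea only; it is not necessarily the approach taken in this thread, and the `Message` and `Block` shapes are simplified assumptions rather than real SDK types.

```typescript
// Hypothetical, simplified message shapes (not the real Anthropic SDK types).
type Block = { type: "text"; text: string } | { type: "image"; data: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Returns a copy of the history in which only the newest `keep` images survive.
function keepRecentImages(history: Message[], keep: number): Message[] {
  let remaining = keep;
  // Walk backwards so the newest screenshots are the ones preserved.
  const newestFirst = history.slice().reverse().map((msg) => {
    const content = msg.content
      .slice()
      .reverse()
      .map((block): Block => {
        if (block.type !== "image") return block;
        if (remaining > 0) {
          remaining -= 1;
          return block;
        }
        return { type: "text", text: "[screenshot omitted]" };
      })
      .reverse();
    return { role: msg.role, content };
  });
  return newestFirst.reverse();
}

const countImages = (messages: Message[]): number =>
  messages.reduce(
    (total, msg) => total + msg.content.filter((b) => b.type === "image").length,
    0
  );

const history: Message[] = [
  { role: "user", content: [{ type: "text", text: "start" }, { type: "image", data: "s1" }] },
  { role: "assistant", content: [{ type: "text", text: "clicking" }] },
  { role: "user", content: [{ type: "image", data: "s2" }] },
  { role: "user", content: [{ type: "image", data: "s3" }] },
];

const trimmed = keepRecentImages(history, 1);
console.log(countImages(history), countImages(trimmed));
```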
I finally got shortest working end to end. I fixed the rate limit issue with proper prompt handling. I just need to improve the prompt structure and clean up some code and it should be ready to use.
https://github.com/user-attachments/assets/b9b6e1ae-9fba-41bf-b43a-53a67f706839
Works the same in headless mode
PR is open. Please test it out and let me know what you think.
It's very simple right now, but the next step is to implement the compiler so that devs can mix natural language with some logic to execute more complex tests.
Also I will focus on writing an assertion system.
Nice, looking great! Will check out the PR locally in the next few days and try it out. Maybe next week latest!
Reviewed w team and looks great!
Thank you! I'm glad to hear that. It can be a little bit slow right now due to the back and forth with AI but I have an idea of how to improve the speed by a lot.
I am right now working on mixing this with the tools API to see if we can get past GitHub OAuth.
Hi @slavingia. I found a way to go through GitHub login securely without passing the password and auth key to AI.
Before I open up a PR I wanted to discuss caching with you. I think there are test cases that definitely need to test the Login flow with all the auth providers. I also think that there are cases that do not require that and it makes sense for those test cases to skip login flow to speed up the tests. So for that reason I wanted to propose support for injecting auth sessions into the browser to bypass login. This is what I believe is the case:
Cookies -> Quick Tests (80% of cases)
Login Flow -> Critical Path Tests (20% of cases)
What do you think?
https://github.com/user-attachments/assets/d0441f0b-4116-43e9-a74b-8ef5014afc8f
What would be cool is if it were automated so the developer doesn't have to think about the difference; logins via a specific GitHub user would automatically be cached (maybe it's a flag to turn on this feature, but seems for now we can assume people will want it on) and then that auth session is injected.
Small thing too, would be good to redirect to the dashboard upon successful login to save users a click (and make the test a bit shorter/faster to run).
Quick Update:
Added a GitHub login tool for testing the login flow with GitHub. The browser stores the user session by default. Added a clear session tool that the AI will use to clear user sessions if it's specifically asked to test a feature where validating the login flow is required.
This way devs do not need to manually control this behaviour and AI will automatically execute login validation if necessary for that feature.
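The choice between reusing a stored session and exercising the full login flow can be sketched as a small planning function. In the thread the AI makes this decision from the test description; here a boolean flag stands in for that judgment, and `AuthStep`, `TestInfo`, and `planAuth` are hypothetical names.

```typescript
// Hypothetical sketch of the session-vs-login decision described above.
type AuthStep = "clear-session" | "run-login-flow" | "save-session" | "inject-session";

interface TestInfo {
  name: string;
  validatesLogin: boolean; // true for critical-path tests that must exercise the login UI
}

function planAuth(test: TestInfo, hasCachedSession: boolean): AuthStep[] {
  if (test.validatesLogin) {
    // Start from a clean browser so the login flow is really tested.
    return ["clear-session", "run-login-flow", "save-session"];
  }
  if (hasCachedSession) {
    // Quick path: skip the login UI entirely.
    return ["inject-session"];
  }
  // First run for this user: log in once and keep the session for later tests.
  return ["run-login-flow", "save-session"];
}

console.log(planAuth({ name: "user can log in with GitHub", validatesLogin: true }, true));
console.log(planAuth({ name: "user can create an invoice", validatesLogin: false }, true));
```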
I think the next step would be DB validation, right? How would you like to tackle this? IMO we can just provide an assertion system so that devs can validate the DB with something like this:
assert db.query(user) == expected_user
I think this would be the simplest and most compatible approach. What do you think?
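In TypeScript, an assertion of that kind might compare only the columns the test cares about, since rows usually carry volatile fields such as ids and timestamps. The sketch below uses an in-memory array in place of a database; `Row`, `assertRowMatches`, and `queryUserByEmail` are hypothetical names, not a shortest API.

```typescript
// Hypothetical sketch of a DB assertion that checks a subset of columns.
type Row = Record<string, string | number | boolean | null>;

function assertRowMatches(actual: Row | undefined, expected: Row): void {
  if (actual === undefined) {
    throw new Error("expected a row but the query returned none");
  }
  for (const key of Object.keys(expected)) {
    if (actual[key] !== expected[key]) {
      throw new Error(
        `column "${key}": expected ${String(expected[key])}, got ${String(actual[key])}`
      );
    }
  }
}

// In-memory stand-in for a users table.
const users: Row[] = [{ id: 1, email: "dev@example.com", plan: "free" }];

function queryUserByEmail(email: string): Row | undefined {
  return users.filter((row) => row["email"] === email)[0];
}

// Passes: only email and plan are compared, the id is ignored.
assertRowMatches(queryUserByEmail("dev@example.com"), {
  email: "dev@example.com",
  plan: "free",
});
console.log("row matches");
```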
Hi @slavingia have you been able to run a shortest on your machine?
Not yet, will try when back in office on Friday
Hi @slavingia. Maybe worth discussing PR #85 before merging it. I think this is the last slice to complete the core functionality of shortest and I want to make sure I got it right. Looking at your original design db validation is done by AI:
test("A user can sign up to flexile and send an invoice", async (page) => {
  await shortest.db.changed(`SELECT * FROM users WHERE created_at > NOW() - interval '5 minutes' LIMIT 1`);
});
I am on board with this. I just wanted to provide support for assertions in case a user wants to opt out of this feature and do their own validation. Ultimately, assertions can also be used in unit tests.
Overall I think it would be nice to have an all in one package that supports the whole software testing pyramid.
Please let me know what you think.
Ah I didn't mean for that to be AI, just looks like it. Your way of doing the assertion works, it's just 2 lines instead of 1 which I think is probably necessary due to the await.
A new testing format that allows testing web apps end-to-end really quickly and efficiently.
Goals:
Specifics:
Example test:
// app/shortest.ts
import { page } from 'page.tsx';

test("A user can sign up to flexile and send an invoice", async (page) => {
  await shortest.db.changed(
    `SELECT * FROM users WHERE created_at > NOW() - interval '5 minutes' LIMIT 1`
  );
  await shortest.email_received({ to: 'client@example.com', subject: 'New Invoice', contains: '$100.00' });
});
Ideas: