heythisischris commented 6 months ago

Overview of tasks

At the beginning of April, I was assigned a bounty to refactor & migrate the Equalify API from PHP/MySQL over to Node.js/PostgreSQL.

Although most of the heavy lifting has been completed, a few crucial endpoints still need to be built, including some of the most important and foundational endpoints for Equalify. I talked with @bbertucc for an hour on May 13th, 2024 to better understand the remaining work. Here's a Zoom recording of the talk (password: a6HZ^mSu): https://us06web.zoom.us/rec/share/7GJqcZDsj3w4_o_-_WRIheKpgF_4F_jDmw5ctLMeH7YEBMDIVPmXJoSyjKiZd0s.0M28uVYFATi_khH-?startTime=1715625788000. After digesting the conversation and sleeping on it a bit, I wrote out this set of requirements to complete a minimum-viable API for our v1 launch.

Add/get scans

We'd like to be able to kick off a scan for a given website or sitemap. This requires initiating a scan with equalify-scan, synchronously returning a queued_scan ID, and asynchronously waiting for a response from equalify-scan.

Once the scan completes, we can delete the queued_scan and create a result!

We'd also like to be able to return all queued scans (simple enough).
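A minimal in-memory sketch of this lifecycle, assuming a Node.js/TypeScript service. `ScanQueue`, `QueuedScan`, `ScanResult`, and the injected `runScan` function are hypothetical names (the real request to equalify-scan is stubbed out), and string counters stand in for database-generated IDs:

```typescript
// Hypothetical sketch of the queued_scan lifecycle; not the actual Equalify API.
type QueuedScan = { id: string; url: string; createdAt: Date };
type ScanResult = { id: string; queuedScanId: string; url: string; output: unknown };
// Stand-in for the request to equalify-scan (assumed to resolve with its output).
type RunScan = (url: string) => Promise<unknown>;

class ScanQueue {
  private readonly queued = new Map<string, QueuedScan>();
  readonly results: ScanResult[] = [];
  private nextId = 1;
  private readonly runScan: RunScan;

  constructor(runScan: RunScan) {
    this.runScan = runScan;
  }

  // /add/scans: hand back the queued_scan ID right away; the scan finishes later.
  addScan(url: string): { id: string; done: Promise<void> } {
    const id = `queued-${this.nextId++}`;
    this.queued.set(id, { id, url, createdAt: new Date() });
    const done = this.runScan(url).then((output) => {
      // equalify-scan responded: drop the queued_scan and record a result.
      this.queued.delete(id);
      this.results.push({ id: `result-${id}`, queuedScanId: id, url, output });
    });
    // On failure `done` rejects and the queued_scan is left in place.
    return { id, done };
  }

  // /get/scans: every scan still waiting on equalify-scan.
  getScans(): QueuedScan[] {
    return Array.from(this.queued.values());
  }
}
```

Returning `done` separately keeps the add response synchronous while still letting a worker or test await completion; callers should attach a rejection handler to it so a failed scan is not silently dropped.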

Add/get results

We'd like to be able to add results by digesting the output from a fulfilled queued_scan in our database. We'll have to digest the Equalify Schema output, replace integer IDs w/ UUIDs, and map existing resources to their corresponding ID. This will take a small amount of brain damage.
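A sketch of the ID-remapping step under heavily simplified assumptions: the `RawOutput` shape below (URLs and nodes keyed by integer IDs) is a hypothetical stand-in for the Equalify Schema, only URLs are matched against existing resources, and `existingUrlIds` represents a URL-to-UUID lookup already loaded from the database:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical, simplified shapes; the real Equalify Schema has more entities.
type RawUrl = { urlId: number; url: string };
type RawNode = { nodeId: number; html: string; urlId: number };
type RawOutput = { urls: RawUrl[]; nodes: RawNode[] };

type MappedUrl = { urlId: string; url: string };
type MappedNode = { nodeId: string; html: string; urlId: string };

// Replace integer IDs with UUIDs, reusing the stored UUID for URLs we already know.
function remapIds(
  raw: RawOutput,
  existingUrlIds: Map<string, string>,
  newId: () => string = randomUUID,
): { urls: MappedUrl[]; nodes: MappedNode[] } {
  const urlIdMap = new Map<number, string>(); // scan-local integer ID -> UUID

  const urls = raw.urls.map((u) => {
    // Only mint a new UUID when this URL has never been stored before.
    const uuid = existingUrlIds.get(u.url) ?? newId();
    urlIdMap.set(u.urlId, uuid);
    return { urlId: uuid, url: u.url };
  });

  const nodes = raw.nodes.map((n) => {
    // Rewrite the node's foreign key using the URL mapping built above.
    const urlId = urlIdMap.get(n.urlId);
    if (urlId === undefined) {
      throw new Error(`node ${n.nodeId} references unknown urlId ${n.urlId}`);
    }
    return { nodeId: newId(), html: n.html, urlId };
  });

  return { urls, nodes };
}
```

Injecting `newId` keeps the function deterministic in tests; in production it defaults to Node's `crypto.randomUUID`.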

We'd also like to be able to return results in the Equalify Schema! This might be difficult for extremely large datasets (e.g. WordPress's 0.5 GB dataset) without some form of pagination or compression. It's crucial that we follow the schema, however, because doing so lets us integrate the API across many services and improve developer experience ten-fold.
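The pagination approach is still open here. One option is keyset (cursor) pagination, sketched below in memory with a hypothetical `paginate` helper; against PostgreSQL the same idea becomes a `WHERE id > $cursor ORDER BY id LIMIT $limit` query, which avoids scanning past skipped rows the way a large `OFFSET` does:

```typescript
type Page<T> = { items: T[]; nextCursor: string | null };

// Hypothetical helper: return up to `limit` rows whose id sorts after `cursor`.
function paginate<T extends { id: string }>(rows: T[], limit: number, cursor?: string): Page<T> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  // Stable ordering by a unique key is what makes the cursor meaningful.
  const sorted = [...rows].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));

  let remaining = sorted;
  if (cursor !== undefined) {
    const after: string = cursor;
    remaining = sorted.filter((r) => r.id > after);
  }

  const items = remaining.slice(0, limit);
  // Only hand out a cursor when at least one more row exists past this page.
  const nextCursor = remaining.length > limit ? items[items.length - 1].id : null;
  return { items, nextCursor };
}
```

A client keeps calling with the returned `nextCursor` until it comes back `null`, so no single response has to carry the whole dataset.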

Furthermore, we want to allow developers to pass "filters" into the /get/results API request that narrow down results based on propertyIds, urlIds, nodeIds, nodeUpdateIds, messageIds, and tagIds. We will leverage Postgraphile to handle the filtering SQL logic for us.
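Postgraphile would generate the actual SQL, so the following only pins down one plausible set of filter semantics: AND across filter types, OR within a single list, and an omitted or empty list meaning no constraint. The `ResultRow` and `ResultFilters` shapes are hypothetical:

```typescript
// Hypothetical result row: only the ID fields used for filtering are shown.
type ResultRow = {
  propertyId: string;
  urlId: string;
  nodeId: string;
  nodeUpdateId: string;
  messageId: string;
  tagIds: string[];
};

type ResultFilters = {
  propertyIds?: string[];
  urlIds?: string[];
  nodeIds?: string[];
  nodeUpdateIds?: string[];
  messageIds?: string[];
  tagIds?: string[];
};

// True when any of `values` is in the filter list; an omitted or empty list allows everything.
function matches(ids: string[] | undefined, values: string[]): boolean {
  const wanted = ids ?? [];
  return wanted.length === 0 || values.some((v) => wanted.includes(v));
}

// AND across filter types, OR within a single list.
function applyFilters(rows: ResultRow[], f: ResultFilters): ResultRow[] {
  return rows.filter(
    (r) =>
      matches(f.propertyIds, [r.propertyId]) &&
      matches(f.urlIds, [r.urlId]) &&
      matches(f.nodeIds, [r.nodeId]) &&
      matches(f.nodeUpdateIds, [r.nodeUpdateId]) &&
      matches(f.messageIds, [r.messageId]) &&
      matches(f.tagIds, r.tagIds), // a row matches if any of its tags is requested
  );
}
```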

Tracking scans over time

We'd like users to be able to re-scan a website and see which issues have been "equalified" or "unequalified" over time. We'll make sure to only reference results that have their source set to scan, because other sources (e.g. the Chrome extension or a manual audit) could cause conflicts.

If an issue that was previously "unequalified" no longer appears in the scan results, we will equalify it! However, if an issue that was previously "equalified" suddenly appears in the scan results again, we will unequalify it.

This will be a manual process at first; eventually, we might automate this to run daily via a user-defined cron job.
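The equalify/unequalify rules can be sketched as a pure function over the previously tracked issues and the set of issue keys found by the new scan. `TrackedIssue`, its `key` field, and `reconcile` are hypothetical names, and recording never-before-seen keys as new unequalified issues is an assumption the rules above do not spell out:

```typescript
type Status = "equalified" | "unequalified";
// Hypothetical shape: `key` identifies the same issue across scans (e.g. message + node).
type TrackedIssue = { key: string; source: string; status: Status };

function reconcile(previous: TrackedIssue[], scannedKeys: Set<string>): TrackedIssue[] {
  const updated = previous.map((issue): TrackedIssue => {
    // Only scan-sourced results are compared; extension or manual-audit results are left alone.
    if (issue.source !== "scan") return issue;
    const present = scannedKeys.has(issue.key);
    if (issue.status === "unequalified" && !present) return { ...issue, status: "equalified" };
    if (issue.status === "equalified" && present) return { ...issue, status: "unequalified" };
    return issue; // status unchanged
  });

  // Assumption: a key never seen from a scan before becomes a new unequalified issue.
  const known = new Set(previous.filter((i) => i.source === "scan").map((i) => i.key));
  const added: TrackedIssue[] = Array.from(scannedKeys)
    .filter((key) => !known.has(key))
    .map((key): TrackedIssue => ({ key, source: "scan", status: "unequalified" }));

  return [...updated, ...added];
}
```

Keeping this as a pure function makes the manual trigger and any later scheduled run share the same logic.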

Full-featured LLM integration

The current LLM integration is a very barebones proof-of-concept. It'll be much easier to refine/expand once the rest of the API is built out. I intend to develop a full-featured "assistant" that suggests code fixes to developers for a given website and set of flagged issues, according to accessibility standards defined by Deque University. We'll likely want to support streamed responses. I'm even toying with the idea of fine-tuning our own OpenAI GPT-3.5-Turbo model... this could be enough of a distinct value-add to encourage developers to flock over to our framework rather than copy/pasting questions into ChatGPT themselves. We'll see.

Integrate API w/ frontend!

I've already begun assisting @wilsuriel03 with connecting endpoints to the new frontend. There will be quite a few new endpoints to integrate, so I'll be available to help and may contribute some code myself.

Checklist

Remaining tasks:

[x] Complete /add/results
[x] Complete /get/results
[x] Complete /add/scans
[x] Complete /get/scans
[ ] Complete /help
[x] Integrate w/ frontend

Cost & maintenance

The requested bounty is $4,500 for the second (and final) phase of the minimum-viable API build-out.

The requested maintenance cost is $750 per month. I'll be immediately available for any issues that might arise or technical support that becomes necessary. I'll also regularly engage with the open source community and help onboard all newcomers!

Timeline

I intend to have this completed and ready for v1 usage by June 1st, 2024. I'm also earmarking a few weeks into June for inevitable debugging/technical support during our v1 production rollout.

bbertucc commented 6 months ago

We'll look at this Monday when we review https://github.com/orgs/EqualifyEverything/projects/2

In the meantime, it would also be good to get @azdak's eyes on it. Very curious what he thinks. Before anything is in prod we definitely need to have two folks who understand the tech, and I'm not sure I'll be the second this time.

bbertucc commented 6 months ago

Finally had a bit of time to review. A few notes/questions:

Super exciting stuff! I did a quick code review and I'm really happy with how easy it is to understand everything.
Can the bulk of this API run without Amazon? From what I can see, the answer is "yes except for the LLM and Cognito-related features". That would be a perfectly fine answer. Authentication and the LLM are paid features in my mind. We just need to make sure other features can run locally.
$750/mo might be overkill. We don't have a large number of users or contributors. Judging by the maintenance of Scan, I would expect 2-3 bug fixes per month and maybe a weekly call to discuss future features.
I totally forgot about get/updates. We didn't discuss get/updates on our call. Returning a list of updates between set dates would be vital to creating our chart of issues over time.
Amazon costs. Do you have any rough idea of how much we'll pay in Amazon fees? A rough estimate would help me budget.
Deployment documentation. We'll need internal documentation on deploying this to Amazon. Again, I'll say that for this to become part of Version 1, @azdak or I will need to understand everything about how the API works, including deployment. This would be internal documentation because hosting the code is a paid feature in my mind.
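As a rough sketch of the date-range query behind that chart, assuming a hypothetical `Update` record with a `createdAt` timestamp: keep updates in a half-open `[start, end)` window so adjacent ranges never double-count, then bucket them by UTC day. `updatesBetween` and `countPerDay` are illustrative names only:

```typescript
// Hypothetical update record: when a node's status changed.
type Update = { nodeId: string; createdAt: Date };

// Keep updates in [start, end): half-open so adjacent ranges never overlap.
function updatesBetween(updates: Update[], start: Date, end: Date): Update[] {
  const from = start.getTime();
  const to = end.getTime();
  return updates.filter((u) => u.createdAt.getTime() >= from && u.createdAt.getTime() < to);
}

// Count updates per UTC day, the shape a simple "issues over time" chart could plot.
function countPerDay(updates: Update[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const u of updates) {
    const day = u.createdAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  return counts;
}
```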

Still curious if @azdak has anything to offer, but I'm happy to go over these questions on Monday's call or feel free to respond here. Thanks! Great work!

bbertucc commented 6 months ago

@heythisischris answers:

The API can run without Amazon. Even the LLM can run without Amazon (bring your own keys).
$500/mo for 3 months is fine, starting when it's built.
Instead of a separate endpoint for updates, we'll just add a param and update the schema.
Two DBs: $20/mo (staging + prod), Cognito (free for up to 25k users), Lambda (1M invocations free, but not over $100/mo), no EC2 apart from one instance to display the DB ($10/mo); OpenAI costs would be fluid.
Can create a private deploy.md file.
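Reading the figures above as monthly costs, with the two databases totalling $20/mo, Cognito at $0 while under 25k users, the EC2 instance for displaying the DB at $10/mo, and the Lambda figure as a $100/mo ceiling, a rough monthly AWS range excluding OpenAI works out as follows. This is an estimate under those readings, not a quote:

```latex
\begin{aligned}
C_{\min} &= \underbrace{20}_{\text{DBs}} + \underbrace{10}_{\text{EC2 (DB display)}} + \underbrace{0}_{\text{Cognito}} + \underbrace{0}_{\text{Lambda, free tier}} = \$30/\text{mo} \\
C_{\max} &= 20 + 10 + 0 + \underbrace{100}_{\text{Lambda ceiling}} = \$130/\text{mo}
\end{aligned}
```

OpenAI usage would come on top of this range.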

@bbertucc response: Good to go!

bbertucc commented 5 months ago

Processed buildout payment 2 of 2 ($2,250) on @heythisischris request with the note that we still need to get /help done and the API fully integrated with the frontend.

bbertucc commented 4 months ago

Closing this in favor of #395, since we're now in maintenance land.

EqualifyEverything / equalify

Complete minimum-viable API for v1 launch #335
