Optimization | Improve performance of api/repos

KrupaPanchal2527 commented 3 months ago

Current behavior for fetching repository details and list of issues -

Read from repos.json
Iterate over each repository and fetch its details and list of issues
Once all the calls are done, we save it to a local class instance (which acts as a cache for now)
Upon receiving a new request, we will check whether we have data in our class instance or not
If data is available in the cache and was last modified less than 8 hours ago, we return it from the cache; otherwise we re-fetch the data
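
The behavior above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `RepoCache`, `RepoWithIssues`, `isFresh`, and the injected `fetchAll` and `now` functions are hypothetical names, and only the 8-hour expiry comes from the description.

```typescript
// Hypothetical shape; the real repository payload is not shown in this thread.
type RepoWithIssues = { name: string; issues: string[] };

// 8 hours, as described in the list above.
const CACHE_TTL_MS = 8 * 60 * 60 * 1000;

// True when something was cached and it is younger than the TTL.
function isFresh(lastModified: number | null, now: number, ttlMs: number = CACHE_TTL_MS): boolean {
  return lastModified !== null && now - lastModified < ttlMs;
}

class RepoCache {
  private data: RepoWithIssues[] = [];
  private lastModified: number | null = null;
  private readonly fetchAll: () => Promise<RepoWithIssues[]>;
  private readonly now: () => number;

  // `fetchAll` stands in for "read repos.json, then fetch every repo and its issues".
  constructor(fetchAll: () => Promise<RepoWithIssues[]>, now: () => number = () => Date.now()) {
    this.fetchAll = fetchAll;
    this.now = now;
  }

  async get(): Promise<RepoWithIssues[]> {
    if (isFresh(this.lastModified, this.now())) {
      return this.data; // cache hit: no upstream calls
    }
    // Cache miss or expired: the caller waits for the full re-fetch.
    this.data = await this.fetchAll();
    this.lastModified = this.now();
    return this.data;
  }
}
```

In this sketch the request that arrives just after expiry pays for the entire re-fetch, which is the slowness this issue is about.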

Issue with above approach -

Whenever the cache has expired, it takes too long to fetch all the repo details and their corresponding issue lists.

Ideas on how to optimize -

We don't need the issue list right away, so we can skip fetching issue details when fetching the repository details for the first time. When the user opens the accordion to see issues, we can fetch them on demand. To optimize further, once an issue list is fetched we can store it in a cache with an expiry time.

KrupaPanchal2527 commented 3 months ago

@aadeshkulkarni I just started working on this, but I think it might not work out. If we fetch issues only when the accordion is clicked, we won't be able to display the issue count at the top-right of the card on page load. Either we remove the issue count (which I don't think is a good idea) or we need to come up with another way of optimizing how we fetch the data. What are your thoughts?

aadeshkulkarni commented 3 months ago

Given the context of having an average of 1,000 repositories, each with an average of 10 issues, the solution needs to balance initial load times, user experience, and complexity of implementation.

Here is an approach:

Key Requirements Recap

Ensure repository metrics, including issue counts, are available on initial load.
Optimize the retrieval of the issues list upon user request.
Keep the system responsive and avoid heavy initial loads.

Solution: Combined Prefetching and Caching

Separate Caching for Repositories and Issue Details

Repository Details and Issue Counts Cache: Fetch and cache repository details along with issue counts initially.
Issues Cache: Fetch and cache the issue lists separately when the user requests them.

Implementation Steps

Initial Repository Fetch:
- Fetch repository details and issue counts in one go.
- Cache these details with an 8-hour expiry.
User Requests Repository Details:
- Check cache for repository details.
- If the cache is fresh, return the cached data.
- If not, trigger a re-fetch of repository details and update the cache.
Lazy Loading of Issues:
- When a user opens an accordion to view issues for a particular repository, fetch the issue list for that repository.
- Cache the issue list for future requests to improve performance.
Background Prefetching (out of scope for now, since we don't have a background worker and are not monitoring user activity):
- Use a background job or worker to prefetch issue lists for repositories.
- This can be done immediately after the initial cache population or based on user activity predictions.
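
The first three steps could be sketched as two independent caches, one for repository summaries (details plus issue counts) and one for per-repository issue lists. Everything named here (`TtlCache`, `RepoService`, `IssueSource`, `RepoSummary`, `Issue`) is a hypothetical illustration rather than the project's code. It also assumes the upstream API can return an issue count without the issue list itself. If the data comes from the GitHub REST API, the repository object carries an `open_issues_count` field, though that number also includes open pull requests.

```typescript
// Hypothetical shapes; the real payloads are not shown in this thread.
type RepoSummary = { fullName: string; issueCount: number };
type Issue = { title: string; url: string };

const TTL_MS = 8 * 60 * 60 * 1000; // 8-hour expiry from the steps above

type Entry<T> = { value: T; storedAt: number };

class TtlCache<T> {
  private entries = new Map<string, Entry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or calls `load` and stores the result when missing or stale.
  async getOrLoad(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit !== undefined && this.now() - hit.storedAt < this.ttlMs) {
      return hit.value;
    }
    const value = await load();
    this.entries.set(key, { value, storedAt: this.now() });
    return value;
  }
}

// Assumed upstream interface: summaries come without issue lists.
interface IssueSource {
  fetchSummaries(): Promise<RepoSummary[]>;
  fetchIssues(fullName: string): Promise<Issue[]>;
}

class RepoService {
  private readonly source: IssueSource;
  private readonly summaries: TtlCache<RepoSummary[]>;
  private readonly issues: TtlCache<Issue[]>;

  constructor(source: IssueSource, now: () => number = () => Date.now()) {
    this.source = source;
    this.summaries = new TtlCache<RepoSummary[]>(TTL_MS, now);
    this.issues = new TtlCache<Issue[]>(TTL_MS, now);
  }

  // Initial load: details and counts only.
  listRepos(): Promise<RepoSummary[]> {
    return this.summaries.getOrLoad("all", () => this.source.fetchSummaries());
  }

  // Accordion open: fetch one repository's issues, then serve from cache.
  listIssues(fullName: string): Promise<Issue[]> {
    return this.issues.getOrLoad(fullName, () => this.source.fetchIssues(fullName));
  }
}
```

Keying the issue cache by repository means one expired entry triggers a single small fetch instead of a re-fetch of everything.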

Summary

Initial Repository Fetch: Fetches repository details and issue counts. Caches for 8 hours.
Lazy Loading Issues: Issues are fetched only when required (on accordion open), and cached separately with a similar expiry.
Background Prefetching: Issues are prefetched in the background, reducing wait times for users. (This can be done later if it is going to take time to implement.)

Benefits

Efficiency: Reduces initial load time by separating issue fetching.
Responsiveness: Users get immediate access to repository details and issue counts.
Scalability: The system handles large numbers of repositories without long initial loading times.
User Experience: Prefetching in the background ensures a smoother experience when users access issues.

By separating concerns (repository details and issues) and leveraging asynchronous background tasks, this approach provides a balanced solution to meet the requirements effectively.

aadeshkulkarni commented 2 months ago

This commit solves this problem in the following way:

Context:

There are 2 flows:

Cron flow: A cron job can be scheduled to call /api/cron once every 12 hours. Cron jobs can be easily configured on Vercel.
App flow: This is how the app will behave when a user opens the application.

Architecture:

Since the data is refreshed asynchronously once every 12 hours, users no longer have to wait for a slow re-fetch when they open the app.
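
The split between the two flows might look like the sketch below. It is an assumption-laden illustration, not the code from the commit: `SnapshotStore`, `InMemoryStore`, `runCronRefresh`, and `readForApp` are hypothetical names, and the in-memory store only stands in for whatever persistent storage the commit actually uses (module-level memory is generally not shared reliably between serverless invocations).

```typescript
type Snapshot<T> = { data: T; refreshedAt: number };

// Assumed storage interface; the real storage layer is not described in this thread.
interface SnapshotStore<T> {
  read(): Promise<Snapshot<T> | null>;
  write(snapshot: Snapshot<T>): Promise<void>;
}

// Stand-in store for illustration only.
class InMemoryStore<T> implements SnapshotStore<T> {
  private snapshot: Snapshot<T> | null = null;

  async read(): Promise<Snapshot<T> | null> {
    return this.snapshot;
  }

  async write(snapshot: Snapshot<T>): Promise<void> {
    this.snapshot = snapshot;
  }
}

// Cron flow: what a handler behind /api/cron might do every 12 hours.
async function runCronRefresh<T>(
  store: SnapshotStore<T>,
  fetchFresh: () => Promise<T>,
  now: () => number = () => Date.now(),
): Promise<void> {
  const data = await fetchFresh(); // the slow part happens here, off the user's request path
  await store.write({ data, refreshedAt: now() });
}

// App flow: user requests only read the latest snapshot and never call upstream.
async function readForApp<T>(store: SnapshotStore<T>): Promise<Snapshot<T> | null> {
  return store.read();
}
```

On Vercel, a 12-hour schedule for such a handler is typically declared in the `crons` section of `vercel.json` with a cron expression such as `0 */12 * * *`. The exact schedule used by the commit is not stated here.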

aadeshkulkarni commented 1 month ago

This PR slightly fixes this problem.

aadeshkulkarni / first-issues