aadeshkulkarni / first-issues

Make your first open-source contribution.
https://firstissues.dev
MIT License
13 stars 14 forks

Optimization | Improve performance of api/repos #15

Closed: KrupaPanchal2527 closed this issue 1 month ago

KrupaPanchal2527 commented 3 months ago

Current behavior for fetching repository details and the list of issues:

  1. Read from repos.json.
  2. Iterate over each repository and fetch its details and list of issues.
  3. Once all the calls are done, save the result to a local class instance (which acts as a cache for now).
  4. Upon receiving a new request, check whether the class instance already holds data.
  5. If data is available in the cache and it was last modified less than 8 hours ago, return it from the cache; otherwise, re-fetch the data.
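For reference, the cache check in steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `TtlCache`, `getCached`, and `EIGHT_HOURS_MS` are hypothetical names, and the clock is injectable only so that expiry can be exercised without waiting.

```typescript
// Hypothetical names throughout; the project's real cache class may differ.
const EIGHT_HOURS_MS = 8 * 60 * 60 * 1000;

class TtlCache<T> {
  private entry: { value: T; storedAt: number } | undefined = undefined;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the stored value while it is younger than ttlMs, else undefined.
  get(): T | undefined {
    if (this.entry === undefined) return undefined;
    const age = this.now() - this.entry.storedAt;
    return age < this.ttlMs ? this.entry.value : undefined;
  }

  // Stores a value and records when it was stored.
  set(value: T): void {
    this.entry = { value, storedAt: this.now() };
  }
}

// Steps 4 and 5: serve from the cache when fresh, otherwise re-fetch and store.
async function getCached<T>(
  cache: TtlCache<T>,
  refetch: () => Promise<T>,
): Promise<T> {
  const cached = cache.get();
  if (cached !== undefined) return cached;
  const fresh = await refetch();
  cache.set(fresh);
  return fresh;
}
```

With this shape, the slow path is whatever `refetch` does, which is why the expired-cache case described below is the expensive one.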

Issue with above approach

Whenever the cache has expired, it takes too long to fetch all the repo details and their corresponding issue lists.

Ideas on how to optimize

We don't need the issue lists right away, so we can skip fetching issue details when fetching the repository details for the first time. Whenever the user opens the accordion to see the issues, we can fetch them at that point. To optimize further, once an issue list is fetched we can store it in a cache with an expiry time.

KrupaPanchal2527 commented 3 months ago

@aadeshkulkarni I just started working on this, but I think it might not work out. If we fetch issues only when the accordion is clicked, we won't be able to display the issue count at the top right of the card on page load. Either we remove the issue count (which I don't think is a good idea), or we need to come up with another way of optimizing how we fetch the data. What are your thoughts?

aadeshkulkarni commented 3 months ago

Given roughly 1,000 repositories, each with an average of 10 issues, the solution needs to balance initial load time, user experience, and implementation complexity.

Here is an approach:

Key Requirements Recap

  1. Ensure repository metrics, including issue counts, are available on initial load.
  2. Optimize the retrieval of the issues list upon user request.
  3. Keep the system responsive and avoid heavy initial loads.

Solution: Combined Prefetching and Caching

Separate Caching for Repositories and Issue Details

  1. Repository Details and Issue Counts Cache: Fetch and cache repository details along with issue counts initially.
  2. Issues Cache: Fetch and cache the issue lists separately when the user requests them.
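One way the first cache could carry issue counts without fetching issue lists is sketched below. It relies on the `open_issues_count` field in GitHub's REST repository response, which also counts open pull requests. If the site only shows issues with particular labels, the count would need a different source. `RepoSummary` and `toRepoSummary` are hypothetical names used only for illustration.

```typescript
// Subset of the GitHub REST "get a repository" response used in this sketch.
interface GitHubRepoResponse {
  full_name: string;
  description: string | null;
  stargazers_count: number;
  open_issues_count: number; // GitHub includes open pull requests in this number
}

// Hypothetical shape for what a repository card needs on page load.
interface RepoSummary {
  fullName: string;
  description: string;
  stars: number;
  openIssueCount: number;
}

// Maps the API response to the card data; no issue list is required.
function toRepoSummary(repo: GitHubRepoResponse): RepoSummary {
  return {
    fullName: repo.full_name,
    description: repo.description ?? "",
    stars: repo.stargazers_count,
    openIssueCount: repo.open_issues_count,
  };
}

// Two independent stores, keyed by "owner/name" (assumed key format):
// summaries are filled on the initial fetch, issue lists only on demand.
const repoSummaryCache = new Map<string, RepoSummary>();
const issueTitlesCache = new Map<string, string[]>();
```

Keeping the two stores separate means an expired or missing issue list never forces a re-fetch of the repository summaries, and vice versa.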

Implementation Steps

  1. Initial Repository Fetch:

    • Fetch repository details and issue counts in one go.
    • Cache these details with an 8-hour expiry.
  2. User Requests Repository Details:

    • Check cache for repository details.
    • If the cache is fresh, return the cached data.
    • If not, trigger a re-fetch of repository details and update the cache.
  3. Lazy Loading of Issues:

    • When a user opens an accordion to view issues for a particular repository, fetch the issue list for that repository.
    • Cache the issue list for future requests to improve performance.
  4. Background Prefetching (out of scope for now, since we don't have a background worker and are not monitoring user activity):

    • Use a background job or worker to prefetch issue lists for repositories.
    • This can be done immediately after the initial cache population or based on user activity predictions.
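Step 3 (lazy loading of issues) could look roughly like the sketch below. The names `LazyIssueCache`, `Issue`, and `FetchIssues` are hypothetical, and the fetch function is passed in rather than calling GitHub directly. Storing the pending promise, so that two quick accordion clicks share one request, is an addition of this sketch and not something the steps above require.

```typescript
// Hypothetical issue shape; the real fields depend on what the UI shows.
interface Issue {
  number: number;
  title: string;
  url: string;
}

// Assumed signature for whatever function actually calls the GitHub API.
type FetchIssues = (repoFullName: string) => Promise<Issue[]>;

class LazyIssueCache {
  private readonly entries = new Map<
    string,
    { issues: Promise<Issue[]>; storedAt: number }
  >();
  private readonly fetchIssues: FetchIssues;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetchIssues: FetchIssues, ttlMs: number, now: () => number = Date.now) {
    this.fetchIssues = fetchIssues;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Called when a user opens the accordion for one repository.
  get(repoFullName: string): Promise<Issue[]> {
    const hit = this.entries.get(repoFullName);
    if (hit !== undefined && this.now() - hit.storedAt < this.ttlMs) {
      return hit.issues; // fresh entry, possibly still in flight
    }
    const issues = this.fetchIssues(repoFullName);
    this.entries.set(repoFullName, { issues, storedAt: this.now() });
    // Drop a failed fetch so the next open retries instead of replaying the error.
    issues.catch(() => {
      if (this.entries.get(repoFullName)?.issues === issues) {
        this.entries.delete(repoFullName);
      }
    });
    return issues;
  }
}
```

Only repositories whose accordion is actually opened cost a request, so a cold start no longer has to fetch every issue list up front.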

Summary

Benefits

By separating concerns (repository details and issues) and leveraging asynchronous background tasks, this approach provides a balanced solution to meet the requirements effectively.

aadeshkulkarni commented 2 months ago

This commit solves this problem in the following way:


Context:

There are 2 flows:

Architecture:

[Image: Screenshot 2024-07-13 at 3 41 03 PM]

aadeshkulkarni commented 1 month ago

This PR partially fixes this problem.