🚀 Feature: `useEntityList` to support fields

drodil commented 3 months ago

🔖 Feature description

useEntityList fetches entities by specific filters but it does not support setting which fields to return from the API.

🎤 Context

When the number of entities is huge, the catalog page response gets very large. The same goes for the scaffolder template page, because the response contains every step of each template even though the steps are not needed to list the templates. Some numbers to understand this better:

Catalog with 9000+ components response size 2.7MB
Template page with 40+ templates response size 127kB

Especially in the catalog case, only some of the fields are used for displaying the entities, so most of the payload is unnecessary. Granted, filtering the fields from the response adds some work for the backend, but it will save quite a lot of network bandwidth.
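To illustrate the kind of saving I mean, here is a minimal sketch of projecting an entity down to a few dot-separated field paths and comparing the serialized sizes. `pickFields`, `getPath`, and the sample template are hypothetical and only for illustration; this is not how the catalog backend implements field selection, and splitting on `.` would not work for keys that themselves contain dots (annotation keys, for example).

```typescript
type Json = { [key: string]: unknown };

// Resolve a path such as ['metadata', 'name'] against a nested object.
function getPath(obj: unknown, parts: string[]): unknown {
  let current = obj;
  for (const part of parts) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Json)[part];
  }
  return current;
}

// Hypothetical helper: keep only the listed dot-separated paths.
function pickFields(entity: Json, fields: string[]): Json {
  const result: Json = {};
  for (const field of fields) {
    const parts = field.split('.');
    const value = getPath(entity, parts);
    if (value === undefined) continue; // skip paths the entity does not have
    let target = result;
    parts.forEach((part, i) => {
      if (i === parts.length - 1) {
        target[part] = value;
        return;
      }
      // Create intermediate objects on the way down to the leaf.
      if (typeof target[part] !== 'object' || target[part] === null) {
        target[part] = {};
      }
      target = target[part] as Json;
    });
  }
  return result;
}

// Made-up template entity; real templates carry many more steps.
const template: Json = {
  kind: 'Template',
  metadata: { name: 'node-service', title: 'Node Service' },
  spec: {
    steps: [
      { id: 'fetch', action: 'fetch:template', input: { url: './skeleton' } },
      { id: 'publish', action: 'publish:github', input: { repoUrl: 'example' } },
    ],
  },
};

const slim = pickFields(template, ['kind', 'metadata.name', 'metadata.title']);
console.log(JSON.stringify(template).length, JSON.stringify(slim).length);
```

For a list view that only renders the kind, name, and title, everything under `spec.steps` is dropped from the payload.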

✌️ Possible Implementation

Add the possibility to pass the wanted fields to the `useEntityList` hook.
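A rough sketch of what that could look like on the request side, assuming the hook forwards an optional `fields` list to the catalog request. As far as I know the catalog REST API accepts comma-separated `filter` and `fields` query parameters, but `EntityListRequest` and `buildEntitiesQuery` below are hypothetical names, not existing Backstage APIs.

```typescript
// Hypothetical request shape: the existing filters plus an optional field list.
interface EntityListRequest {
  filter?: Record<string, string>;
  fields?: string[];
}

// Build the query string for an entities request.
// Assumption: `filter` is sent as comma-separated key=value pairs and
// `fields` as comma-separated dot paths.
function buildEntitiesQuery(request: EntityListRequest): string {
  const params: string[] = [];
  const filter = Object.entries(request.filter ?? {})
    .map(([key, value]) => `${key}=${value}`)
    .join(',');
  if (filter) {
    params.push(`filter=${encodeURIComponent(filter)}`);
  }
  if (request.fields && request.fields.length > 0) {
    params.push(`fields=${encodeURIComponent(request.fields.join(','))}`);
  }
  return params.join('&');
}

// Listing templates while asking only for what the table renders.
const query = buildEntitiesQuery({
  filter: { kind: 'Template' },
  fields: ['kind', 'metadata.name', 'metadata.title'],
});
console.log(query);
```

Leaving `fields` undefined keeps the current behaviour of returning whole entities, so existing callers would not be affected.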

👀 Have you spent some time to check if this feature request has been raised before?

[X] I checked and didn't find similar issue

🏢 Have you read the Code of Conduct?

[X] I have read the Code of Conduct

Are you willing to submit PR?

None

freben commented 3 months ago

Makes sense. I'll add though, that in large catalogs field selection just doesn't suffice - you'll need pagination too, exposed all the way to the caller.

drodil commented 3 months ago

That's true, and while the current pagination works to some extent, it has some flaws, such as this issue: https://github.com/backstage/backstage/issues/25904

Additionally, I would like to see this way of paginating the catalog become a reality https://github.com/backstage/backstage/pull/25899

vinzscam commented 2 months ago

I'm not sure this issue is needed as we are moving toward pagination 🤔

drodil commented 2 months ago

I think it is still relevant even though pagination will help. Especially with templates, the whole entity can be very large, and rendering it in the table usually requires only a couple of fields from the whole entity.

Nilay1999 commented 1 month ago

I believe field filtering would still be beneficial. In our organization, we've encountered a similar issue where we use GitLab to catalog our project details and have added numerous custom metadata fields. As a result, many pages end up fetching a lot of irrelevant data.

vinzscam commented 1 month ago

> I believe field filtering would still be beneficial. In our organization, we've encountered a similar issue where we use GitLab to catalog our project details and have added numerous custom metadata fields. As a result, many pages end up fetching a lot of irrelevant data.

I believe this can lead to issues if you are listing the entire catalog in one shot, but in this case we aren't.

> I think it is still relevant even though pagination will help. Especially with templates, the whole entity can be very large and rendering it in the table usually requires only couple of fields from the whole entity.

But here we are jumping directly to a 'slight improvement' without having an actual benchmark to determine whether this is truly an issue.

Let's break down the points raised in the description:

> Catalog with 9000+ components response size 2.7MB

While the payload is indeed large, filtering by fields alone won't significantly improve performance. Implementing pagination would be a more effective solution for handling large datasets and would likely have a much greater impact on performance and user experience.

> Template page with 40+ templates response size 127kB

Here, the current payload size is relatively small. Even though field filtering could reduce the size further, the performance gains would be minimal and likely not noticeable to users.

I agree with the principle that we should avoid fetching unnecessary data. However, in these cases, the benefits of implementing field-specific fetching are marginal compared to the potential overhead. My points are:

Focusing on field filtering might divert our time from more impactful performance improvements
Adding field-specific fetching increases the complexity of EntityListContext, which can lead to maintenance challenges, especially if adopters need to keep the requested fields in sync with UI requirements (say someone wants to customize the columns of the CatalogTable, potentially introducing bugs)
Without a measured outcome, it’s challenging to justify the added complexity. We should prioritize changes that have a clear, significant impact on performance and user experience.
