WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License
240 stars 194 forks source link

Additional search views for the frontend #410

Closed obulat closed 4 months ago

obulat commented 1 year ago
Start Date ETA Project Lead Actual Ship Date
2023-04-01 2023-11-30 @fcoveram, @obulat TBD

Description

To improve the search experience on openverse.org, we want to add more ways of displaying search results:

As a start, we can create and implement a single template for all three types of pages.

Documents

Issues

Prior Art

zackkrida commented 1 year ago

@WordPress/openverse-maintainers, @panchovm has expressed interest in leading this project and is going to start the kickoff post next week. He'll be AFK some days at the start of April which was the original estimated start date for this project.

fcoveram commented 1 year ago

Project Proposal was merged (#1890) and project is ready to continue with Implementation Plans.

openverse-bot commented 1 year ago

Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

openverse-bot commented 1 year ago

Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

fcoveram commented 1 year ago

A new design (i2) was shared in #2113. Waiting for responses to continue with the design discussion

openverse-bot commented 1 year ago

Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

fcoveram commented 1 year ago

As noted here, designs were approved due to no discrepant comments. I will prepare the assets for their hands-off and comment again once done.

As discussed during the prioritization meeting, @obulat will lead the implementation and will write the involved plan.

fcoveram commented 1 year ago
openverse-bot commented 1 year ago

Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

fcoveram commented 1 year ago

Implementation plan started a few days ago and has been discussed in #2676

zackkrida commented 1 year ago

Implementation plan was approved and issues have been created. @obulat has started work.

openverse-bot commented 1 year ago

Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

fcoveram commented 1 year ago

Dev work continues, and a Milestone was created.

To the above, I think we need to add the following tickets to the milestone as they address part of the changes

What do you think @obulat?

obulat commented 1 year ago

I added the issues to the milestone, and updated the comment to rename "Frontend milestone" to "Milestone" because it contains some API-related issues.

obulat commented 1 year ago

Update 2023-09-06

Merged so far: additional_search_views feature flag and VCollectionHeader component added, VAudioCollection component extracted.

Current work on the project is concentrated in the frontend changes to the single result page and the API changes:

obulat commented 1 year ago

Update 2023-09-20

The API routes PR took a little longer to review and merge than was expected because the documentation needed to be added, both in code and in the API user-facing documentation (which meant setting up the DRF Spectacular correctly with correct serializers. The PR is ready for re-review:

2853

Next steps for this project will be the frontend changes:

2730

2774

2858

openverse-bot commented 11 months ago

Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

openverse-bot commented 11 months ago

Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

openverse-bot commented 10 months ago

Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

openverse-bot commented 10 months ago

Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.

obulat commented 10 months ago

Update 2023-11-18

Done

The API changes have been merged 🎉

Some endpoints you can check in staging (with only the subset of data): SMK images: https://api-staging.openverse.engineering/v1/images/source/smk/ Images that are attributed to "Anonymous, Italian, 18th century artist" at the Metropolitan Museum of Arts: https://api-staging.openverse.engineering/v1/images/source/met/creator/Anonymous,%20Italian,%2018th%20century%20artist Audio tagged with "istanbul": https://api-staging.openverse.engineering/v1/audio/tag/istanbul/

In progress

There are two open PRs adding the changes to the Nuxt frontend.

  1. Updates to the single result page: #3338
  2. Pages for the collections #3140

To do

3365 updated the result labels for the collections (e.g, "Over 10000 results found with this tag"), but I will also open a discussion to finalize the wording in line with the comments in the original VCollectionHeader PR

After the discussion about tag normalization in #3342, it is also necessary to remove lower-casing the image tags in the ingestion server: https://github.com/WordPress/openverse/blob/7967abd24c7e128d7a0a98bb71f9a21e9d19e30c/ingestion_server/ingestion_server/cleanup.py#L130 The tag is lower-cased here only to check if it is in the denylist, the saved tag name keeps the original case. So, no change is needed here.

obulat commented 7 months ago

Update 2024-02-07

Done

In progress

To do

Updates to the tags display, creator view results label, analytics events and the scrolling behavior.

Questions

Analytics event

3619 raises a question about analytics events that were not planned for in the Implementation Plan.

Currently, we have two events for the search view (REACH_RESULT_END and LOAD_MORE_RESULTS) that send the following properties:

{ /** The media type being searched */
    searchType: SearchType
    /** The search term */
    query: string
    /** The current page of results the user is on,
     * *before* loading more results.. */
    resultPage: number
}

We need to update these events to use both on the regular search and on the additional search views.

I think adding something like strategy (used in the search store in Nuxt, with values of default for regular search, and tag | creator | source for the additional search views. This way, we could re-use the query property to send the value of tag, source or creator (as a source/creator string) with the event. Does this shape make sense, @WordPress/openverse-frontend ? Would adding a new property to the event be okay and allow us to compare the new events with the old ones? Or would we need to back-fill the old events with the strategy property?

Deep pagination

We currently limit search pagination depth to 5 pages. This was done to ensure stability of the API because deep pagination together with the dead link removal was making some searches make a very high number of ES/db requests.

Additional search views are always sorted by the newly-added to Openverse, which means that they are safer as there should be much fewer links on the top pages.

In the current implementation of the additional search views, we show the "10000 results for this source" result label, but the users can only browse the first 100 (5 pages). I think we should increase the page depth for the additional search views, but not remove it entirely (because otherwise it would lead to scraping, and would cause the same problems as the search page had before the 5 page limit was set). However, we need to somehow convey it to the users that we are only showing the first N results.

@WordPress/openverse-api, what do you think about the depth of pagination: should we increase it for the additional search views, and if so, to what number? @WordPress/openverse-frontend, @fcoveram, do you agree with somehow displaying how many results the user can actually browse (out of the more than 10000 results), and if so, how?

fcoveram commented 7 months ago

In the current implementation of the additional search views, we show the "10000 results for this source" result label, but the users can only browse the first 100 (5 pages).

I don't fully understand this. As a user, I can continuously load more results without any problem.

do you agree with somehow displaying how many results the user can actually browse (out of the more than 10000 results), and if so, how?

I don't think is necessary to convey exactly how many results are. Edge cases will reach out the 10,000 items. The label intends to communicate the results dimension; let's say, whether it surfaced 10, 100, or 1,000 results. The key word is over X results and I'm drawn to keep it as an indicator of massiveness.

obulat commented 7 months ago

I don't fully understand this. As a user, I can continuously load more results without any problem.

Try going to https://staging.openverse.org/image/tag/cat and clicking the "Load more" button 20 times. After the twentieth click, the button disappears. This is caused by the API's MAX_PAGINATION_DEPTH setting:

https://github.com/WordPress/openverse/blob/f8e6d35bdacd9f9a5bff7ca887f0aa0751c30716/api/conf/settings/misc.py#L36

Admittedly, 20 pages is a lot, and not many users would hit this limit. However, I experienced one of the collection views only being five pages deep, which was a much more confusing experience. It was probably caused by dead links in the results.

So, if we leave the 20 page limit, we only expose 400 results for any collection (out of the 10 000 in the label).

fcoveram commented 7 months ago

I understand. My UX approach is allowing users to continue seeing results, but I would like to hear what other folks think over what options we have and what is a good practice.

sarayourfriend commented 7 months ago

I don't think we should change page depth rules. They're already complex (auth vs non-auth) and adding more exceptions doesn't feel necessary. Viewing tags and sources isn't about browsing the entire thing. We have no way to support that technically, particularly for large sources (otherwise deep pagination would cause performance issues for all users).

In the current implementation of the additional search views, we show the "10000 results for this source" result label, but the users can only browse the first 100 (5 pages). I think we should increase the page depth for the additional search views, but not remove it entirely (because otherwise it would lead to scraping, and would cause the same problems as the search page had before the 5 page limit was set). However, we need to somehow convey it to the users that we are only showing the first N results.

Can we change the copy to "showing the top results from this source" or something like that. I do not think we need to communicate the precise number of theoretically available results in non-search view, as it's only vaguely relevant if you can filter those down further. The precise number we actually surface is likewise unnecessary, for the same reason (you can't filter them on those pages). We can't get a precise number anyway (due to dead link issues) and it isn't useful information on those pages, because again, it's only useful if you can further filter them, which you cannot. These pages are about browsing, not searching. As such, we don't need to give any other information. If we want to display the stats for a source when it's available, we can do that. "Showing the top results of works from this source". Only relevant for sources though. Even that is overkill in my opinion, for the reasons related to search. I wouldn't block the completion of this project on that even if we decided to add that. It's also not relevant to the other two types of views.

I could see an argument that it's useful statistical data for research, but we don't make accurate data available for a variety of reasons, and there are much better ways of deriving that data if we did want to anyway.

Once we're able to implement in-house popularity sorting, we can clarify "top results from this source" (or whatever) to mean "most popular results from this source" or "most popular results with the tag " and so forth. Knowing that that will be a future possibility makes me even more resistent to the idea of trying to work out a way of accurately representing the precise extent of data available on those pages (in addition to the issues I shared above regarding the impossibility of accuracy and the general uselessness of the data compared to a regular search where it gives useful context).

Anyway, to summarize, my preference is to make no changes on the API side, and just to clarify the copy by removing unnecessary and inaccurate information.

fcoveram commented 7 months ago

Can we change the copy to "showing the top results from this source" or something like that. I do not think we need to communicate the precise number of theoretically available results in non-search view, as it's only vaguely relevant if you can filter those down further.

I like this approach and the copy.

Even that is overkill in my opinion, for the reasons related to search. I wouldn't block the completion of this project on that even if we decided to add that

+1 to this

zackkrida commented 7 months ago

While participating in the discussion at #3825 it occurred to me that one of the largest benefits of these additional search views, outside of ease of sharing and bookmarking by users, is SEO. Unfortunately, we've missed discussing SEO considerations in the project proposal (myself included) and the implementation plan.

Are we setting good meta titles and descriptions for these pages? Will we allow search engines to crawl them?

sarayourfriend commented 7 months ago

Is there a way to support crawling on pages with query parameters? I'm not familiar with the SEO considerations for those kinds of pages. My understanding generally was that dynamically generated pages, like search results, wouldn't be suitable for SEO. But there are some parts of it that could work for SEO, if crawlers skipped the results section.

zackkrida commented 6 months ago

@sarayourfriend I think I remember that you've already discussed this with @obulat elsewhere, but yes, search engines don't treat pages with query params or paths differently. They basically consider URL structure an implementation detail.

A good example of this is the Google search results for "stock photos dogs" which shows Adobe's stock photography site:

image

This result uses the URL https://stock.adobe.com/search?k=dog.

Also something to consider: Many of these stock sites store popular searches in dynamically-generated sitemaps. Unsplash has extensive sitemaps with thousands of entries for popular searches, landing pages, collections, and so on, for example.

In the future, we could query analytics at application build time or dynamically with caching to generate sitemaps containing the most popular search terms, collections, and so on. That would be a nice follow-up project, distinct from this one.

obulat commented 6 months ago

Update 2024-03-19

Done and in progress

The Implementation Plan for Additional search views was updated with the conclusions from discussion in #3825.

The frontend has converted collection views to use query parameters instead of path parameters. This required additional changes in how media is fetched (#3835 PR under review). The API changes are also under review in #3887.

Two changes to the frontend were also merged:

To Do

obulat commented 6 months ago

Possible deployment/launch plan (from #3887 comment)

  1. Switch the frontend additional_search_views flag on, without fully removing the flag, and set SHOW_COLLECTION_DOCS env variable in the API to True in prod. This means that the users on the frontend can see the additional search views as fully-launched, and the API users can start using them with the unstable__ parameters.
  2. Test how well both API and frontend work in prod for ~ 1 week. If needed, we can revert using the frontend flag and the API env variable.
  3. If everything is successful, add the stable parameters to the frontend.
  4. Stabilize the API parameters (remove the conditionals from the additional search views documentation and the unstable__ prefix from tag and collection parameters)
  5. Remove the additional frontend parameters and the feature flag.
sarayourfriend commented 6 months ago

Stabilize the API parameters (remove the conditionals from the additional search views documentation and the unstable__ prefix from tag and collection parameters)

FWIW, I don't think we necessarily need to immediately stabilise new API parameters after launch and verification of the basic feature in production, and it's only to our benefit to move slowly on doing so. If you're okay with holding off a month or so on it to give time for things to settle and see how the new parameters work in practice, I'd recommend it. I'm definitely glad we've kept the sensitivity parameters and return value unstable, as it's given us flexibility to make changes to the behaviour without worrying about API version commitments (for which we still don't have concrete definitions).

Just a suggestion, anyway.

obulat commented 5 months ago

Update 2024-04-08

Done

The last PRs of the project implementation was merged:

3979 fixed the bug in collection fetching that was found during testing of other changes.

In progress

The project will be launched once the following two PRs are merged:

To do

After the issues from "In progress" section are deployed and the new views are thus launched, we will need to test the views in prod for a while. Then, after we determine that the views work well, we will can start fixing the rest of the issues in the milestone to clean up the temporary changes (feature flag in the frontend and the API unstable__ parameters).

zackkrida commented 5 months ago

@WordPress/openverse-frontend I am going to take on this PR (#4084) and the launching of this project while Olga is AFK.

fcoveram commented 5 months ago

The feature is already online, but there are two pending tasks listed in the initial comment. Are we ready to mark them as done @obulat or are some work still in progress?

obulat commented 5 months ago

@fcoveram, I created a separate milestone for the cleanup tasks, and moved all the pending tasks there. The Additional search views milestone is completed 🎉