Closed obulat closed 4 months ago
@WordPress/openverse-maintainers, @panchovm has expressed interest in leading this project and is going to start the kickoff post next week. He'll be AFK some days at the start of April which was the original estimated start date for this project.
Project Proposal was merged (#1890) and project is ready to continue with Implementation Plans.
Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
A new design (i2) was shared in #2113. Waiting for responses to continue with the design discussion
Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
As noted here, designs were approved due to no discrepant comments. I will prepare the assets for their hands-off and comment again once done.
As discussed during the prioritization meeting, @obulat will lead the implementation and will write the involved plan.
Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
Implementation plan started a few days ago and has been discussed in #2676
Implementation plan was approved and issues have been created. @obulat has started work.
Hi @fcoveram, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
Dev work continues, and a Milestone was created.
To the above, I think we need to add the following tickets to the milestone as they address part of the changes
What do you think @obulat?
I added the issues to the milestone, and updated the comment to rename "Frontend milestone" to "Milestone" because it contains some API-related issues.
Merged so far: additional_search_views
feature flag and VCollectionHeader
component added, VAudioCollection
component extracted.
Current work on the project is concentrated in the frontend changes to the single result page and the API changes:
The API routes PR took a little longer to review and merge than was expected because the documentation needed to be added, both in code and in the API user-facing documentation (which meant setting up the DRF Spectacular correctly with correct serializers. The PR is ready for re-review:
Next steps for this project will be the frontend changes:
Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
Hi @obulat, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
The API changes have been merged 🎉
Some endpoints you can check in staging (with only the subset of data): SMK images: https://api-staging.openverse.engineering/v1/images/source/smk/ Images that are attributed to "Anonymous, Italian, 18th century artist" at the Metropolitan Museum of Arts: https://api-staging.openverse.engineering/v1/images/source/met/creator/Anonymous,%20Italian,%2018th%20century%20artist Audio tagged with "istanbul": https://api-staging.openverse.engineering/v1/audio/tag/istanbul/
There are two open PRs adding the changes to the Nuxt frontend.
VCollectionHeader PR
After the discussion about tag normalization in #3342, it is also necessary to remove lower-casing the image tags in the ingestion server:
https://github.com/WordPress/openverse/blob/7967abd24c7e128d7a0a98bb71f9a21e9d19e30c/ingestion_server/ingestion_server/cleanup.py#L130
The tag is lower-cased here only to check if it is in the denylist, the saved tag name keeps the original case. So, no change is needed here.
Updates to the tags display, creator view results label, analytics events and the scrolling behavior.
Currently, we have two events for the search view (REACH_RESULT_END
and LOAD_MORE_RESULTS
) that send the following properties:
{ /** The media type being searched */
searchType: SearchType
/** The search term */
query: string
/** The current page of results the user is on,
* *before* loading more results.. */
resultPage: number
}
We need to update these events to use both on the regular search and on the additional search views.
I think adding something like strategy
(used in the search store in Nuxt, with values of default
for regular search, and tag
| creator
| source
for the additional search views. This way, we could re-use the query
property to send the value of tag
, source
or creator
(as a source/creator
string) with the event.
Does this shape make sense, @WordPress/openverse-frontend ?
Would adding a new property to the event be okay and allow us to compare the new events with the old ones? Or would we need to back-fill the old events with the strategy
property?
We currently limit search pagination depth to 5 pages. This was done to ensure stability of the API because deep pagination together with the dead link removal was making some searches make a very high number of ES/db requests.
Additional search views are always sorted by the newly-added to Openverse
, which means that they are safer as there should be much fewer links on the top pages.
In the current implementation of the additional search views, we show the "10000 results for this source" result label, but the users can only browse the first 100 (5 pages). I think we should increase the page depth for the additional search views, but not remove it entirely (because otherwise it would lead to scraping, and would cause the same problems as the search page had before the 5 page limit was set). However, we need to somehow convey it to the users that we are only showing the first N results.
@WordPress/openverse-api, what do you think about the depth of pagination: should we increase it for the additional search views, and if so, to what number? @WordPress/openverse-frontend, @fcoveram, do you agree with somehow displaying how many results the user can actually browse (out of the more than 10000 results), and if so, how?
In the current implementation of the additional search views, we show the "10000 results for this source" result label, but the users can only browse the first 100 (5 pages).
I don't fully understand this. As a user, I can continuously load more results without any problem.
do you agree with somehow displaying how many results the user can actually browse (out of the more than 10000 results), and if so, how?
I don't think is necessary to convey exactly how many results are. Edge cases will reach out the 10,000 items. The label intends to communicate the results dimension; let's say, whether it surfaced 10, 100, or 1,000 results. The key word is over X results
and I'm drawn to keep it as an indicator of massiveness.
I don't fully understand this. As a user, I can continuously load more results without any problem.
Try going to https://staging.openverse.org/image/tag/cat
and clicking the "Load more" button 20 times. After the twentieth click, the button disappears. This is caused by the API's MAX_PAGINATION_DEPTH
setting:
Admittedly, 20 pages is a lot, and not many users would hit this limit. However, I experienced one of the collection views only being five pages deep, which was a much more confusing experience. It was probably caused by dead links in the results.
So, if we leave the 20 page limit, we only expose 400 results for any collection (out of the 10 000 in the label).
I understand. My UX approach is allowing users to continue seeing results, but I would like to hear what other folks think over what options we have and what is a good practice.
I don't think we should change page depth rules. They're already complex (auth vs non-auth) and adding more exceptions doesn't feel necessary. Viewing tags and sources isn't about browsing the entire thing. We have no way to support that technically, particularly for large sources (otherwise deep pagination would cause performance issues for all users).
In the current implementation of the additional search views, we show the "10000 results for this source" result label, but the users can only browse the first 100 (5 pages). I think we should increase the page depth for the additional search views, but not remove it entirely (because otherwise it would lead to scraping, and would cause the same problems as the search page had before the 5 page limit was set). However, we need to somehow convey it to the users that we are only showing the first N results.
Can we change the copy to "showing the top results from this source" or something like that. I do not think we need to communicate the precise number of theoretically available results in non-search view, as it's only vaguely relevant if you can filter those down further. The precise number we actually surface is likewise unnecessary, for the same reason (you can't filter them on those pages). We can't get a precise number anyway (due to dead link issues) and it isn't useful information on those pages, because again, it's only useful if you can further filter them, which you cannot. These pages are about browsing, not searching. As such, we don't need to give any other information. If we want to display the stats for a source when it's available, we can do that. "Showing the top results of
I could see an argument that it's useful statistical data for research, but we don't make accurate data available for a variety of reasons, and there are much better ways of deriving that data if we did want to anyway.
Once we're able to implement in-house popularity sorting, we can clarify "top results from this source" (or whatever) to mean "most popular results from this source" or "most popular results with the tag
Anyway, to summarize, my preference is to make no changes on the API side, and just to clarify the copy by removing unnecessary and inaccurate information.
Can we change the copy to "showing the top results from this source" or something like that. I do not think we need to communicate the precise number of theoretically available results in non-search view, as it's only vaguely relevant if you can filter those down further.
I like this approach and the copy.
Even that is overkill in my opinion, for the reasons related to search. I wouldn't block the completion of this project on that even if we decided to add that
+1 to this
While participating in the discussion at #3825 it occurred to me that one of the largest benefits of these additional search views, outside of ease of sharing and bookmarking by users, is SEO. Unfortunately, we've missed discussing SEO considerations in the project proposal (myself included) and the implementation plan.
Are we setting good meta titles and descriptions for these pages? Will we allow search engines to crawl them?
Is there a way to support crawling on pages with query parameters? I'm not familiar with the SEO considerations for those kinds of pages. My understanding generally was that dynamically generated pages, like search results, wouldn't be suitable for SEO. But there are some parts of it that could work for SEO, if crawlers skipped the results section.
@sarayourfriend I think I remember that you've already discussed this with @obulat elsewhere, but yes, search engines don't treat pages with query params or paths differently. They basically consider URL structure an implementation detail.
A good example of this is the Google search results for "stock photos dogs" which shows Adobe's stock photography site:
This result uses the URL https://stock.adobe.com/search?k=dog
.
Also something to consider: Many of these stock sites store popular searches in dynamically-generated sitemaps. Unsplash has extensive sitemaps with thousands of entries for popular searches, landing pages, collections, and so on, for example.
In the future, we could query analytics at application build time or dynamically with caching to generate sitemaps containing the most popular search terms, collections, and so on. That would be a nice follow-up project, distinct from this one.
The Implementation Plan for Additional search views was updated with the conclusions from discussion in #3825.
The frontend has converted collection views to use query parameters instead of path parameters. This required additional changes in how media is fetched (#3835 PR under review). The API changes are also under review in #3887.
Two changes to the frontend were also merged:
Stabilize the API parameters (remove the conditionals from the additional search views documentation and the unstable__ prefix from tag and collection parameters)
FWIW, I don't think we necessarily need to immediately stabilise new API parameters after launch and verification of the basic feature in production, and it's only to our benefit to move slowly on doing so. If you're okay with holding off a month or so on it to give time for things to settle and see how the new parameters work in practice, I'd recommend it. I'm definitely glad we've kept the sensitivity parameters and return value unstable, as it's given us flexibility to make changes to the behaviour without worrying about API version commitments (for which we still don't have concrete definitions).
Just a suggestion, anyway.
The last PRs of the project implementation was merged:
The project will be launched once the following two PRs are merged:
After the issues from "In progress" section are deployed and the new views are thus launched, we will need to test the views in prod for a while.
Then, after we determine that the views work well, we will can start fixing the rest of the issues in the milestone to clean up the temporary changes (feature flag in the frontend and the API unstable__
parameters).
@WordPress/openverse-frontend I am going to take on this PR (#4084) and the launching of this project while Olga is AFK.
The feature is already online, but there are two pending tasks listed in the initial comment. Are we ready to mark them as done @obulat or are some work still in progress?
@fcoveram, I created a separate milestone for the cleanup tasks, and moved all the pending tasks there. The Additional search views milestone is completed 🎉
Description
To improve the search experience on openverse.org, we want to add more ways of displaying search results:
As a start, we can create and implement a single template for all three types of pages.
Documents
Issues
Prior Art