Closed: zackkrida closed this issue 11 months ago.
Frontend design work is taking place here. This work assumes that the MVP of this functionality will allow the user to show sensitive results, and to elect to blur sensitive results in the search results view (with blurring as the default value).
We also assume the MVP of this functionality does not display or allow the user to enable specific sensitive content subcategories. However, this active design work will be easily adaptable and compatible with such functionality if we decide to include it.
I have been delayed in publishing my recommendations for a sensitive content terms list for Openverse, but will do so by the end of the week.
@zackkrida This project currently has the "In RFC" status, but it looks like that means we're skipping the project planning step. Is that intentional? I want to clarify before I get too deeply into writing the implementation plan.
This work assumes that the MVP of this functionality will allow the user to show sensitive results, and to elect to blur sensitive results in the search results view (with blurring as the default value).
Further clarification: are these separate features? Is this re-wording of the quoted summary accurate? "We will enhance our data for what a 'sensitive' result might be, thereby filtering them out of queries that do not have mature=True. Additionally, when mature=True and we are including results with sensitive terms, we will blur those results by default until the user designates that they do not want the blurred results."
The part I want to clarify here is whether we expect results with sensitive terms to appear in queries where mature is not True (i.e., the default query), or whether blurred results would only appear when mature=True.
Hi @sarayourfriend! I've fixed the project status to 'In Kickoff'. I erroneously moved it there when the design proposal was created, as I forgot this project included the full scope of the term matching and not just the frontend blurring implementation.
I was of the opinion that we'd want a proposal and implementation plan here, but would like your opinion.
To clarify the update points about the design work:
Users opt-in to seeing sensitive search results. After opting in, blurred sensitive results will be displayed in their search results. If desired, they can further opt-in to unblur these results.
Users will not see sensitive results (those marked as mature internally) in the default query.
For single results, how does this sound:
Individual results will be blurred for all users when visited directly by URL, regardless of specified user preferences. Sensitive results will only be unblurred when navigated to from a search results view where 'show sensitive results' is enabled and 'blur sensitive results' is disabled.
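The single-result rule proposed above can be expressed as a small decision function. This is a minimal Python sketch for illustration only, not the frontend's actual implementation; `NavigationContext` and `should_blur_single_result` are hypothetical names, and a direct visit by URL is modelled as having no navigation context (`None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NavigationContext:
    """Hypothetical record of the search view the user navigated from."""

    show_sensitive: bool  # 'show sensitive results' was enabled
    blur_sensitive: bool  # 'blur sensitive results' was enabled


def should_blur_single_result(
    is_sensitive: bool, came_from: Optional[NavigationContext]
) -> bool:
    """Decide whether a single result page renders blurred.

    `came_from` is None when the page is visited directly by URL.
    """
    if not is_sensitive:
        return False  # non-sensitive results are never blurred
    if came_from is None:
        return True  # direct visits ignore stored user preferences
    # Unblur only when sensitive results were shown AND blurring was disabled.
    return not (came_from.show_sensitive and not came_from.blur_sensitive)
```

Every path that is not explicitly "shown and unblurred" falls through to blurring, which matches the proposal's intent of blurring by default.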
@panchovm and I decided that it might be best to pause design work after soliciting feedback on the latest iteration of the design. This is primarily to prevent us from having to make further assumptions that may contradict the upcoming planning documents. The designs are in a place where some preliminary development can start (or resume, considering #824) behind a feature flag while we continue to plan.
Thanks for the clarifications on all accounts. They sound great. I've got about half of a technical implementation plan written for this; it turns out the most difficult part will probably be designating which results have sensitive terms in their textual content in the search API in a way that does not degrade performance.
I'll write the kickoff post first and get the project planning generally started. Having technical implementation plans is a good idea here. Blurring likely deserves its own consideration unless we decide to just use regular CSS blurring, but it's something we can discuss in the kickoff discussion.
After adding the second part of iteration two (i2), I blocked the design ticket (#791) until this project concludes the design scope.
The project proposal has been merged and the implementation plan request issues have been created. Both are currently blocked by different things:
Both IPs could have some aspects written without resolving these blockers first. However, both are blocked on core aspects of their functionality, so I think it makes sense to truly wait until we've resolved the blockers before starting on them.
Thanks to @krysal and @AetherUnbound for the quick reviews on the project proposal (it was merged 1 week before its deadline) and thanks to @panchovm, @obulat, @cogdog, and @zackkrida for additional reviews and comments.
Noting here that the ETA for this project is not a reasonable estimate for completion and folks keeping track should not expect it by then. We still haven't resolved the questions blocking the API implementation plan and the designs are not yet available for the frontend. Writing and discussing those implementation plans could realistically take until the end of March and then implementation will only start afterwards. I think a more realistic ETA is the end of April.
I've reached out to maintainers of the WordPress GitHub org to create a new repo for the sensitive terms list. I made this post on our Make blog to provide them with context.
I've also created a new group called @WordPress/openverse-sensitive-content-reviewers who will be responsible for reviewing the text contents of the new repo.
Update: project proposal is approved and merged. The first implementation plan is being reviewed #996.
There have been two significant changes to parts of this project:
The sensitivity field for the API query response: I discovered that we do not actually ingest any sensitivity information from providers. The plan originally assumed that we did, and the designs reflected that in copy stating that the provider had marked a result as sensitive. Not needing to cover this case significantly reduced the complexity of implementing the sensitivity field. As part of a related project's implementation plan to change mature to sensitive (https://github.com/WordPress/openverse/pull/2126), we may be able to remove the overloaded meaning of the single mature field on the Elasticsearch documents and use more explicit names like provider_marked_sensitive and user_reported_sensitive for the sensitivity boolean properties on documents. In any case, the current approach works perfectly for deriving whether a result has a confirmed user report or sensitive text, the only two cases that actually exist in our data[^1].

[^1]: Note that we do not re-ingest data from Flickr or other providers that have sensitive results, so we will not get updated information if one of those works is marked sensitive after our initial ingestion of it. There's nothing we can do about this now; it is its own massive project and falls under similar work to #417.
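A minimal Python sketch of deriving a result's sensitivity from the only two signals described above: a confirmed user report and matched sensitive text. It assumes, hypothetically, that the filtered index holds only documents whose text did not match the sensitive terms list, so a result missing from it is treated as having sensitive text. The label strings and the function name are illustrative, not the API's actual values.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical label strings; the API's real values may differ.
USER_REPORTED = "user_reported_sensitive"
SENSITIVE_TEXT = "sensitive_text"


def derive_sensitivity(
    result_ids: Iterable[str],
    confirmed_report_ids: set[str],
    filtered_index_ids: set[str],
) -> dict[str, list[str]]:
    """Map each result ID to the reasons it is considered sensitive.

    Assumption: the filtered index contains only documents whose text did
    not match the sensitive terms list, so a result absent from it is
    treated as having sensitive text.
    """
    sensitivity: dict[str, list[str]] = {}
    for result_id in result_ids:
        reasons: list[str] = []
        if result_id in confirmed_report_ids:
            reasons.append(USER_REPORTED)
        if result_id not in filtered_index_ids:
            reasons.append(SENSITIVE_TEXT)
        sensitivity[result_id] = reasons  # empty list means not sensitive
    return sensitivity
```

Checking a whole page of result IDs against two precomputed sets keeps the per-request work to two membership checks per result, rather than re-scanning each result's text at query time.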
We need to make an infrastructure change to add the new ingestion server environment variable for the sensitive terms list URL, then we can run the filtered index creation DAG.
I've just finished deploying the ingestion server. It now has the configuration for the sensitive terms list. The next time filtered index creation runs (on the next audio refresh) we should be able to see if the terms list is working as expected.
@sarayourfriend Madison has identified an issue with the API side of this project here: https://github.com/WordPress/openverse/issues/2328
Thanks, Zack. I left a comment on the issue (Madison pinged me there too). This might introduce significant delays in this project depending on the avenue we decide to take in that issue. I'll update here again once we have a clearer idea of which path we'll pursue.
Hi @sarayourfriend, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
No progress has been made on this project since the last update. As mentioned before, the frontend implementation plan was approved, but issues still need to be created by @dhruvkb for the plan's related milestone (linked in the issue description). The primary reasons for the stall on this project are AFKs, conferences, and the several recent API incidents taking attention away from ongoing projects.
@dhruvkb I've set you as the lead for this one moving forward. I'll be happy to help with the implementation in the coming month, as well as the remaining issue creation later this week.
CC @sarayourfriend
@sarayourfriend and @dhruvkb a few questions:
1. I don't think we have ENABLE_FILTERED_INDEX_QUERIES enabled in the production and staging APIs. I see it set to true in our env.template but not in the infrastructure repo or our task definitions. Should it be? (I think so.)
2. By what criteria should we make the API param unstable__include_sensitive_results stable? I assume it can go live prior to the frontend implementation.
I don't think we have ENABLE_FILTERED_INDEX_QUERIES enabled in the production and staging APIs. I see it set to true in our env.template but not in the infrastructure repo or our task definitions. Should it be? (I think so.)
Production filtered image index creation needs to successfully finish before that can be turned on in production (and staging). There is an open incident report for the production data refresh that involves issues with the filtered index creation. I haven't looked deeply into what y'all discovered after I logged off yesterday, but if there's anything that still needs tweaking/fixing in that DAG, then we should do so before turning on filtered index queries in production.
By what criteria should we make the API param unstable__include_sensitive_results stable? I assume it can go live prior to the frontend implementation.
I originally intended it to remain unstable until the frontend feature was complete and ready to be turned on. It is prudent to wait to stabilise the API feature until we have confirmed that its current implementation fulfils the requirements and expectations of our own use case on the frontend. Once the frontend feature flag is ready to be turned on for production, we can stabilise the parameter and the search result unstable__sensitivity property, update references to such on the frontend, and then turn the feature on for production frontend.
Of course, that relies on filtered index queries being on in production as well, so we need to resolve any issues with the DAG, as mentioned above.
Thanks!
Once the frontend feature flag is ready to be turned on for production, we can stabilise the parameter and the search result unstable__sensitivity property, update references to such on the frontend, and then turn the feature on for production frontend.
Sounds perfect!
Concerning the filtered index DAG, the problem has been identified and Madison is working on a fix:
https://github.com/WordPress/openverse/issues/2619
I'll recommend that we add ENABLE_FILTERED_INDEX_QUERIES and deploy on Monday, provided that issue is fixed first.
There are open PRs for the single result views. The blurring of search results was merged.
Open PRs:
@dhruvkb, @fcoveram raised some thoughts about the overall flow. I thought I'd have time to create dedicated issues for these, but alas, I don't think I will. Perhaps we should have a sync session after our team meetup to address these:
I think a sync session with @fcoveram and the team at the meetup will be helpful. I'll make issues for these after that.
Noting that the following two points were part of the agreed design.
One other idea for that sync session:
I also like Google's solution, but there are other product-wise considerations pointed out here. It's definitely something to keep in mind, though.
I am thinking of the icon like a label, rather than proposing any changes to the flow or behavior.
There are certain contexts where the blur alone may not be a strong enough visual cue, like the following results:
Oh, I see. You are right. The interface needs to provide more context. Good point.
I am not sure the global player needs to be blurred because for it to appear, the user must click a result, choose to unblur that result from the content safety wall, see everything in the single result view, play the audio and then go back to all results. After this has happened, the result item itself is unblurred so the global player playing that item need not be blurred.
As for the eye icon on the blurred images, I would be in favour of a visual context but I am not sure what the best way to indicate this is. I played a bit with the idea and have three thoughts.
The icon will need to be colored in a way that ensures visibility irrespective of the background. The easiest way to do this is to put the icon in a box with a fixed color (like we do for the licenses).
The icon looks a bit different and is placed separately from other iconography associated with the image.
The audio results do not have a corresponding icon. Is that because we can safely assume that blurred text is unambiguous? What can be done about consistency between different media types?
@fcoveram your design input about these points will be very helpful, thanks.
I am not sure the global player needs to be blurred because for it to appear, the user must click a result, choose to unblur that result from the content safety wall, see everything in the single result view, play the audio and then go back to all results. After this has happened, the result item itself is unblurred so the global player playing that item need not be blurred.
I agree with this. I'm referring to the flow when users play a blurred audio track on the search results page. If the item has not been unblurred and you click play, the global player should also show blurred info. In other words, the global player should mirror the audio track's state.
Here is a quick prototype of this flow.
https://github.com/WordPress/openverse/assets/895819/482b62c5-f7fd-4bca-910b-1eb2965708e0
I tried different styles for the icon, and a black icon over a white layer with 60% opacity might work. The contrast ratio required for graphics is lower than for text. However, consistency between media types is my main concern.
After testing some ideas, I concluded that the designed consent flow (landing in the search results and enabling the sensitive results) is sufficient for conveying why some content (images and texts) is blurred. The change in the results area is very clear in that new content was added.
The project is very close to completion, with a total of 5 tickets, including 1 blocked ticket (#2550, which needs legal input) and 3 that already have PRs associated with them.
The API ticket for sensitivity in single results (#2926) is the biggest blocker to roll-out, so getting it assigned and resolved is the next priority item.
I'm reaching out to the legal team again to ask when we can expect a response. The last time I asked, they didn't respond.
Hi @dhruvkb, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
The latest on this is that we have heard back from legal and these discussions brought up some actionable items:
@sarayourfriend has already implemented points 1 and 2 (thanks!), so I think it should be time to formalise the text into a page on Openverse.org. The third will soon be implemented as a part of that project.
Hi @dhruvkb, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information.
This has been shipped to production! One issue, #3081, remains in the milestone, blocked on the official translations to be written for the content safety explanation page.
The last issue in the milestone has also been resolved now. The project is shipped!
I've drafted an announcement post for the feature here: Google Docs Link. @dhruvkb and @WordPress/openverse-maintainers, please take a look and let me know if it sounds okay! I can post it this week and we can move this to "Success".
I will share the document with rmartinezduque asking for feedback.
Congrats on shipping this new feature! The post looks great to me. I think it clearly communicates the new feature and the benefits it brings. I only suggested a few minor edits, mostly to correct a grammar mistake. If possible, I think it would also be nice to include an image to accompany the announcement (perhaps you already have it in mind).
I can work on a visual for the post.
Thank you @rmartinezduque and @fcoveram!
Our announcement P2 has been made: https://make.wordpress.org/openverse/2023/12/11/introducing-enhanced-content-safety-features-on-openverse/
And I've also made an amplification request for the marketing team here: https://github.com/WordPress/Marketing-Team/issues/330
With that, I'm going to move this project to "Success"!
Description
This project matches Openverse images against a list of sensitive terms. All sensitive single image results on the frontend are blurred and viewing them is opt-in for all users.
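One way to match a document's text against a sensitive terms list is a single compiled, case-insensitive, whole-word regular expression. This Python sketch is illustrative only: the document fields `title`, `description`, and `tags` (as plain strings) are assumptions about the document shape, and Openverse's actual matching, which appears to happen during filtered index creation, may work differently.

```python
from __future__ import annotations

import re
from typing import Iterable, Mapping


def build_term_pattern(terms: Iterable[str]) -> re.Pattern[str]:
    """Compile a single case-insensitive, whole-word regex for all terms."""
    cleaned = {t.strip() for t in terms if t.strip()}
    if not cleaned:
        return re.compile(r"(?!)")  # an empty list should match nothing
    # Sorted only so the compiled pattern is deterministic across runs.
    alternatives = "|".join(re.escape(t) for t in sorted(cleaned))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def has_sensitive_text(doc: Mapping[str, object], pattern: re.Pattern[str]) -> bool:
    """Return True if the title, description, or any tag matches a term.

    Assumes a hypothetical document shape with string `title` and
    `description` fields and `tags` as a list of strings.
    """
    tags = doc.get("tags") or []
    texts = [str(doc.get("title") or ""), str(doc.get("description") or "")]
    texts.extend(str(tag) for tag in tags)
    return any(pattern.search(text) for text in texts)
```

Word boundaries (`\b`) keep a term like `foo` from matching inside `food`, but they behave unexpectedly for terms that begin or end with punctuation, so a real terms list may need a different tokenisation strategy.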
Development work is ongoing and tracked in a milestone: https://github.com/WordPress/openverse/milestone/12
Documents
Issues
Prior Art