annatuma closed this issue 4 years ago.
Outstanding action items: 1) @aldenstpage we need a ticket (or tickets) for the backend processes here. Please review the spec, create ticket(s), and update the description to include those.
2) @kgodey @brenoferreira does the backend work need to be complete before any work can be done on the UI? If so, please add "blocked" labels to the frontend tickets.
3) @mathemancer can you confirm that we are indeed pulling in mature/NSFW metadata from some sources, and these can be referenced when @aldenstpage works on https://github.com/creativecommons/cccatalog-api/issues/339?
Hey @annatuma, I've updated the tickets with the required work for the backend. ~~Could you take a look at https://github.com/creativecommons/cccatalog-api/issues/474 and fill out the list of potential reasons users will be able to report images for the NSFW action interface?~~ All of the details I need are here, thanks!
We're going to assign this to everyone on the team for review, in this order: @aldenstpage @mathemancer @brenoferreira @kgodey. Once you're done reviewing, please assign it to the next person in the queue.
(Deleted my question; I read the DMCA page and see that there's a Google form, and understand the desired DMCA flow now)
On the NSFW front, I'm not aware of any source where we're pulling content that we know is mature.
I'm quite concerned about potential abuse of the flagging system. This is a common vector for censorship attacks on the internet at the moment. I'd like to make sure we have the necessary data to detect such an attack, and to prevent abusive flagging.
On the other hand, preventing abuse of the flagging system could entail collecting data to identify abusive flaggers, and I am quite uncomfortable with keeping any data on users. I think we should make sure that we don't collect any data without good reason, and fair warning to the user.
One option might be storing an anonymized IP address in the table as well (with a large bold warning on the submission form that the data will be stored).
The alternate idea I had for that was to ask for an email, and send a one-time link to the 'real' form to the email they submit (thereby validating the email). We could then store an anonymized version of the validated email address.
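To make the idea concrete, here's a minimal sketch of the one-time-link approach; the function names, the salted-hash choice, and the token length are all illustrative, not a committed design:

```python
import hashlib
import secrets

# Assumed server-side secret; in practice this would come from settings
# and would never be stored alongside the hashes.
SALT = "server-side-secret-salt"

def issue_report_token() -> str:
    """Single-use token to embed in the emailed link to the 'real' form."""
    return secrets.token_urlsafe(32)

def anonymize_email(email: str) -> str:
    """Store only a salted hash of the validated address, never the address itself."""
    normalized = email.strip().lower()
    return hashlib.sha256((SALT + normalized).encode()).hexdigest()
```

The token would be stored server-side with an expiry and invalidated on first use; only the output of `anonymize_email` would ever reach the reports table.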
> Any record with the action “Mark NSFW” should: Have “NSFW” added as metadata to the content record in the Catalog
> Any record with the action “De-Index Content” should: Use the existing endpoint (creativecommons/cccatalog-api#294) for copyright takedown.
To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag, and we'd need some other plan there.
LGTM.
@annatuma the frontend work can start without the API work having to be complete. The API spec is enough for the work to start.
> “{content type:Image} is Adult Content.”
I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.
> We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark NSFW” or “De-Index Content”.
We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage
Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.
@annatuma assigning back to you for review of all the comments and potential updates to the spec.
> Any record with the action “Mark NSFW” should: Have “NSFW” added as metadata to the content record in the Catalog Any record with the action “De-Index Content” should: Use the existing endpoint (creativecommons/cccatalog-api#294) for copyright takedown.
> To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag, and we'd need some other plan there.
I'd prefer engineering input on what makes sense here. What's the use case for storing this in the data layer? Is there a reason the service layer doesn't suffice? Service layer is clearly enough for the frontend, but if there are reasons this should go back to the data layer we should evaluate those. @mathemancer @aldenstpage @kgodey please weigh in.
> “{content type:Image} is Adult Content.”
> I think we need to standardize on a single terminology here. The reporting calls it "Adult", the backend flag is "NSFW", and the filter is "Mature". This is going to cause confusion later. We should call it one thing everywhere.
Fine by me. Updating it to "Mature" everywhere.
> We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark NSFW” or “De-Index Content”.
> We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage
Immediately would be better; the 12-hour run came out of our earlier conversations about this feature. That said, whatever @aldenstpage says is feasible here (12 hours versus immediate) is fine by me.
> Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.
I'll sync with @sarahpearson about the appropriate way to do this.
@aldenstpage please update the API spec to say "mature" instead of "nsfw" per @annatuma's update above.
> To my understanding, the scope of this (meta) issue is limited to getting info into the DB at the service layer. @annatuma Is that correct? If so, do we want further issues to propagate that info back into the data layer, or should it stay in the service DB only? If it stays in the service DB only, we won't be able to use the meta_data field for holding the NSFW tag, and we'd need some other plan there.
I envisioned this as getting propagated back to the data layer through Kafka as described here; in my mind you would handle these events by setting the appropriate flag in the meta_data field, but you can represent it however you want.
An alternative design is to keep all logic related to mature content and DMCAs in the API layer in separate tables. We could perform mature content detection during ingestion (or, in the future, a proper ETL pipeline). However, if you are able to detect that something is marked mature upstream (e.g. you find a way to get the NSFW flag from the Flickr API), you should still mark it in the meta_data field (or wherever else) so we can flag those images during ingestion. This would remove all responsibility for handling DMCAs and flagging from the data layer. I'll take this approach instead if you don't have any objections.
> Additionally, I share @mathemancer's concerns about trying to identify and minimize abuse of the reporting functionality. We should address this in the design. I like the idea of storing anonymized IP addresses, but we need to update the mockup to clearly let people know that we're storing their anonymized IP address.
We temporarily store IPs on our servers for rate limiting purposes and store them with the last octet erased in our server logs for up to 3 months; we should have the technical means to catch abuse already. Data retention practices are described in our privacy policy.
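For reference, the last-octet erasure described above can be done with the standard library; note the IPv6 prefix length below is my assumption, not taken from the actual server configuration:

```python
import ipaddress

def anonymize_ip(ip: str) -> str:
    """Zero out the host part of an address before storing it: the last
    octet for IPv4 (/24); for IPv6, everything past /48 (an assumed
    prefix length, not taken from the server config)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)
```

For example, `anonymize_ip("203.0.113.42")` yields `"203.0.113.0"`, matching the last-octet practice described in the server logs.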
> We're building API endpoints for this. Why do we want to run it every 12 hours instead of immediately? cc @aldenstpage
There's no technical reason for this to take 12 hours, particularly if we are able to avoid having to track NSFW status in both the catalog and API layers.
Long term, I'd want the flags to be in the data layer, since that would be part of any pipeline where we, say, used ML to figure out if other unflagged pictures might be mature content. Ideally, any metadata that was just about displaying the image would stay in the service layer, and any metadata that was 'inherent' to the image would end up in the data layer eventually. I think whether the image is considered 'Mature' or not falls under the latter.
But, we don't have a specific use for the 'Mature' flag in the data layer at the moment, so maybe I'm just falling prey to YAGNI fallacies.
Your last point is a good one; we can always extract the data later for training. I think it is also debatable whether matureness is inherent since it is highly contextual to culture and personal bias; we're taking a "stance" that an image is mature based on what we think our audience wants, and that's probably going to be a moving target. I'll stick with keeping it internal to the API layer now and we can bridge that to the catalog later if the need arises.
@brenoferreira we have the text to add to the first screen of the user reporting form regarding collection of IP addresses:
"For security purposes, CC collects and retains anonymized IP addresses of those who complete and submit this form."
@panchovm is updating the mockup right now. Thanks to @sarahpearson for review and language.
The mockup is now updated with this text.
Everything is live in production and looks good.
We have a couple of minor follow-up issues (as is to be expected).
This is a meta ticket, for a feature that requires work across the frontend, catalog, and API.
[x] Takedown API Endpoint https://github.com/creativecommons/cccatalog-api/issues/294
[x] Report API endpoint https://github.com/creativecommons/cccatalog-api/issues/474
[x] API Filter https://github.com/creativecommons/cccatalog-api/issues/339
[x] Django Interface https://github.com/creativecommons/cccatalog-api/issues/473
[x] Process User Reported Content Queue https://github.com/creativecommons/cccatalog-frontend/issues/848
[x] Design of UI in Figma https://github.com/creativecommons/cccatalog-frontend/issues/851
[x] Frontend UI https://github.com/creativecommons/cccatalog-frontend/issues/425
[x] Frontend Filter https://github.com/creativecommons/cccatalog-frontend/issues/435
Overview & Concept
We’d like to introduce a feature to CC Search to allow users to report problematic content quickly and easily. This is important for several reasons:
This feature will require an addition to the CC Search UI for users to interact with, as well as backend support for recording user reports and acting on them. It is important to note that nothing happens automatically to content when it is reported by a user. Any decision to hide or remove content from CC Search or mark it as Mature in the Catalog will be made by CC Staff.
Frontend functionality
Reporting UI
Initial Wireframes; Figma Mockup to base flow on.
We will add a link to the top right corner of the Single Results Page, with a Flag icon, and the text “Report”.
Clicking on “Report” pops out a modal, which contains the following:
What is wrong with this {content type:image}?
These are the frontend interactions that happen for each of these selections:
“{content type:Image} infringes copyright.”
- In a new tab, load the DMCA form
- In modal, show Thank You message

“{content type:Image} is Mature Content.”
- In modal, show Thank You message

“Other.”
- New screen loads in modal, with: header “Please describe the issue for us”, a large text input field, and a [Submit Issue] button
- After user hits [Submit Issue], show Thank You message
Note: we’ll display {content type} where the curly brackets are, based on the content type of the work on the individual result page where the button renders. Ticket: https://github.com/creativecommons/cccatalog-frontend/issues/425
Filter
We want to add a filter, at the very bottom of the filter list, called “Search Settings”. It should have one option for users to select, called “Show Mature Content”. Ticket: https://github.com/creativecommons/cccatalog-frontend/issues/435
Backend process
Recording User Actions
These are the backend interactions that happen for each of the frontend user selections:
“{content type:Image} infringes copyright.”
- Add to table with “DMCA flag” and outclick indicator
- Currently, Legal takes action based on responses to the DMCA form, handled via email.

“{content type:Image} is Mature Content.”
- Add to table with “Mature” flag

“Other” and text submission:
- Add to table with the issue description in a field
All reported content is stored in a table as soon as it is reported by a user. Two things happen when new content is added to the table:
Table Fields
This is the information we should store in the table:
Table Interface Options
After content is reviewed, we’d like to continue to store it in the table, so we have a record of all user reports. However, any content already reviewed and therefore marked with “yes” should by default be filtered out of the Django admin interface view. A backend user of the interface should be able to select it for inclusion, should they need to review prior actions.
The following actions should be available for each row in the table:
- Mark Mature
- De-Index Content
- Do Nothing: clicking this only changes the status of the record from “no” (not yet reviewed) to “yes” (reviewed).
After any available action is taken, the status of the record is updated from “no” (not yet reviewed) to “yes” (reviewed).
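To illustrate the review workflow above, here is a plain-Python sketch; the field names, the boolean-for-status choice, and the dataclass shape are assumptions for illustration, not the real schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Action(Enum):
    MARK_MATURE = "Mark Mature"
    DEINDEX = "De-Index Content"
    DO_NOTHING = "Do Nothing"

@dataclass
class ContentReport:
    identifier: str            # the reported work
    reason: str                # e.g. "dmca", "mature", "other"
    description: str = ""      # free text for "Other" reports
    reviewed: bool = False     # the "no"/"yes" status flag
    action: Optional[Action] = None

def take_action(report: ContentReport, action: Action) -> None:
    """Any action, including Do Nothing, marks the record reviewed."""
    report.action = action
    report.reviewed = True

def default_view(reports: List[ContentReport],
                 include_reviewed: bool = False) -> List[ContentReport]:
    """Reviewed records are hidden unless explicitly requested,
    mirroring the default Django admin filter described above."""
    return [r for r in reports if include_reviewed or not r.reviewed]
```

In the real implementation this would be a Django model with the admin's default queryset filtered the same way.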
Data Processing
As indicated in the Table Interface Options, data processing needs to take place based on the actions taken.
We need to write a script, set to run every 12 hours, that looks for rows where the status of the record has been set to “yes” since the script last ran, and where the action taken in the table is either “Mark Mature” or “De-Index Content”.
Any record with the action “Mark Mature” should have “Mature” added as metadata to the content record in the Catalog.
Any record with the action “De-Index Content” should use the existing endpoint (https://github.com/creativecommons/cccatalog-api/issues/294) for copyright takedown.
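A rough sketch of the script's selection logic, assuming rows expose a reviewed flag, a review timestamp, and an action label (all hypothetical names, not the real table columns):

```python
from datetime import datetime, timezone

# Actions that require backend follow-up; "Do Nothing" rows are skipped.
ACTIONABLE = {"Mark Mature", "De-Index Content"}

def rows_to_process(rows, last_run):
    """Select rows reviewed since the last run whose action needs follow-up."""
    return [
        r for r in rows
        if r["reviewed"] and r["reviewed_on"] > last_run and r["action"] in ACTIONABLE
    ]

def dispatch(row):
    """Map each actionable row to its backend effect (handler names illustrative)."""
    if row["action"] == "Mark Mature":
        return ("add_mature_metadata", row["identifier"])
    return ("call_takedown_endpoint", row["identifier"])  # issue #294
```

The same selection could run immediately on review rather than on a 12-hour timer, per the discussion above.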
The feature spec, including the Internal Process and Future Iterations, is available to CC Staff here.