edgi-govdata-archiving / web-monitoring

Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner")
Creative Commons Attribution Share Alike 4.0 International

Discussion on Content Moderation #29

Closed ChaiBapchya closed 5 years ago

ChaiBapchya commented 7 years ago

From @ChaiBapchya on March 26, 2017 16:12

While searching for tools similar to Versionista, I stumbled across a term that is different from tracking/monitoring changes yet, I think, somewhat relevant: Content Moderation.

As we talk about meaningful changes, tracking changes, finding diffs, and prioritizing them, one of my searches led me to AWS Marketplace products such as WebPurify Image Moderation, which moderate web content and traffic.

Drawing an analogy to our use case, what we are doing is essentially moderating changes. Under this ambit, also consider:

  1. Smart Moderation: automated moderation based on ML and NLP with human-like self-learning; API documentation is available

  2. Implio: automated (ML-based) plus manual content moderation, with the ability to write filters (choosing between generic/custom ML models and filters)

  3. IOSquare: monitoring, automated analysis, and visualization (IoSquare has since merged with Besedo)

On the flipside

Why automation can never replace human content moderation. It discusses cases where human intervention remains critical.

What do you make of this @dcwalk @b5 @danielballan ? Have you heard about it before? Do you find this of any use? Anything worth picking up / learning from?

Copied from original issue: edgi-govdata-archiving/overview#106

ChaiBapchya commented 7 years ago

From @danielballan on March 27, 2017 12:06

Yes, I think there's a useful analogy here.

ChaiBapchya commented 7 years ago

From @danielballan on March 27, 2017 12:26

Having just caught up on https://github.com/edgi-govdata-archiving/web-monitoring-processing/issues/28, it looks like there are strong analogies in other prior art, such as analyzing Wikipedia edit history.

ChaiBapchya commented 7 years ago

From @dcwalk on March 27, 2017 21:15

Awesome. I agree with @danielballan that there are strong analogies between content moderation and monitoring, and it seems like we should definitely be paying attention to how projects approach moderation to help us think through features of our web monitoring system!

We tend to use the issues in this "overview" repo for meta-organizing across all our projects, but since I interpret this as being about the web monitoring project directly, it likely belongs there. Are you able to move this issue to https://github.com/edgi-govdata-archiving/web-monitoring @ChaiBapchya ?

ChaiBapchya commented 7 years ago

I'm glad this finding is relevant and useful.

Alright sure. Will move it to web-monitoring. @dcwalk

mhucka commented 7 years ago

I may have the wrong impression about the intended goal of this discussion; I assume this is still related to change detection, but if that's not it, ignore me!

@ChaiBapchya That was a very interesting find. I think you're right that this is related to the task at hand, but I am not convinced it is exactly the same. It appears that these content moderation systems are designed to detect undesirable content in (for example) comment sections on blogs or social media, but not necessarily compare different versions of the same document. Thus, while some inspiration could be drawn from content moderation approaches, I think it's unlikely that the same tooling will work for evaluating version changes.

ChaiBapchya commented 7 years ago

Yes, detecting undesirable content is the aim of content moderation systems, largely because most conspicuous changes on social media revolve around spam or malicious content. That doesn't completely suit our purpose. However, changes to the content of environmental websites aren't necessarily free of spam or malice either, so it does make sense to incorporate such a mechanism to tackle these problems. @mhucka What do you reckon?

mhucka commented 7 years ago

Hi @ChaiBapchya – I'm not clear on the sense in which you meant the previous comment, so I will try to answer it in the two ways that I can interpret it.

If you mean to use the moderation approaches to look at spam in comment sections on the web pages or in social media discussions, then there may be a small misconception: the kinds of websites that this project is monitoring are typically government agency websites, so they do not have spam in the normal sense, and I don't think we are trying to track social media discussions at all. With respect to the web pages, I don't believe the pages even have comment sections (or at least, the ones I have seen did not). @dcwalk or @danielballan maybe could confirm or deny this. As to malice, well, that is a more philosophical question! The people who have been nominated to lead the agencies today are (from our perspective) arguably full of malice, but they probably don't see it that way. More to the point, while the changes those people force upon their agencies may be malicious from our perspective, it's not in the same way that (e.g.) random commentators on websites or social media might be malicious.

Now, perhaps you mean instead that the evaluation used in the moderation systems could be adapted and used to judge the severity of changes between two versions of a web page? In that case, yes, that's an interesting and creative idea. I think I don't know enough about the details of how the moderation systems work, so I feel uncertain about being able to give more guidance here, but on the face of it, it seems to me like it's a possibility. To say anything more, I would need to see a description or example of how those systems work. Is there any open-source system like this? Or research papers describing methods used in these systems?
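To make the second interpretation concrete, one very rough way to adapt a moderation-style scorer to version comparison is to diff two page snapshots and score only the changed text. The following is a minimal sketch, not any tool discussed in this thread; the `TERM_WEIGHTS` table and the `changed_lines`/`score_change` helpers are purely illustrative stand-ins for a trained moderation model:

```python
import difflib

# Illustrative term weights; a real system would use a trained classifier.
TERM_WEIGHTS = {
    "climate": 3.0,
    "removed": 2.0,
    "deleted": 2.0,
    "regulation": 2.5,
}

def changed_lines(old: str, new: str) -> list:
    """Return the lines added or removed between two page versions."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [line[1:] for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

def score_change(old: str, new: str) -> float:
    """Sum keyword weights over changed lines, as a crude severity proxy."""
    total = 0.0
    for line in changed_lines(old, new):
        for term, weight in TERM_WEIGHTS.items():
            if term in line.lower():
                total += weight
    return total

old = "Our agency studies climate change.\nContact us."
new = "Our agency studies weather.\nContact us."
print(score_change(old, new))  # → 3.0 (the "climate" line was changed)
```

The point of the sketch is only that moderation-style scoring can be restricted to the diff rather than applied to the whole page, which keeps the "judge severity of a change" framing rather than the "judge a standalone comment" framing.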

ChaiBapchya commented 7 years ago

@mhucka Yes, I meant it in the second sense: adapting their work to suit our needs.

titaniumbones commented 7 years ago

(Posted to the old issue by accident, sorry; reposting/modifying here.) @mhucka I have not seen any important pages that allow comments, so your intuition above is, I think, correct; they are not a feature of the sites we are monitoring. @danielballan @ChaiBapchya does anyone on this thread actually know where the Wikipedia content moderation code is? It has an advantage over comment moderation code in that it at least deals with changes to web pages.

danielballan commented 7 years ago

My first guess is that this is somewhere in the WikiMedia codebase.

mhucka commented 7 years ago

@titaniumbones The closest I've been able to come is https://www.mediawiki.org/wiki/Extension:Moderation

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.