geohci / edit-types

Edit diffs and type detection for Wikipedia
MIT License

An edit type taxonomy for detecting/evaluating contributions better than edit-counts – differentiating copyediting, wikilink additions, removals, and expansions (each with metadata such as text-length added) #73

Open prototyperspective opened 1 year ago

prototyperspective commented 1 year ago

First proposed this here: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_Edit_Types where you can find some relevant information.

Main edit types of that integrated or parallel taxonomy would include:

Here are some examples of how this could be useful:

It's not only edit types but in some sense also task, activity, or contribution types. Of course, a caveat is that this differentiated detection of edit/contribution types should not incentivize nonconstructive editing, such as removing lots of content or adding far more text to an article than is appropriate just for the sake of these stats and/or badges. There are many ways this could be addressed, and I don't think it would be problematic, just as having edit counts doesn't create much bias toward users chasing very high edit counts. It wouldn't be implemented in a full-blown way right away: at first only, e.g., via some gadget, and later on the default site in a very limited form without any real-world benefits. It would evolve and get revised over time so that any actual and potential problems are addressed, just as Wikipedia itself does (via the policies, templates, norms, scripts, activities, and so on that emerge there).


This comment may be relevant here: https://github.com/geohci/edit-types/issues/4#issuecomment-1012532894

Thanks for your work on edit-types. I think its potential and usefulness are greatly underestimated, and I hope more people work on this. It could greatly vitalize Wikipedia and catalyze a new level of constructive contributions, and even the entire open movement, in a way nearly nothing else can; so far I don't know of any similar approach or project.

I'll probably update and edit this issue over time as I get new ideas concerning it, and maybe somebody who sees the value of this even has the time to make some pull requests that add an initial version of it. I do see that there are more near-term and easier-to-implement uses for this, and that it would probably need more devs to help out for it to get built any time soon. Again, just as metascience could improve science (efficiency, reliability, quality, human resources, usefulness, ...), this could increase Wikipedia's number of editors, its quality, and engagement.

geohci commented 1 year ago

@prototyperspective thanks for these inputs and your patience with me putting together a response! I put together the start of something similar to what you're proposing but with some slight tweaks and I'd be curious to hear your thoughts:

revert for reverting other users' contents and/or removals

Yep, this can't be determined via the library, but in my initial analyses using it I've combined our standard revert-detection approaches with edit metadata (edit tags; shasums) to separate out edits that either were reverted or were themselves the revert. You can see that in this exploratory analysis (I left patrolling/vandalism as a single category there but will separate them out in the future): https://public.paws.wmcloud.org/User:Isaac_(WMF)/Edit%20Diffs/Example_Edit_Analysis_French.ipynb#Example-Analysis-of-~700-edits-from-French-Wikipedia-in-2022
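The checksum half of that approach (so-called identity reverts) can be illustrated with a minimal sketch. This is not the code from the linked notebook: the function name `find_identity_reverts`, the `(rev_id, wikitext)` input shape, and the default look-back radius of 15 are assumptions for illustration, and edit-tag-based detection (undo/rollback tags) is not covered.

```python
import hashlib
from typing import Dict, List, Tuple


def sha1(text: str) -> str:
    """Hex SHA-1 of the revision text (MediaWiki itself stores a base-36 variant)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_identity_reverts(
    revisions: List[Tuple[int, str]], radius: int = 15
) -> Dict[int, List[int]]:
    """Return {reverting_rev_id: [reverted_rev_ids, ...]} for one page.

    `revisions` is a chronological list of (rev_id, wikitext) pairs. A revision
    is treated as an identity revert if its checksum equals that of a revision
    at most `radius` steps back with at least one revision in between; the
    revisions in between are the ones that were reverted.
    """
    checksums = [sha1(text) for _, text in revisions]
    reverts: Dict[int, List[int]] = {}
    for i in range(len(revisions)):
        # Walk backwards to the most recent earlier revision with identical content.
        for j in range(i - 1, max(-1, i - radius - 1), -1):
            if checksums[j] == checksums[i]:
                if j < i - 1:  # j == i - 1 would be a null edit, not a revert
                    reverts[revisions[i][0]] = [rev_id for rev_id, _ in revisions[j + 1 : i]]
                break
    return reverts


history = [
    (1, "Paris is the capital of France."),
    (2, "Paris is the capital of France. Vandalism!"),
    (3, "Paris is the capital of France."),
]
print(find_identity_reverts(history))  # {3: [2]}
```

Only exact content matches are caught this way; partial reverts or manual re-edits need other signals such as the edit tags mentioned above.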

newarticle for creation of new articles or turning redirects into new articles and so on

Creating new articles is best determined outside this library (because an empty parent revision could also mean that someone blanked the page), but I agree that it's an important aspect to pay attention to. For similar reasons I haven't added functionality to detect redirects: that also feels best determined separately from the library. It's not impossible, but it is slightly tricky: the mwparserfromhell dependency I use for parsing wikitext detects #REDIRECT as a # tag followed by REDIRECT text, so I'd have to do some hacky things to check that they appear next to each other and on the first line.
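One way to keep this outside the library is to work directly from the raw parent and current wikitext. The sketch below is a plain-regex alternative to going through mwparserfromhell, assuming `None` is passed when an edit has no parent revision at all; the function names and labels such as `restored-after-blanking` are hypothetical, and the regex only approximates the English `#REDIRECT` magic word.

```python
import re
from typing import Optional

# Approximates MediaWiki's redirect syntax for the English magic word only;
# localized forms such as French "#REDIRECTION" are not handled.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[[^\]]+\]\]", re.IGNORECASE)


def is_redirect(wikitext: str) -> bool:
    """True if the text opens with '#REDIRECT [[Target]]' (case-insensitive)."""
    return REDIRECT_RE.match(wikitext) is not None


def classify_creation(parent_text: Optional[str], current_text: str) -> Optional[str]:
    """Hypothetical page-creation label for a single edit, or None.

    parent_text is None when the edit has no parent revision (a genuine page
    creation); an empty string means a parent existed but had been blanked.
    """
    has_content = bool(current_text.strip())
    if parent_text is None:
        return "new-redirect" if is_redirect(current_text) else "newarticle"
    if not parent_text.strip() and has_content:
        return "restored-after-blanking"  # empty parent, but not a new article
    if is_redirect(parent_text) and has_content and not is_redirect(current_text):
        return "redirect-to-article"
    return None


print(classify_creation(None, "#REDIRECT [[Paris]]"))              # new-redirect
print(classify_creation("", "Paris is the capital of France."))    # restored-after-blanking
print(classify_creation("#redirect [[Paris]]", "Paris is a city."))  # redirect-to-article
```

Because the check anchors at the start of the text, it sidesteps the "# tag followed by REDIRECT text" ambiguity entirely.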

talk for new talk page threads (and posts)

I haven't tweaked the library yet to handle talk pages well, though that's on the list. All of it should work fine, but I'd like to add explicit functionality for replies, links to user/project talk pages, and signatures. Otherwise, these would just be tracked as lists, text formatting, and wikilinks.
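As a rough illustration of the talk-page features listed here, this hypothetical sketch counts them in the text an edit added. It assumes English Wikipedia conventions (colon-indented replies, level-2 `==` headings for new threads, and the default `HH:MM, D Month YYYY (UTC)` signature timestamp); `talk_features` is not part of the edit-types library.

```python
import re
from typing import Dict

# English Wikipedia conventions only; other wikis localize namespaces and dates.
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")
USER_LINK_RE = re.compile(r"\[\[\s*User(?: talk)?\s*:", re.IGNORECASE)
NEW_SECTION_RE = re.compile(r"^==(?!=).*[^=]==\s*$")  # level-2 headings only


def talk_features(added_text: str) -> Dict[str, int]:
    """Count talk-page-specific markup in the text an edit added."""
    lines = added_text.splitlines()
    return {
        "replies": sum(1 for line in lines if line.startswith(":")),  # colon-indented replies
        "new_sections": sum(1 for line in lines if NEW_SECTION_RE.match(line)),
        "user_links": len(USER_LINK_RE.findall(added_text)),
        "signatures": len(TIMESTAMP_RE.findall(added_text)),
    }


added = "\n".join([
    "== Source question ==",
    "Is there a source for this? [[User:Example|Example]] ([[User talk:Example|talk]]) 14:03, 5 January 2022 (UTC)",
    ":Yes, see the 2020 report. [[User:Other|Other]] 15:10, 5 January 2022 (UTC)",
])
print(talk_features(added))
# {'replies': 1, 'new_sections': 1, 'user_links': 3, 'signatures': 2}
```

Signatures can be customized by users, so the timestamp is used here as the more stable signal.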

  • copy-editing where contents were only slightly changed and/or ancillary changes like adding a reference, adding wikilinks to the existing text or adding subsection-headers were made
  • expansion for adding new content (not as fine-grained as distinguishing whether whitespace or text has been changed, but detecting whether or not whole new sentences were added)
  • furthermore, it would for example detect how many "see also" wikilinks were added and how many categories were added; this would also be metadata
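A crude version of the copy-editing vs. expansion split, plus the link and category metadata, might look like the hypothetical sketch below. It is not how the edit-types library works internally: it compares naive regex-split sentences using `difflib.SequenceMatcher`, and the 0.6 similarity threshold, the function names, the `removal` rule, and the returned keys are all assumptions. Counting "see also" links specifically would additionally require locating that section, which is left out.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Union

# All patterns are rough approximations of wikitext, not a real parser.
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:[^\]]*\]\]", re.IGNORECASE)
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")  # [[Target]] or [[Target|label]]
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def prose_sentences(wikitext: str) -> List[str]:
    """Plain-text sentences: drop categories, unwrap links, skip list/heading/table lines."""
    text = CATEGORY_RE.sub("", wikitext)
    text = WIKILINK_RE.sub(lambda m: m.group(2) or m.group(1), text)
    lines = [ln for ln in text.splitlines() if ln.strip() and ln.lstrip()[0] not in "*#=:{|"]
    return [s.strip() for s in SENTENCE_SPLIT_RE.split(" ".join(lines)) if s.strip()]


def is_new_sentence(sentence: str, old: List[str], threshold: float = 0.6) -> bool:
    """New if no old sentence reaches the (assumed) similarity threshold."""
    return all(SequenceMatcher(None, sentence, o).ratio() < threshold for o in old)


def count_categories(wikitext: str) -> int:
    return len(CATEGORY_RE.findall(wikitext))


def count_links(wikitext: str) -> int:
    # Categories are removed first so they are not counted as ordinary wikilinks.
    return len(WIKILINK_RE.findall(CATEGORY_RE.sub("", wikitext)))


def classify_edit(old: str, new: str) -> Dict[str, Union[str, int]]:
    old_sents, new_sents = prose_sentences(old), prose_sentences(new)
    added = [s for s in new_sents if is_new_sentence(s, old_sents)]
    if added:
        edit_type = "expansion"
    elif len(new_sents) < len(old_sents):
        edit_type = "removal"
    else:
        edit_type = "copyedit"
    return {
        "type": edit_type,
        "new_sentences": len(added),
        "chars_added": max(0, len(new) - len(old)),
        "wikilinks_added": max(0, count_links(new) - count_links(old)),
        "categories_added": max(0, count_categories(new) - count_categories(old)),
    }


old = "Paris is the capital of France.\n\n[[Category:Capitals]]"
copyedit = "Paris is the capital city of [[France]].\n\n[[Category:Capitals]]\n[[Category:Cities in France]]"
expansion = "Paris is the capital of France. It is also its largest city.\n\n[[Category:Capitals]]"
print(classify_edit(old, copyedit)["type"])   # copyedit
print(classify_edit(old, expansion)["type"])  # expansion
```

Linking an existing word, adding a category, or rewording a sentence all stay "copyedit" here because links are unwrapped and near-identical sentences are matched before comparison.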

I broke these down into three similar but I think slightly different categories:

Here are some examples of how this could be useful:

Thanks for listing all of these out! I'm currently working on getting it to a point where it could potentially support these or other uses. The main challenge is having a pre-computed dataset of edits and their associated types, so that tools could easily be built around it without having to compute the types themselves.
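To make the dataset idea a bit more concrete, here is a hypothetical sketch of what one row might hold and how a tool could aggregate such rows into per-user edit-type profiles instead of a single edit count. The `EditRecord` fields, the JSON Lines format, and the type labels are assumptions for illustration, not an existing dataset schema.

```python
import json
from collections import Counter, defaultdict
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterable, List


@dataclass
class EditRecord:
    """One row of a hypothetical pre-computed edit-types dataset."""
    rev_id: int
    user: str
    page: str
    edit_types: Dict[str, int] = field(default_factory=dict)  # e.g. {"copyedit": 1}


def to_jsonl(records: Iterable[EditRecord]) -> str:
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text: str) -> List[EditRecord]:
    """Parse JSON Lines back into records, skipping blank lines."""
    return [EditRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


def per_user_profile(records: Iterable[EditRecord]) -> Dict[str, Counter]:
    """Sum edit-type counts per user, as an alternative to a single edit count."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        profiles[r.user].update(r.edit_types)
    return dict(profiles)


records = [
    EditRecord(101, "Alice", "Paris", {"copyedit": 1, "wikilinks_added": 2}),
    EditRecord(102, "Alice", "Lyon", {"expansion": 1}),
    EditRecord(103, "Bob", "Paris", {"revert": 1}),
]
restored = from_jsonl(to_jsonl(records))
print(per_user_profile(restored)["Alice"])
```

Keeping the per-edit counts as a flat mapping lets tools choose their own aggregation (per user, per page, per time window) without recomputing diffs.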