Genre classification tags should be available for stories

hamle365 commented 10 years ago

Coloured genre classification tags should be available for stories to be added by users or mods e.g. Horror, Kids, Romance, Western, Spy, Thriller, NSFW, Erotica, Fanfic, etc.

MattHeard commented 10 years ago

I have plans for features called "tags" and "filters" which should be quite self-explanatory.

In true wiki-style, anybody will be able to add tags to any page, and filters will allow users to only be served pages matching or not-matching specified tags.

Bob has a daughter Alice who is 5 years old. Bob doesn't want Alice to read pages like 16a because it has the word "fuck" in one of the options. Page 16a is tagged with "profanity". Bob creates a filter for Alice's account with the rule "never:profanity", so when Alice visits page 16, version 16a will never be served to her.

Possible filter rules include:

never: Pages matching this filter will never be chosen.
always: Only pages matching this filter will ever be displayed. This would be useful if you want to restrict pages to a particular language, such as "always:lang:en" to display only English pages.
prefer: This will significantly increase the likelihood that a page with the tag is served, but will not prevent non-matching pages from being displayed too.
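
A minimal sketch of how the three rules could combine when picking a version, assuming rules are stored as "kind:tag" strings and each version carries a set of tags. The function names and the `boost` weight are hypothetical and not part of Dendrite.

```python
import random

def parse_rule(rule):
    """Split 'never:profanity' or 'always:lang:en' into (kind, tag)."""
    kind, tag = rule.split(":", 1)  # split on the first colon only
    return kind, tag

def eligible(page_tags, rules):
    """A page is eligible if it has no 'never' tag and every 'always' tag."""
    for kind, tag in rules:
        if kind == "never" and tag in page_tags:
            return False
        if kind == "always" and tag not in page_tags:
            return False
    return True

def weight(page_tags, rules, boost=5.0):
    """Each matching 'prefer' rule multiplies the selection weight (assumed factor)."""
    w = 1.0
    for kind, tag in rules:
        if kind == "prefer" and tag in page_tags:
            w *= boost
    return w

def choose_version(versions, rules, rng=random):
    """versions maps a version id (e.g. '16a') to its set of tags.
    Returns a version id, or None if no version passes the filters."""
    candidates = [v for v, tags in versions.items() if eligible(tags, rules)]
    if not candidates:
        return None
    weights = [weight(versions[v], rules) for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With Alice's filter, `choose_version({"16a": {"profanity"}, "16b": set()}, [parse_rule("never:profanity")])` can only return "16b".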

Ideas are welcome.

hamle365 commented 10 years ago

This is a good idea. I agree with this plan. The tags should be colourful like gmail tags.


MattHeard commented 10 years ago

Colourful tags would be nice but they will need to play well with all themes, or be themed themselves.

MattHeard commented 10 years ago

One of the central design principles of Dendrite is hands-off moderation of content.

There are two reasons for this:

Restrictive moderation of content will discourage authors from writing the stories that they actually want to tell.
Active moderation of content by Dendrite administrators will be time-consuming if not impossible.

Because of this, moderation of content should be as passive, automatic, and community-driven as possible.

Despite this, there is an urgent need for high-quality moderation because one of the best audiences for Dendrite is that of kids who are old enough to be (incredibly) creative writers but too young to be exposed to NSFW/mature content. (This assumes that young authors will only be allowed to write and read on Dendrite with the permission of extremely protective parents.)

It is easy to imagine the following scenario:

Alice is an 8-year-old girl who wants to write a story about spaceships and aliens. She clicks on the "Dendrite" bookmark in the browser on her parents' computer and arrives at the front page and table of contents. The first story on the list is titled "YOUR MOM AND THE GIANT PENIS". Alice's father Bob walks into the room and looks at the computer screen. Bob is alarmed by the title of the story and forbids Alice from visiting Dendrite again.

How do we solve this problem without actively moderating every page?

One solution is to implement a blacklist of forbidden words and censor either the words or the entire pages. This is clearly too heavy-handed, and the blacklist would need to grow constantly to match the inventive slang that kids use to circumvent word-level censors.

A better solution is to use tags to identify whether a page contains profanity or mature content. If a parent does not want their kids to read mature content, they could apply a filter to the kids' accounts which prevents any page tagged "mature" from ever being shown to them.

For example:

Alice is reading a page in a story about dinosaurs and cavemen and she clicks on a link to read about what happens when two dinosaurs see each other. The link leads to a page with two alternative versions. The original version of the page (100a, for example) describes, in vivid detail, the two dinosaurs having sex. The rewritten version of the page (100b) describes the two dinosaurs fighting and chasing each other. Because 100a is tagged as "mature", the version selection algorithm has only the non-mature version to select from: it selects 100b. Alice reads about the dinosaurs fighting and continues, unaware that the dinosaurs had sex in an alternate storyline.

If we think of the entire set of story pages in Dendrite, the filter preventing the display of mature pages effectively creates a subset of Dendrite. Any user with the filter on their account could read and write stories without being aware of the other, invisible pages.

There are a few issues with this tag system:

How do you determine whether a page is 'mature' or 'profane'?
What happens if an author with the filter writes something 'mature' or 'profane'?

What makes a page 'mature' or 'profane'?

I'll try to avoid getting philosophical about this, but there is clearly a difference of opinions among people (and parents) about what makes a particular page "mature", "profane", "NSFW", etc.

There are two risks involved with tagging a page with one of these tags:

If a page is not 'mature', but is tagged as 'mature', the author of that page is unduly censored and readers miss out on a potentially high-quality story.
If a page is 'mature', but is not tagged as 'mature', a young reader might see the page and be prevented from using Dendrite in the future.

Is it possible to tag all new pages as "possibly mature" and then have users confirm whether the page is actually mature? If readers disagree, maybe the page could be tagged as "mature" if 50% or more of the readers who review the tag agree that it is a "mature" page?
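
A rough sketch of that review rule, assuming each reviewer casts a simple yes/no vote. The function name and the "not mature" label are made up for illustration.

```python
def resolve_tag(votes, threshold=0.5):
    """votes is a list of booleans; True means the reviewer says the page is mature.
    Unreviewed pages keep the default 'possibly mature' tag."""
    if not votes:
        return "possibly mature"
    share = sum(votes) / len(votes)  # fraction of reviewers voting 'mature'
    return "mature" if share >= threshold else "not mature"
```

An even split counts as "mature" here because the rule says "50% or more", which errs on the side of protecting young readers.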

We also know that there are some strong indicators in the text which could automatically identify a page as being "mature" or "profane". If any words matched a list of "mature" words, like "penis", then maybe the page is automatically tagged as "mature", but if the page does not contain any words matching the list, it is not automatically tagged. This would likely be a lot more effective for a "profane" tag than for a "mature" tag, as it might be a lot easier to develop a list of words which identify a page as being profane.
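
As a sketch of the word-list idea, assuming two small hand-maintained sets of words (the lists below are placeholders taken from the examples in this thread, not a proposed real list):

```python
import re

# Placeholder word lists; a real deployment would need far larger ones.
PROFANE_WORDS = {"fuck"}
MATURE_WORDS = {"penis"}

def auto_tags(text):
    """Return the tags suggested by simple whole-word matching."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    tags = set()
    if words & PROFANE_WORDS:
        tags.add("profane")
    if words & MATURE_WORDS:
        tags.add("mature")
    return tags
```

A page with no matching words gets no automatic tag, so it would still need the community review step to catch anything the lists miss.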

Ultimately, I imagine a powerful automatic tag classifier analysing the content of the page to determine what tags are likely to apply. For example, the 10,000 most used words on Dendrite could be used as input for a learning algorithm which classifies a page with tags based on the words used in the text. If "throbbing" was used more often than "wet" in mature pages, it would be a stronger indicator, implying that a page containing the word "throbbing" is slightly more likely to be a "mature" page than a page containing the word "wet".
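
One possible way to get per-word indicator strengths is a smoothed log-odds score, in the style of naive Bayes. This is only an assumption about what the "learning algorithm" might look like; the function names and the toy training pages are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def word_scores(mature_pages, other_pages, vocab_size=10_000):
    """Log-odds of each frequent word in mature vs. other pages (add-one smoothing).
    A positive score means the word is relatively more common in mature pages."""
    mature = Counter(w for page in mature_pages for w in tokenize(page))
    other = Counter(w for page in other_pages for w in tokenize(page))
    vocab = [w for w, _ in (mature + other).most_common(vocab_size)]
    m_total = sum(mature.values()) + len(vocab)
    o_total = sum(other.values()) + len(vocab)
    return {
        w: math.log((mature[w] + 1) / m_total) - math.log((other[w] + 1) / o_total)
        for w in vocab
    }

def mature_score(text, scores):
    """Sum the scores of the distinct known words on a page; higher = more likely mature."""
    return sum(scores.get(w, 0.0) for w in set(tokenize(text)))

# Toy data: "throbbing" appears mostly in mature pages, "wet" in both kinds.
scores = word_scores(
    ["throbbing throbbing wet", "throbbing night"],
    ["wet rain on the wet road", "the dinosaur ran"],
)
```

In this toy data `scores["throbbing"]` comes out higher than `scores["wet"]`, which matches the intuition above. A threshold on `mature_score` would still need tuning against pages that people have reviewed.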

MattHeard / Dendrite

Genre classification tags should be available for stories #44