Zirak / SO-ChatBot

Handling unformatted code #238

Closed rlemon closed 9 years ago

rlemon commented 9 years ago

Opening this up for discussion because I don't think it would be unreasonable for Cap to detect unformatted code in the chats and:

  a) warn the user
  b) migrate it to the bin and warn the user
  c) migrate to the bin and mock the user
  d) migrate to the bin and post a formatted version on behalf of the user

There should also be a throttle on it, because even regulars will often post, quickly edit, Ctrl+K, and send again. Maybe 10 seconds?

These are all just random ideas. What does everyone think?

honnza commented 9 years ago

If you find a reliable way to detect unformatted code, I'm in favor of trashing it and warning the user. We can do the mocking ourselves.

AmaanC commented 9 years ago

I agree with @honnza. I don't have many ideas for how you'd reliably detect unformatted code, though

towc commented 9 years ago

Maybe a command that room owners could use to format other people's code automatically? Now that we have the clipboard accessible, we could add an extension that allows everyone to copy the ID of any message by clicking a button on that message. So the ROs can call the command and paste the ID without too much hassle, and the bot'll do the rest. No need for automatic recognition, which could fail, this way.
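A minimal sketch of the ID-handling half of such a command, assuming the room owner pastes either a bare message ID or a transcript link of the form seen later in this thread. The function name `extractMessageId` is hypothetical and not part of the bot.

```javascript
// Hypothetical helper: accepts a bare numeric ID or a chat transcript link
// and returns the message ID as a number, or null when nothing usable is found.
function extractMessageId(input) {
    var trimmed = String(input).trim();

    // Bare ID, e.g. "24729564"
    if (/^\d+$/.test(trimmed)) {
        return Number(trimmed);
    }

    // Transcript link, e.g. ".../transcript/message/24729564#24729564"
    var match = /\/transcript\/message\/(\d+)/.exec(trimmed);
    return match ? Number(match[1]) : null;
}
```

Returning `null` instead of throwing lets the command reply with a usage hint when the argument is not recognisable.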

CS1000 commented 9 years ago

{ howto: Letter frequency, especially special chars vs. alnum

action: warn (actually notify and teach about Ctrl+K) later if message is still unformatted bin it }
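The letter-frequency idea could be sketched roughly as below. The set of "special" characters and the 0.15 threshold are illustrative guesses, not values anyone in the thread proposed or tuned against the transcript.

```javascript
// Share of code-like punctuation among all non-whitespace characters.
// The character set is an assumption chosen for illustration.
function specialCharRatio(text) {
    var special = (text.match(/[{}()\[\];=<>+\-*\/&|!]/g) || []).length;
    var visible = (text.match(/\S/g) || []).length;
    return visible === 0 ? 0 : special / visible;
}

// The threshold is a hypothetical default; it would need tuning on real chat data.
function looksLikeCodeByFrequency(text, threshold) {
    if (threshold === undefined) {
        threshold = 0.15;
    }
    return specialCharRatio(text) >= threshold;
}
```

Ordinary prose uses mostly commas, periods and question marks, which are deliberately left out of the set, so its ratio stays near zero.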

Unihedro commented 9 years ago

Come up with specific rules for what defines "unformatted code" and it's possible to convert the rules into a regex. After a short chat conversation, it's hinted that "Both { and } occurring in multiline messages means the message needs work." Please suggest more criteria.
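As a sketch, the hinted rule can be written as a single regex built from lookaheads. The name `bracesInMultiline` is made up for illustration. Note that the rule only sees plain text, so on its own it cannot tell whether the message was already code-formatted.

```javascript
// Matches when the text contains a newline, a "{" and a "}" somewhere, in any order.
// Each lookahead scans the whole message from the start.
var bracesInMultiline = /^(?=[\s\S]*\n)(?=[\s\S]*\{)(?=[\s\S]*\})/;

function needsWork(text) {
    return bracesInMultiline.test(text);
}
```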

ralt commented 9 years ago

I know that PHP_CodeSniffer detects code in comments, so it's definitely possible.

rlemon commented 9 years ago

Right now I don't think we need a formal solution, this is meant to be a discussion whether or not this feature would be useful. We can as a group discuss the criteria for message migration, _after_ we decide if it is useful or wanted.

honnza commented 9 years ago

Can CodeSniffer be reverse engineered?

ralt commented 9 years ago

I'd just eval and check for syntax error...
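One way to check for a syntax error without actually running the text is to compile it with the `Function` constructor, which throws a `SyntaxError` on input that does not parse. This is a sketch of that idea, not what the bot does, and `parsesAsJavaScript` is a hypothetical name.

```javascript
// Compiles the text as a function body but never calls the result,
// so nothing from the chat message is executed.
function parsesAsJavaScript(text) {
    try {
        new Function(text);
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) {
            return false;
        }
        throw e; // anything else (e.g. a policy forbidding dynamic code) is not ours to swallow
    }
}
```

Parsing alone is a weak signal: a one-word message such as `hello` is a valid JavaScript statement, while broken JavaScript and code in other languages both fail to parse.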

honnza commented 9 years ago

"definitely" for "useful"

awalgarg commented 9 years ago

@Ralt how would you eval css? :P

towc commented 9 years ago

@awalGarg this is the JS room, nobody who doesn't know what he's doing should post css

Unihedro commented 9 years ago

I support this idea because it's not like any harm could come from helpful guidance on formatting code. Feel free to request ownership of my code dump room if you'd like to help moderate it.

honnza commented 9 years ago

Sure we want to leave JavaScript code with syntax errors unformatted? Also, code in other languages? -1 on eval

awalgarg commented 9 years ago

mhmm, what about code by newbies having syntax errors which they need help in fixing? @towc

towc commented 9 years ago

@rlemon I am totally for it. Just one addition: have the bot remind the user how to format code

honnza commented 9 years ago

better yet, redirect them to jsfiddle

towc commented 9 years ago

@awalGarg that's why I suggest having ROs do the validations. Not to automate the process, just to make it a lot easier for them

honnza commented 9 years ago

In fact, all we need is a bot command to trash chat posts

awalgarg commented 9 years ago

This is really a very broad topic. Can we break it down into smaller issues?

honnza commented 9 years ago

My suggestion is to start with a bin command. Auto-trigger can come next.

rlemon commented 9 years ago

@awalGarg this entire discussion is about whether it is a good idea. not the implementation of said idea. Everyone just took it a step further.

honnza commented 9 years ago

It is a good idea. I doubt anyone disagrees with that.

awalgarg commented 9 years ago

Yeah, as honnza said, we all agree it is a good idea since we have a known problem which needs a solution. Being programmers, we naturally started looking for an implementation ;)

rlemon commented 9 years ago

I'm mostly waiting for the sleepy heads to wake up and chime in. Zirak, otherBotRunners, etc.

benjamingr commented 9 years ago

Detection sounds simple for 90% of cases.

Detect $(", function() <div or .controller(" and trash those to a "please post formatted code" room. I can write a simple ML that'd detect code more reliably or we can use an existing library but honestly it's super overkill.
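A rough sketch of that marker-based detection, using only the four markers named above. The regexes are one possible reading of those markers and the list is not meant to be exhaustive.

```javascript
// Illustrative markers; each regex is an assumption about what the marker should match.
var CODE_MARKERS = [
    /\$\(["']/,           // jQuery call such as $("
    /function\s*\(/,      // function(
    /<div\b/i,            // opening <div
    /\.controller\(["']/  // Angular-style .controller("
];

function containsCodeMarker(text) {
    return CODE_MARKERS.some(function (marker) {
        return marker.test(text);
    });
}
```

None of the regexes use the `g` flag, so `test` keeps no state between messages.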

Zirak commented 9 years ago

Sounds good. The thought passed my head a few times, but the genie always told me detecting the 100% was too difficult.

90% is good enough. I'll dig through the transcript and try to come up with something.

Zirak commented 9 years ago

After some fooling around, here's what I came up with:

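function isUnformattedCode (text) {
    // Short messages are left alone; only messages of 4+ lines are considered.
    var lines = text.split('\n');
    if (lines.length < 4) {
        return false;
    }

    // A "codey" line ends with a closing brace or starts with a closing tag.
    var codeyLine = /\}$|^<\//;
    return lines.some(function (line) {
        return codeyLine.test(line);
    });
}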

Searched the transcript for !!format, ran that against all messages in the time block, saw that it agreed with them and caught some more. Most importantly, miraculously it's yet to provide me a false positive, tested against today's and yesterday's chat history.

Methinks the algo should look something like this:

  • Ignore if user is an owner/mod
  • Bin and teach <2k users
  • Teach >=2k users

By "teach" I mean a message like "Please don't post unformatted code - use Ctrl+K before sending (hit up to edit messages). See the FAQ [faq link]".

If the user sent a long message (>10 lines), it'll also have a "or use a paste service like [links]".
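The rules proposed here (ignore owners and moderators, bin and teach users under 2k reputation, only teach users at 2k or above, and add the paste-service hint for messages over 10 lines) could be sketched as a small decision function. The user object shape and all names below are hypothetical, and the function assumes the message was already judged to be unformatted code.

```javascript
var REP_THRESHOLD = 2000;     // cutoff proposed in the thread, still under discussion
var LONG_MESSAGE_LINES = 10;  // above this, also suggest a paste service

// Hypothetical user shape: { reputation: number, isOwner: boolean, isModerator: boolean }
function decideAction(user, text) {
    if (user.isOwner || user.isModerator) {
        return { bin: false, teach: false, suggestPaste: false };
    }

    var lineCount = text.split('\n').length;
    return {
        bin: user.reputation < REP_THRESHOLD,
        teach: true,
        suggestPaste: lineCount > LONG_MESSAGE_LINES
    };
}
```

Keeping the decision separate from detection means the reputation cutoff can change without touching the regex.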

Thoughts?

AmaanC commented 9 years ago

Neat. The rules sound good to me. When will the maid be implementing this?

On May 27, 2015 12:37 AM, "Zirak" wrote:

Methinks the algo should look something like this:

  • Ignore if user is an owner/mod
  • Bin and teach <2k users
  • Teach >=2k users

FirstWhack commented 9 years ago

I know I'm late but I'd just like to chime in with my opinion:

Don't do the bin/teach cutoff at 2K, that's ridiculous. It needs to be much much lower, I have ~1.3K rep and I'm a very knowledgeable person.

I had typed out why we shouldn't do this at all (seriously guys, binning unformatted code automatically why do we even need room owners these days just make Caprica automatically kick people too) but I'm going to let it go and suggest a sensible "smart user" rep level.

Zirak commented 9 years ago

@AmaanC I'll be home this weekend, will try and take a stab at it.

@Jhawins

"Don't do the bin/teach cutoff at 2K"

Not set in stone, we can take it back to 1k (which is also /welcome's lower threshold), but most regulars do have more than 2k.

"why do we even need room owners [if we have features like these]"

"That's a room owner's job" isn't a reason to not implement this. Lacking this task, you won't find our room owners bored; we're room owners, not people who hunt down unformatted messages and lecture users on the basic etiquette of chat.

Binning unformatted messages and correcting people is one of the menial things you have to do to maintain a normal conversation. Why not automate it? It's a mechanical process, there's nearly no thought behind it, it's repetitive, and it's annoying. I don't do it as much as I used to because of these reasons.

"just make Caprica automatically kick people too"

I'd love to. Boy oh boy would I love to. Imagine not having to deal with help vampires. Imagine not having to deal with spammers or bigots. Wouldn't it be great? Wouldn't it be awesome if some automatic process took care of the mindless things, and left the more serious stuff to us?

awalgarg commented 9 years ago

image

SO Magic!

FirstWhack commented 9 years ago

I appreciate you replying to everything but you know I have no defense lol.

Rep level though needs adjusting still. We are a "rep != knowledge" community so 2K rep is not a decent "teachable user" level.

Zirak commented 9 years ago

@awalGarg Sadly (AFAICT) that's serverside SO magic.

@Jhawins

"Rep level though needs adjusting still."

Sure, what do you think will be better? 1k as in /welcome?

rlemon commented 9 years ago

I say remove the rep limit fully and implement a throttle. I have X seconds to edit the message and format it before the bot bitches at me.

Zirak commented 9 years ago

That'll be in there anyway.

gtomitsuka commented 9 years ago

Maybe too heavy or unsupported, but might be relevant: https://github.com/tj/node-language-classifier It uses the deprecated classifier internally. I didn't check how it handles unsupported languages.

Shmiddty commented 9 years ago

+1 for Zirak

Zirak commented 9 years ago

@gtomitsuka That seems to assume the input is a programming language, when we want to determine whether it is one. Obviously the dumb regexp above won't match a slew of languages, but it seems to get that 90%.

Zirak commented 9 years ago

Timeout is 10 seconds, messages will be binned to Trash Can, rep threshold is 2k (due to lack of better suggestion)
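A minimal sketch of how the 10-second timeout might work: wait, re-read the message, and act only if it is still unformatted. The `hooks` object and every name on it are hypothetical stand-ins for whatever the bot actually exposes.

```javascript
var GRACE_PERIOD_MS = 10 * 1000;

// hooks.schedule(fn, ms)            optional timer, defaults to setTimeout
// hooks.fetchText(id)               returns the current text, or null if the message is gone
// hooks.isUnformatted(text)         detection function
// hooks.onStillUnformatted(id, text) called when the author did not fix the message in time
function watchMessage(messageId, hooks) {
    var schedule = hooks.schedule || setTimeout;

    schedule(function () {
        // Re-read the message: the author may have edited or deleted it meanwhile.
        var text = hooks.fetchText(messageId);
        if (text !== null && hooks.isUnformatted(text)) {
            hooks.onStillUnformatted(messageId, text);
        }
    }, GRACE_PERIOD_MS);
}
```

Passing the timer in as a hook keeps the logic testable without waiting 10 real seconds.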

rlemon commented 9 years ago

@Zirak, the limit should be three lines and not four. Sorry I don't feel like this needs a new issue (considering how fresh the feature is). re-open if you agree, otherwise I'll start a new issue.

SomeKittens commented 9 years ago

(otherwise, good work, you made us proud, etc)

Zirak commented 9 years ago

Let's give this a couple more days and revisit if it gives too many false negatives?

rlemon commented 9 years ago

@Zirak this is no longer working. Should we reopen this or start a new issue?

http://chat.stackoverflow.com/transcript/message/24729564#24729564

Bot sees every line as a new message.

http://i.stack.imgur.com/l9BIE.png

not sure if that is expected or not.