OperationCode / operationcode-pybot

Operation Code's Official Slackbot

Auto-Reply Functionality for Unformatted Code #167

Open togakangaroo opened 4 years ago

togakangaroo commented 4 years ago

Problem:

People often post without knowing the three main ways to format code on Slack. A bot that helps them do this properly would save everyone some time and annoyance. Plus, it should be fun to code.

The bot will activate when some of the lines in a post contain something that looks like code but is not formatted as such.
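As a rough illustration of the "not formatted as such" part, here is a minimal sketch that drops already-formatted code from a message and returns the remaining lines for inspection. It assumes the raw message text marks code blocks with triple backticks and inline code with single backticks; the function name `unformatted_lines` is hypothetical, and snippets uploaded as files are not considered.

```python
import re

# Assumption: formatted code shows up in the raw message text wrapped in
# triple backticks (blocks) or single backticks (inline).
FENCED_BLOCK = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]+`")


def unformatted_lines(message_text: str) -> list:
    """Return the non-empty lines of a message that sit outside code formatting."""
    # Remove fenced blocks first so their backticks are not mistaken for inline code.
    without_blocks = FENCED_BLOCK.sub("", message_text)
    without_inline = INLINE_CODE.sub("", without_blocks)
    return [line.strip() for line in without_inline.splitlines() if line.strip()]
```

Only the lines this returns would then need to be checked for code-like content, so a post that is already formatted correctly never triggers the bot.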

So how to determine if something looks like code? Two possible approaches.

1) Heuristically. There are libraries that auto-detect the language of a piece of code. Presumably some of the same techniques can be used to detect whether something is code at all. One example of a library that does this is highlight.js.

2) ML. This is actually a pretty decent use case for something like a TensorFlow classifier. And we could train it on the actual Operation Code logs!

togakangaroo commented 4 years ago

I bet if you do it as a standalone bot, other Slack workspaces would love something like that. Good OSS project overall.

LivingInSyn commented 4 years ago

Pygments might be a good fit since we're already mostly Python:

https://pygments.org/docs/api/#pygments.lexers.guess_lexer
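
For reference, a minimal sketch of how `guess_lexer` could be wrapped for this purpose. The wrapper name `guess_language` and its `guess` parameter are hypothetical (the parameter exists only so the function can be exercised without Pygments installed). It treats Pygments' plain-text lexer as "not code", which is an assumption about how the bot would want to interpret the result.

```python
def guess_language(text, guess=None):
    """Return the name of the lexer Pygments guesses, or None for plain text.

    `guess` is a hypothetical injection point for testing; by default the
    real pygments.lexers.guess_lexer is used.
    """
    if guess is None:
        # Imported lazily so this module still loads where Pygments is absent.
        from pygments.lexers import guess_lexer as guess
    try:
        lexer = guess(text)
    except ValueError:
        # Pygments raises pygments.util.ClassNotFound, a ValueError subclass,
        # when no lexer matches at all.
        return None
    # The plain-text lexer (TextLexer) carries the alias "text".
    if "text" in lexer.aliases:
        return None
    return lexer.name
```

A caller would run `guess_language(line)` on each unformatted line and treat any non-`None` result as a hint that the line is code. How reliable that hint is on short snippets would need to be checked against real messages.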

jasonappah commented 4 years ago

I'd like to give this a shot!

aaron-junot commented 4 years ago

Go for it @jasonappah! Happy to see what you come up with.

JudsonStevens commented 2 years ago

I think this could be an interesting project to add to the rewrite. Looking at this example for Discourse (code here), I think we could translate it to Python to fit into the bot. Alternatively, we could just use the JavaScript project as-is and have it run next to the bot.

This is really a great opportunity for a Machine Learning project, but unfortunately I don't think that's the route we want to go in our situation. It would entail a good bit more infrastructure setup to have the model queryable in production. We could train the model and just run it on the machine we use for the bot, but even that would require more processing power than the machine we are currently using has. I think the regex-based approach in the Discourse repository is the place to start.
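
To make the regex idea concrete, here is a minimal sketch of a line-scoring heuristic. It is not a port of the Discourse code: the patterns, the names `code_score` and `looks_like_code`, and the thresholds are all illustrative assumptions that would need tuning against real messages.

```python
import re

# Illustrative signals only; these are assumptions, not the Discourse patterns.
CODE_SIGNALS = [
    re.compile(r"\w+\([^()]*\)"),            # call-like text: foo(bar)
    re.compile(r"^\s*\w+(\.\w+)*\s*=\s*\S"),  # assignment at line start: x = 1
    re.compile(r"[;{}]\s*$"),                # line ends with ; { or }
    re.compile(r"^\s*(def|class|import|from|function|const|let|var|return)\b"),
    re.compile(r"==|!=|=>|->|&&|\|\|"),      # common operators
]


def code_score(line: str) -> int:
    """Count how many code-like signals appear in a single line."""
    return sum(1 for pattern in CODE_SIGNALS if pattern.search(line))


def looks_like_code(text: str, min_score: int = 2, min_lines: int = 1) -> bool:
    """Return True if enough lines each show at least `min_score` signals."""
    hits = [line for line in text.splitlines() if code_score(line) >= min_score]
    return len(hits) >= min_lines
```

Requiring two signals per line is a deliberate choice in this sketch: a sentence such as "from what I can tell it works" matches the keyword pattern once, and a single match should not be enough to nag someone about formatting.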

JudsonStevens commented 2 years ago

Also, I tried Pygments, and it failed on some of the simpler examples I gave it, detecting them as plain text. For example, this piece of text:

I am using these codes before it .Embed_length = 25 model = Sequential() model.add(Embedding(vocab_size, Embed_length, input_length=1000)) model.add(SpatialDropout1D(0.2)) model.add(LSTM(10, dropout=0.5, recurrent_dropout=0.5))

was parsed as <pygments.lexers.TextLexer>. I only tried it a couple of times, but it definitely seemed to default to text quite often.