let's talk taco twitter bots

dansinker commented 10 years ago

So @meetar built the taco_fancy twitter bot to tweet whenever a new file has been added to the repo, so people can see when something delicious has been added. YAY. But, the user experience isn't super great, since it tweets the commit message, not the actual title of the taco object added.

I'm thinking that @knowtheory's work in automation might make this twitter bot's messages more human-readable, by being able to simply tweet the title of the recipe, which he's deriving to add to the index. That and then a link to the recipe (I believe currently it links to the PR), would be amazing.

THEN, what if we were to use @evz's API to do one better: What if someone could tweet @ the tacobot and get a random taco recipe in return?

TACO ALL THE TWITTERS.

Also, I've managed to get my hands on the tacobot twitter account, which will replace taco_fancy once the right levers have been pulled over at the Big Bird HQ.

UPDATE: tacobot is now the valid twitter account for this effort.

meetar commented 10 years ago

I like it. That will also prevent all those charming but extraneous tweets (a file was moved or renamed, non-recipe file added, branches merged, etc).

I have yet to grok the automation and API but I should be able to get into those this week.

Also if anyone would like to peruse the bot code, an anonymized version is hosted here: github.com/meetar/taco-tweeter/

I welcome improvements.

evz commented 10 years ago

@sinker Love it. @meetar I'll take a look at your code in the coming days (I'm off work tomorrow so there's a good chance I'll have some time) and see what I can see. Are you opposed to turning that sucker into a Flask app? Should not be too difficult.

knowtheory commented 10 years ago

The automation is designed to be as stupid as possible (trying to forestall the tacobot ascendency over humankind).

1. I want scripts in the actual repo which read the current state of the repo, and build various index files.
2. I want a robot which listens for commits via webhook, compares the state of the repo before and after the push that's been received and does (one or more of) the following:
- rebuild and commit the index if there are any changes to the indexes
- tweet about changes to the repo

No. 2 there will require the bot to know some stuff about git. I was going to go basic and just do pull/push/status essentially. (Basically, when notified, the bot pulls from the repo, then runs the index scripts. If recipes have been added/removed/tagged w/in the repo, then the index script will update the indexes with new content, which git status will indicate. If there is new content to commit, the bot should commit it, and push the results).

If we add tweeting stuff to that, it'll happen at approximately the same time that the indexes are rebuilt. It could either happen via looking at the intermediate results of the indexing process, or it could look at the commit log, or some combination of the two.
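The pull / rebuild / status / commit / push loop described above can be sketched roughly as follows. This is only an illustration under assumptions: `sync_indexes`, `build_indexes` and the commit message are hypothetical names, and `build_indexes` stands in for whatever the real index scripts are.

```python
import subprocess


def git(args, cwd):
    """Run a git command in `cwd` and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def sync_indexes(repo_dir, build_indexes, run_git=git):
    """Pull, rebuild the indexes, then commit and push only if files changed.

    `build_indexes` is a hypothetical callable standing in for the index
    scripts. `run_git` is injectable so the flow can be tried without git.
    Returns True if a commit was pushed, False if there was nothing to do.
    """
    run_git(["pull"], repo_dir)
    build_indexes(repo_dir)
    # `git status --porcelain` prints one line per changed file, or nothing.
    if not run_git(["status", "--porcelain"], repo_dir).strip():
        return False
    run_git(["add", "--all"], repo_dir)
    run_git(["commit", "-m", "Rebuild indexes"], repo_dir)
    run_git(["push"], repo_dir)
    return True
```

Tweeting could hang off the `True` branch, since that is the point where the bot knows the indexes actually changed.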

dansinker commented 10 years ago

CONFIRMING: taco_fancy is now tacobot

knowtheory commented 10 years ago

ENGAGE!

ENGAGE

dansinker commented 10 years ago

WHOA.

meetar commented 10 years ago

@evz Not opposed to Flasking!

@knowtheory Yeah, I'd guess downloading and diffing the repo is probably the best bet. The GitHub JSON package that comes in the WebHook trigger includes information about added and removed files, but it's a bit too vague for the kind of logic we're discussing – specifically, it doesn't include the path to the commit, just the name, and it can't identify renames.
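For reference, a minimal sketch of what can be pulled out of a push payload on its own. It assumes the payload carries a `commits` list whose entries have `added` and `removed` lists; `changed_files` is a hypothetical helper, and what those lists actually contain should be checked against a real payload.

```python
def changed_files(payload):
    """Net added and removed entries across every commit in one push payload."""
    added, removed = set(), set()
    for commit in payload.get("commits", []):
        for path in commit.get("added", []):
            if path in removed:
                removed.discard(path)  # removed earlier in this push, then restored
            else:
                added.add(path)
        for path in commit.get("removed", []):
            if path in added:
                added.discard(path)  # added earlier in this push, then deleted
            else:
                removed.add(path)
    return sorted(added), sorted(removed)
```

A rename shows up here as one removed entry plus one unrelated added entry, which is the limitation mentioned above and the reason diffing the checked-out repo looks like the safer route.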

So what do we all think about a master webhook that does the pulling and diffing, triggers the Cakefile on every commit, checks for new files in any subdirectory, and if it finds any, pulls the title from the API and sends it to the tweetbot? (N.b. I haven't used Flask or Cake but they're just common nouns, how hard can they be)

startacos

evz commented 10 years ago

@sinker @meetar I just deployed the thingy that replies with a taco recipe when you mention @tacobot on twitter. It's on Heroku so I can add you guys as collaborators if you'd like.

evz commented 10 years ago

...or whoever. I'm open.

dansinker commented 10 years ago

WHOOOOOOO.

Is the code up on git?

dansinker commented 10 years ago

Eventually we need to set all this stuff up under a tacofancy github organization huh.

evz commented 10 years ago

yup: https://github.com/evz/random-taco-tweeter

evz commented 10 years ago

Yeah, this is getting that kind of complex.

dansinker commented 10 years ago

I just submitted a pull request to the bot. I actually have no idea if what I added actually works, because I'm an asshole.

On Tue, Nov 12, 2013 at 5:07 PM, Eric van Zanten notifications@github.com wrote:

Yeah, this is getting that kind of complex.

Reply to this email directly or view it on GitHub: https://github.com/sinker/tacofancy/issues/94#issuecomment-28347764

evz commented 10 years ago

I think I'm going to need to take this thing down for a few. It's apparently not really stable the way that it's running on heroku. I'll need to dig into the docs over there and see what I need to do to get it to run.

dansinker commented 10 years ago

I'm going to guess that @harrisj may have some advice regarding the maintenance and troubleshooting of twitterbots.

evz commented 10 years ago

OK, the key here is to not declare a web process type in the Procfile. Now it works!

evz commented 10 years ago

Dammit! Computers are hard. The app keeps falling over. Going to need some more debugging.

evz commented 10 years ago

Turned out to be a simple fix. Had to ensure that the more verbose ingredient names didn't blow the 140 character limit. @sinker @meetar You guys wanna give it a whirl and see if it's working for you?
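A minimal sketch of one way to keep a reply under the limit, not necessarily what the bot does. `fit_tweet` is a hypothetical helper, and `len()` is only an approximation of how Twitter counts characters.

```python
TWEET_LIMIT = 140


def fit_tweet(mention, body, limit=TWEET_LIMIT):
    """Build "<mention> <body>", truncating with an ellipsis if it runs long."""
    text = f"{mention} {body}"
    if len(text) <= limit:
        return text
    # Leave room for the one-character ellipsis.
    return text[: limit - 1].rstrip() + "…"
```

Truncating the whole reply is the bluntest option; shortening only the longest ingredient name would keep more of the recipe readable.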

harrisj commented 10 years ago

Hey, nice work here! I was just wondering if you were thinking of adding a Regexp test so that Tacobot will only send a recipe if you mention it at the beginning of the tweet (with an optional ^. for people who want to make their tacobot tweeting public). Right now, it tweets at anybody who mentions it, which might get you closer to being placed in twitter jail...
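The regexp test suggested here could look something like the sketch below. The pattern and the `is_addressed_to_bot` name are assumptions for illustration, not code from the bot.

```python
import re

# Optional leading whitespace, an optional ".", then @tacobot as a whole handle.
STARTS_WITH_MENTION = re.compile(r"^\s*\.?@tacobot\b", re.IGNORECASE)


def is_addressed_to_bot(text):
    """True only when the tweet opens with @tacobot or .@tacobot."""
    return STARTS_WITH_MENTION.match(text) is not None
```

With this in place, a tweet like "hey, check out @tacobot" would be ignored, while "@tacobot taco me" and ".@tacobot taco me" would still get a reply.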

evz commented 10 years ago

Yeah, good point. I'll take a look at that tonight.

dansinker commented 10 years ago

Functionally, and just knowing how things spread on Twitter, this is especially important to have in place before this gets promoted widely because every time someone just says "hey, check out tacobot" they're getting spammed. Will dampen enthusiasm for something awesome.

harrisj commented 10 years ago

Random feature request: I kinda want randomtaco to look for ingredient matches in my question so I can eventually say "@tacobot what do I do with leftover chicken?" and it'll generate my taco around that... I know, it's a bit of a pipe dream, but I wonder how we could do it

dansinker commented 10 years ago

That's got @knowtheory's dream of entity extraction written all over it.

dansinker commented 10 years ago

Also, once Ted's tag field auto generates the vegetarian mark, would be super cool to return veggie tacos on request.

knowtheory commented 10 years ago

@harrisj yeah, i've got a script that reads through the repo right now and creates an inverted index of recipes by ingredient line.

The bit that's missing is the code that parses ingredient lines and coalesces them, so that "1 lbs. chopped chicken breast" and "2 chicken breasts" both count as "chicken" (or "chicken breast"). Right now it just essentially lists all the recipes by the full ingredient line (which is less useful, unless you enjoy your browser's find function).
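To make the current state concrete, here is a rough sketch of an inverted index keyed by the full ingredient line. The `build_inverted_index` name and the input shape (recipe title mapped to a list of ingredient lines) are assumptions, not the actual script.

```python
from collections import defaultdict


def build_inverted_index(recipes):
    """Map each ingredient line to the sorted titles of recipes that use it."""
    index = defaultdict(set)
    for title, ingredient_lines in recipes.items():
        for line in ingredient_lines:
            index[line.strip().lower()].add(title)
    return {line: sorted(titles) for line, titles in index.items()}
```

Because the key is the whole line, "1 lbs. chopped chicken breast" and "2 chicken breasts" land in separate entries, which is exactly the gap described above.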

@sinker the scripts in #68 do generate the vegetarian mark! :) Right now all it does with the info is stick it into the table of contents. I'll need to check in on how tacobot works. Btw, we could also break up tacobot's capabilities into multiple smaller bots too. One to deal with the git notifications, one to respond to user requests (so long as all the data is getting munched up and handed out to the bots equally).

walterdavis commented 10 years ago

Maybe a dictionary of ingredient names (and misspellings) could form a sort of taco "stemmer" here?

Walter

knowtheory commented 10 years ago

@walterdavis yep, that'll be either a component or a derivation of such a system. Ingredient lines are almost always semantically well formed and include the name of an ingredient, a quantity and a unit of measure. But the order in which those appear and the universe of possible measures and ingredients are the fun part. There are several recipes even in Tacofancy which include optional/additional instructions (e.g. "chicken breasts, chopped finely") or which use non-standard quantities/measures (e.g. "salt to taste") just as two quick examples.

All that said, i don't think it's as simple as stemming the ingredient lines. Distinguishing "chicken breast" from "chicken broth" is pretty important and that's structural info, not lexical info. In the end i don't think any system employed to do this is gonna be 100% perfect, so it's a matter of what's most convenient to operate, easiest to get started and maintain, and does the best job (probably in that order of priority).

imo it's going to end up being a parser, not a tagger, stemmer, or just a pile of regexps. ideally a parser that includes some information about the statistical likelihoods of individual words, and their similarity to existing ingredients (e.g. a spell check).

Anyway, that's one proposal. The even dumber proposal is just to enumerate all of the measures/units we possibly can, and then throw a dictionary at the rest, and see what can be recovered. :P
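A minimal sketch of that "even dumber" proposal, assuming tiny hand-written lists. `UNITS`, `INGREDIENTS` and `guess_ingredient` are hypothetical and deliberately incomplete; this is a dictionary lookup, not the parser described above.

```python
import re

# Hypothetical, deliberately incomplete lists for illustration only.
UNITS = {"lb", "lbs", "cup", "cups", "tbsp", "tsp", "oz", "clove", "cloves"}
INGREDIENTS = ["chicken breast", "chicken broth", "chicken", "salt", "garlic"]


def guess_ingredient(line):
    """Drop quantities and units, then look for the longest known ingredient."""
    # Keeping only runs of letters discards digits, fractions and punctuation.
    words = [w for w in re.findall(r"[a-z]+", line.lower()) if w not in UNITS]
    text = " ".join(words)
    # Longest names first, so "chicken breast" wins over plain "chicken".
    for name in sorted(INGREDIENTS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name), text):
            return name
    return None
```

Trying longer names first is what keeps "chicken breast" apart from "chicken broth" here, but only for as long as both are in the dictionary.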

dansinker commented 10 years ago

+1 to breaking up the tacobot into smaller feature-specific bots. Reading through and thinking I'd list features and priorities as:

P1: regexp testing so that it responds to requests for tacos with a taco, not just @ replying willy nilly with tacos.
P2: expanded vocabulary for taco requests, including vegetarian (fairly easy) and ingredient-specific (much harder)

to me feels like until we've got P1 squared, and Tacobot isn't just going bananas every time someone mentions its name, everything else is secondary

meetar commented 10 years ago

+.02

dansinker commented 10 years ago

Jesus, I am an idiot, and forgot the whole original purpose of the bot, which was to tweet out when a new taco object was added. OK. So I'd then say the feature list looks like this:

P1: regexp testing so that it responds to requests for tacos with a taco, not just @ replying willy nilly with tacos.
P2: tweeting when taco objects are added, with title of object and link (vs current state, which tweets PR title)
P3: expanded vocabulary for taco requests, including vegetarian (fairly easy) and ingredient-specific (much harder)

evz commented 10 years ago

OK, so I added a thing that checks the beginning of the text of the tweet for either '.@tacobot' or '@tacobot'. Check it out and see if it's working.

meetar commented 10 years ago

Looks great!

dansinker commented 10 years ago

That's definitely working.

Wonder though if this bot shouldn't get a little smarter. For instance: "@tacobot is amazing" shouldn't return a recipe, while "I want tacos for dinner, @tacobot help me" should.

Could we create an array of phrases that would return a taco recipe? I could actually see using something like a shared google spreadsheet that would then output a JSON file, so that multiple people could contribute phrasing. Just off the top of my head I can think of:

- gimme a recipe
- I'm hungry
- help me
- recipe please
- taco me
- make me a taco
- taco it up

And that's the tip of the iceberg.

dansinker commented 10 years ago

OK, I went ahead and set this up because fuck it.

Spreadsheet link: https://docs.google.com/spreadsheet/ccc?key=0Anp-zgGKPxl7dEd2TUpzSWQxWDR4UWFuWWxRc2RHbUE&usp=sharing

YQL'd JSON link that I'm sure could be done in a better method: http://query.yahooapis.com/v1/public/yql?q=select%20col0%20from%20csv%20where%20url%3D'https%3A%2F%2Fspreadsheets.google.com%2Fpub%3Fkey%3D0Anp-zgGKPxl7dEd2TUpzSWQxWDR4UWFuWWxRc2RHbUE%26hl%3Den%26output%3Dcsv'&format=json&diagnostics=true
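One way to read the phrases back out of that JSON, sketched under assumptions: the response is taken to nest rows under `query.results.row` with one `col0` field each (matching the `select col0` in the query), and `phrases_from_yql` is a hypothetical helper that parses a response body passed in as a string.

```python
import json


def phrases_from_yql(payload):
    """Extract the `col0` values from a YQL-style JSON response body."""
    data = json.loads(payload)
    results = (data.get("query") or {}).get("results") or {}
    rows = results.get("row") or []
    if isinstance(rows, dict):  # assumption: a lone row may arrive as a bare object
        rows = [rows]
    return [row["col0"].strip() for row in rows if row.get("col0")]
```

Reading the published CSV directly with the `csv` module would skip the extra hop, which may be the "better method" hinted at above.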

evz commented 10 years ago

OK, I added a thing to parse that doc and look for the phrases within: https://github.com/evz/random-taco-tweeter/blob/master/tacobot.py#L27-L33

Seems to work. Still checking for '@tacobot' and '.@tacobot' at the beginning. Should I drop that?

evz commented 10 years ago

Actually, to be more clear, if it starts with those strings or has any of the phrases on the spreadsheet in there, it'll tweet back

dansinker commented 10 years ago

BOOOOM.

So right now, if I hit 'taco me @tacobot' it will tweet me back, but 'what's up @tacobot' it won't. YAY.

But '@tacobot What's up?' returns a taco.

So I'm thinking the '@tacobot' / '.@tacobot' gate should be removed. Seems like the key phrases are the thing we want to trigger the automatic tacoing.

evz commented 10 years ago

Gate removed. Should only respond when one of the designated phrases is present in the tweet.
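With the gate gone, the check boils down to a substring test against the phrase list. A rough sketch, where `wants_taco` is a hypothetical name and the case folding is an assumption rather than a description of the bot's code:

```python
def wants_taco(tweet_text, phrases):
    """True if any trigger phrase occurs anywhere in the tweet, ignoring case."""
    text = tweet_text.lower()
    return any(phrase.lower() in text for phrase in phrases if phrase)
```

For example, with "taco me" and "gimme a recipe" on the list, 'taco me @tacobot' triggers a reply while "@tacobot What's up?" no longer does.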

dansinker commented 10 years ago

How often does it check the db?

dansinker commented 10 years ago

Here's another kink: capitalization counts

https://twitter.com/janicedillard/status/400729059165884417

doesn't return a taco because the G is cap.

UPDATE: PR on lowercasing the tweets now in the tweeter repo.

evz commented 10 years ago

If by db you mean the list of OK phrases, it checks every time it gets a tweet. You're really going to make me break down and use a real regex in there aren't you? :wink:

dansinker commented 10 years ago

Hahahahaha.

I love the spreadsheet idea, because it opens up phrasing. Just wondering what will happen at scale.

evz commented 10 years ago

Yeah, true. Really, though, how many phrases are we talking here? Even with 100 or 1000 phrases you're probably only talking a second or two delay while it gets the list and then loops over and checks for a phrase in the text of the tweet.

dansinker commented 10 years ago

So I wonder why this didn't return a taco:

https://twitter.com/kategardiner/status/400753823989248000

Do you think when it reads the phrases non-escaped apostrophes fuck it up?

UPDATE: This one, similar phrasing and definitely with an apostrophe, went through: https://twitter.com/herdingbats/status/400764010192072704

dansinker commented 10 years ago

So I'm building up the vocabulary by looking at a search for "tacobot" and seeing when things come in that clearly should return a taco. This should build up fast.

One question for the group though: Should the bot always respond to someone, even if it's not with a recipe? Or something that helps to prompt towards a successful query?

example: Oh, hello! Are you trying to get a taco? Perhaps ask for one by saying "taco please".

That's going to introduce a lot of noise though. Maybe a silent failure is better?

It would be kind of fun to hide easter eggs in there, various phrases that would trigger something entirely different. Or like any mention of a GIF could return an animated gif.

MY BRAIN WON'T STOP.

evz commented 10 years ago

I think silence is better in this case. Easter eggs are good. Kinda moving into IRC bot territory.

@sinker your brilliant .lower() PR got me thinking. Seems like that approach (i.e. simplifying the incoming text) might be an easier way to get this thing to work the way we want it to. Just turning everything into ASCII text on the fly (the incoming text and the phrases) would make string matching more accurate. Anyways, just a thought.

dansinker commented 10 years ago

yep. looking at the requests, everything is a variant. If there are ways of simplifying so that we simply don't have to list "taco please" "please, taco" "taco, please" etc etc etc, that'd be amazing. I smell regex.
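Short of a full regex, one possible way to collapse those variants is to compare word sets instead of exact strings. This is only a sketch; `words` and `matches_any_order` are hypothetical helpers.

```python
import re


def words(text):
    """Lowercased set of the words in `text`, with punctuation discarded."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches_any_order(tweet_text, phrases):
    """True if every word of some phrase appears in the tweet, in any order."""
    tweet_words = words(tweet_text)
    for phrase in phrases:
        needed = words(phrase)
        if needed and needed <= tweet_words:
            return True
    return False
```

A single "taco please" entry then covers "please, taco" and "taco, please" too. The trade-off is looseness: any tweet containing both words matches, whatever it actually means.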

dansinker commented 10 years ago

strangely, "I'm hungry" appears to still be a fail point. Do the logs give any indication as to why?

https://twitter.com/roknich/status/401007746708824064

didn't get a taco. He also tried it lower case right after, still no taco.

Wilto commented 10 years ago

Smart apostrophe, looks like? ' vs. ’

Might make sense to strip punctuation and lowercase the incoming text, prior to parsing it.
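A minimal sketch of that normalization step, assuming it is applied to both the incoming tweet and each phrase before comparing. `normalize` is a hypothetical helper.

```python
import string
import unicodedata

# Map typographic quotes to their ASCII equivalents before anything else.
SMART_QUOTES = str.maketrans(
    {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}
)
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Fold smart quotes and accents to ASCII, lowercase, strip punctuation."""
    text = text.translate(SMART_QUOTES)
    # Decompose accented letters, then drop whatever is still non-ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower().translate(STRIP_PUNCTUATION)
    # Collapse runs of whitespace left behind by removed characters.
    return " ".join(text.split())
```

After this, "I’m hungry" and "I'm Hungry" both reduce to "im hungry", so the curly apostrophe stops mattering.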

dansinker / tacofancy

let's talk taco twitter bots #94