Module API - Githubissues

StarlitGhost commented 10 years ago

For the actual module loading system, we have a few options.

twisted.plugins, which was recommended by @ElementalAlchemist and looks pretty clean and nice
the same mechanism we're all using in our current individual bots. we know it fairly well, but it's somewhat messy and there's no real API enforcement as it is
probably others, subclasses comes to mind

Other than that, we need to discuss what we want modules to be able to do, and what they need to be able to access. A few things I would need for, say, Alias and Chain:

access to the module handler itself, specifically the list of loaded modules and their triggers
ability to call those modules directly
calling them in a thread so the bot continues to function while a long chain might be processing

Ideally modules would be able to register themselves in a few different places (possibly only one place each though)

pre-process - before standard commands and other things execute. Can't think of any examples right now. Maybe stripping format characters, or pre-following urls and replacing redirects. I dunno.
passive - logging, picking up hangout links, that sort of thing
active - urlfollowing, sed regex patterns, etc
commands - directly called commands
post-process - apply to messages returned from modules. automatically post messages over a certain length to a text paste site, strip colours from output if that channel mode is set, logging the bot's own messages
a utility slot perhaps? I'm imagining this as a module that provides functionality to other modules. ignore lists, admin lists, that sort of thing.

But anyway, I'm making this an issue because we've gone a little while without all being together and willing to talk about it. This will let us discuss it when we each have some time.

StarlitGhost commented 10 years ago

pre and post processes would modify the messages directly before and after other modules get their hands on them, none of the others would modify the incoming or outgoing messages after they've been created.

In Chain I have some modules adding to an 'extraVars' dictionary as the messages go through each module, so that these variables can be applied to later modules, but this does not modify the basic contents of the message. See Delay in PyMoronBot, which adds 'delay' and 'delayString' to that dict, and the processing of this in Chain.

HubbeKing commented 10 years ago

Twisted.plugin does look rather neat. I'm just a bit iffy on how exactly you write the Interface for it, but that's probably my inexperience showing it's face. (I assume this is where we'd need to do the threading aswell?)

Your ideas for different types of modules/plugins sounds very sensible aswell, sort of what @Heufneutje is doing over in PyHeufyBot.

And if we give modules access to the module handler, we could probably fairly easily replicate Alias and Chain-type behavior.

HubbeKing commented 10 years ago

@MatthewCox has pointed out that threading would be done in the module call mechanism. I am just a bit of a derp.

StarlitGhost commented 10 years ago

A cleaner way to do the Chain extraVars stuff would be via the module handler, rather than polluting the message class.

Alternatively it could be one of those 'utility' modules I suggested. A 'temp data store' module perhaps, that clears itself after a message has been fully processed.

I suppose that implies reacting to events...

StarlitGhost commented 10 years ago

It would be a rare occurrence but it might be nice to have modules that have their own plugin structure underneath too. The only example I have right now is URLFollow in PyMoronBot, which is fairly large and would be nicer if it was split up. This is possibly unrelated to the module-level API discussion though.

Heufneutje commented 10 years ago

I vote for twisted.plugin. EA uses it in txircd 0.3 and it looks like a pretty neat system if we can get it to work.

- access to the module handler itself, specifically the list of loaded modules and their triggers
- ability to call those modules directly

This should be if we're passing a bot instance. The way PyHeufyBot handles it, a module handler is assigned to every server. We could probably do the same thing here.

Other than that, not sure about the pre/post processing. This is mainly because I think module responses are a very pointless thing though. Especially if you're passing a bot instance which you can use to send messages already, having a separate response is kinda silly. Not to mention you limit yourself to only sending messages after everything in the module is done.

Another thing I'd like to add to this is priorities. For example, logging should have priority over everything else. Ignore should have a higher priority than normal commands, but lower than logging, etc.

StarlitGhost commented 10 years ago

True, but that does make something like chain impossible. Unless we can somehow have it still intercepting all those responses from the bot. On 3 Jun 2014 07:38, "Stefan" notifications@github.com wrote:

I vote for twisted.plugin. EA uses it in txircd 0.3 and it looks like a pretty neat system if we can get it to work.

access to the module handler itself, specifically the list of loaded modules and their triggers ability to call those modules directly This should be easy since we're passing a bot instance. And the bot instance has a module handler attached to it.

Other than that, not sure about the pre/post processing. This is mainly because I think module responses are a very pointless thing though. Especially if you're passing a bot instance which you can use to send messages already, having a separate response is kinda silly. Not to mention you limit yourself to only sending messages after everything in the module is done.

— Reply to this email directly or view it on GitHub https://github.com/DesertBot/DesertBot/issues/8#issuecomment-44925016.

Heufneutje commented 10 years ago

I don't really have an immediate solution, but I don't wanna limit us by having to return a response. Also because my modules rely on a bool being passed so the call process can be interrupted. Ignore makes use of this for example. Without this you would need to make it part of the module handler, which I don't really wanna do.

Of course we could always use an array and pass both.

EDIT: Another issue I have with using responses is that not every module has to return something. Passive modules don't respond to anything.

StarlitGhost commented 10 years ago

I have many modules that don't return anything, or return lots of things... but I agree, a response class isn't strictly needed.

HubbeKing commented 10 years ago

While a response class isn't necessary, I do think it would make things easier for when a module DOES return something. If we make modules just return strings, we have to modify them somewhere so they can be sent anyway, and if we make the modules send stuff directly, it's harder for others (who might not be as knowledgeable about IRC as we are) to write them properly.

And also ignore halting the calling process is a bit wonky IMO. Ignoring people is way easier to do in the module handler, by just not calling modules if the user is ignored. (One could use the ignore module for adding/removing/loading/saving ignores instead, if one really wants ignores to be modular)

ElementalAlchemist commented 10 years ago

I have some things to say. It's 1AM so I won't get to all of them now, but if you don't hear more from me after this comment, poke me at some point.

Twisted.plugin does look rather neat. I'm just a bit iffy on how exactly you write the Interface for it, but that's probably my inexperience showing it's face. (I assume this is where we'd need to do the threading aswell?)

You shouldn't need to do threading in Twisted.

Basically, for twisted.plugin, you write a class that extends zope.interface.Interface with some variables and dummy functions that you want your main class to have. (Something like IModuleData in here.) When you write your module, you specify the main class that will be loaded first to implement twisted.plugin.IPlugin and your interface class (using a zope.interface.implements call in the class definition, for example--note that in txircd, I also have a base class that implements most of IModuleData that modules extend, mostly for my benefit as a module author). IPlugin is required for twisted.plugin to pick it up. You'll pass your interface class to twisted.plugin.getPlugins when you load the modules, and the getPlugins function will only pick up classes that implement that interface. (getPlugins loads all modules, though, and you filter out the ones you don't want by not saving any of the data for them.) A final point on module implementation: the class has to be instantiated in your module file, but that's as simple as having a genericVariableName = MyMainModuleClass() at the end of your module file. If you forget to do this, however, it simply just won't load.

A few things I would need for, say, Alias and Chain: [blah blah list here]

A list of loaded modules could be beneficial. If you design your API around communication, though, it may not be as beneficial as you may think. A modules dictionary (mapping name -> implementation class (i.e. the class implementing IModuleData)) is exposed to modules through the IRCd object, but basically the only use case for it being exposed is for a module implementing a command that lists loaded modules. (The other reason is convenience, because Python doesn't have private class variables, making it hard to hide things.)

For the triggers: The whole point (or at least part of the point if we're doing runtime-loadable modules) is to prevent code from having side effects on other code as much as possible. This means that you want your modules to expose as few of their internals to other modules as possible, and, when they have to interface, the interface is made as explicit as possible.

To that end (until I'm otherwise shown why it won't be a good fit), I continue to be a proponent of txircd's action system. It fits very well with Python's somewhat more freestyle nature, and it's very flexible while still being strict in its purpose: it provides an explicit interface and framework for core/module and module/module communication, but it does not define exactly how the interaction must take place. By that I mean this: it provides a way for modules to register functions that perform specified actions (I use string keys and reserve them for certain kinds of functions), but the action system itself does not say what parameters those functions should take or what return values it may have. (It actually supports a few kinds of actions to encourage variety.) Chaining is also built in: modules specify a priority value when the core queries them for action handlers. (If it doesn't fit universally, actions don't have to be the entire interface, but I've found them quite helpful, so give it a shot.)

Most of the time, if you're doing something that takes sufficiently long, you're probably using a library that will return a Deferred. If you use something like deferToThread, I don't think you need to make that a core function--modules can handle that.

It would be a rare occurrence but it might be nice to have modules that have their own plugin structure underneath too.

That would be a huge pain. You can use actions for this, too. For example:

DesertBot receives a message with a URL
The URLFollow module eventually receives that message
URLFollow parses the URL out and calls an action called, for example, "urlhandle" with the URL and whatever other parameters you think you need (I doubt more are needed for URLFollow, but I haven't looked at what all you're doing with the internals). I imagine it gets called with something like txircd's runActionUntilTrue, which stops when a function returns True, so I'll use that example--substitute the return value with whatever you end up using for your own implementation.
A function in the Twitch module, which the Twitch module registered for the "urlhandle" action, either handles the function and returns True (if it's a twitch.tv URL), or returns None and lets it fall through (in this case, it's not a twitch.tv URL).
If it fell through, the function the YouTube module registered for the "urlhandle" action gets called with the URL and does similar processing to the Twitch module's function above.
URLFollow also registered a function to the "urlhandle" action with a lower priority (so that it always gets called last and functions as the default implementation), and it handles the URL.
The functions on the stack return down to the module manager which probably has more modules to ask about the message commands we're in the middle of processing, and it continues on its way with that.

I just got this far and realized that by Chain, you meant PyMoronBot's chain module. In my defense, it's now 1:30 AM and I'm not going back to edit it. That said, since Chain works with commands, I would expect that your command handling module (that is a module, yes?) in a system like this to expose some actions or action handlers allowing you to just spontaneously trigger commands.

This should be if we're passing a bot instance. The way PyHeufyBot handles it, a module handler is assigned to every server. We could probably do the same thing here.

I'm not sure why you need more than one module manager. You're only going to have one reactor, anyway.

For example, logging should have priority over everything else. Ignore should have a higher priority than normal commands, but lower than logging, etc.

I think I mentioned priority once, but I didn't read this far yet. Either way, A+ on allowing modules to assign priorities for anything the manager is already going to need to store in a list anyway. (I'm also all about flexibility.)

response class blah blah blah

Maybe it's because I've not worked with IRCClient in Twisted, but I'm not sure what's with the need for a response class. Can you not send messages directly to the server? I'd imagine as part of the module API, modules get passed a reference to the server handler or something when you load them (similar to how txircd calls hookIRCd passing the IRCd instance to every module before it does anything else with that module).

And also ignore halting the calling process is a bit wonky IMO. Ignoring people is way easier to do in the module handler, by just not calling modules if the user is ignored. (One could use the ignore module for adding/removing/loading/saving ignores instead, if one really wants ignores to be modular)

I did this in RoBoBo by making the message functions have a return value; m_ignore would simply return MSG_IGNORE for any message it didn't want you to see, and the module manager would immediately stop processing the message. Python is less restrictive than C++, so maybe you can think of something else? But if nothing else, that's a way to go.

If you're passing message data to all modules in a dictionary like it looks like you're doing, you could also just call messageData.clear() to empty the dictionary and have the module manager go, "Oh, hey, there's nothing in this dictionary anymore. I guess I should stop."

That probably isn't everything I have to say, and I probably missed reading some things, but it's my sleepytimes. Let me know your thoughts about my thoughts.

Heufneutje commented 10 years ago

I may add stuff later to this since I'm currently at work, but all these ideas sound pretty good. twisted.plugin is something we'll have to read up on, but I think I kinda have a general understanding of how it works now.

I also agree now that having more than one module handler is kinda silly. Especially if we're using twisted.plugin. We just have to keep track of the module data and keep track of which modules are used on which server.

The actions system in txircd is something I'm still trying to wrap my head around, but it does sound pretty powerful.

Another thing I really like is your view on passing data around and using a clear() function. I think we can probably update IRCMessage with this, or subclass it. This would be very useful for modules like Chain and Ignore.

I think the only thing I don't really agree with is having the command handler in a module. That is taking modularity a bit too far for my liking. The module handler can take care of that.

ElementalAlchemist commented 10 years ago

Another thing I really like is your view on passing data around and using a clear() function. I think we can probably update IRCMessage with this, or subclass it. This would be very useful for modules like Chain and Ignore.

I was thinking of dicts at the time I said that, but it could also work with a class.

I think the only thing I don't really agree with is having the command handler in a module. That is taking modularity a bit too far for my liking. The module handler can take care of that.
Command handling is fairly central to an IRC bot

I wasn't sure how to respond to the first thing by itself, so I scrolled up in IRC and copied that second thing.

If you know me, you know I'm going to argue that there's no such thing as taking modularity too far. :P
Either way: Command handling is not as central to an IRC bot as you think it is. The purpose of an IRC bot is to facilitate communication with the server and to process messages. Parsing out and handling commands is one way in which messages are processed, similarly to how messages are also processed by having URLs parsed out and reported on (URLFollow: Another way to process messages (official tagline)). Like this, it only makes sense that the command handler, which processes messages in a certain way, is a module with which other modules interact.

Heufneutje commented 10 years ago

To me commands are just messages with a special character at the start of them. A command can be a type of module, and as such I'm of the opinion that the module handler can deal with it. A module having to communicate with another module every time a command is sent seems kinda overexcessive to me.

For reference: https://github.com/Heufneutje/PyHeufyBot/blob/master/pyheufybot/modulehandler.py#L125

Heufneutje commented 10 years ago

...Whoops. Didn't mean to close that.

ElementalAlchemist commented 10 years ago

One of the other things I like about modularity is the separation of functionality into different modules, which is difficult to do when the core does things for you. Under the system mentioned above, I imagine a command handler module calling a "handlecommand-{}".format(command) action passing the parameters (and maybe a fallback "handlecommand" action passing the command name and parameters, just in case), and modules implementing commands just hook into that system instead of adding another API for commands.

I'm not going to say that you're definitely not allowed to implement commands as a separate API or whatever you're trying to do, but do a lot of thinking about the API you're trying to build and see if you can fit it into your API without having to extend it. A simpler API is much easier to work with than a complex one. If you think a lot about it and decide you really need the command API, go for it--but really think and make sure it's the right decision before you commit to it.

HubbeKing commented 10 years ago

Okay so since discussion has effectively halted here, I think we need to start making some actual decisions. We've all sortof quietly agreed that twisted.plugin is the way forward, but not much else.

We need to start figuring out how we're gonna structure everything, so that we can eventually start building the interface and handler classes.

Heufneutje commented 10 years ago

Yeah, I think we can start implementing all the twisted.plugin stuff and make sure we know how it works.

HubbeKing commented 10 years ago

The first things that come to mind are what exactly we're going to do inside the interface class, and what twisted.plugin gives us natively. After that, there's also what things the module handler is going to do, and what's going to be left to the individual modules.

ElementalAlchemist commented 10 years ago

what twisted.plugin gives us natively

You get a way to load modules. That's basically it. It does some fancy things to be good at loading modules, but you shouldn't expect it to be more than a module loader.

After that, there's also what things the module handler is going to do, and what's going to be left to the individual modules.

This should be figured out first, and then what goes into the interface class should fall into place based on the API you've mostly designed.

Heufneutje commented 10 years ago

So to make some progress with this, I'm gonna do a proposal.

Use twisted.plugin for module loading. As there's not really a way to not load everything at once, we'll need to have our loadModule and unloadModule functions handle the data that's used.
There will be a single module API for all servers. Its task is to pass messages to the modules. It will also keep track of what module data is loaded per server.
Modules have priorities. I propose we go with an enum that has HIGH, ABOVENORMAL, NORMAL, BELOWNORMAL and LOW.
Modules trigger on different message types and should have a list of all message types they trigger on.
As far as module types go, I propose we go with the ones that @MatthewCox opened this issue with. Though I don't see the need for pre-process modules. So that leaves us with Passive, Active, Command, Post-process and Utility. The API will know how to handle these different types.
IRCMessage instances should have an option for their data to be cleared. This way the API will know when to stop passing it. (Useful for modules like Ignore)
IRCMessage will have a command prefix in it as @HubbeKing and @MatthewCox proposed. (See https://github.com/DesertBot/DesertBot/commit/bb9f7b44f86c332da2d90cbc29cc8ed96603fa22 for reference). A reference to the bot instance will be passed to do this.

There might be more things that need to be added here, but for now this should give us something to start with.

HubbeKing commented 10 years ago

That all sounds good to me, I believe we decided on passing a bot reference to the IRCMessage class in IRC, so there's no record of that on here. (So IRCMessage gets the command prefix and bot nick by accessing bot.commandChar and bot.nickname)

There's also the matter of how module responses should be handled, which we have not yet decided.

I'm cool with both ways, an IRCResponse class is not strictly necessary, but how do we do the Chain module if modules send responses directly via the bot rather than returning a response to be sent?

Heufneutje commented 10 years ago

I think rather than returning an IRCResponse class we could return an array or a dictionary with data.

HubbeKing commented 10 years ago

Preferably a dictionary then, so one could easily see what's being accessed (response["channel"] for instance, rather than having to go response[3])

Heufneutje commented 10 years ago

Agreed. Dictionaries it is I say.

HubbeKing commented 10 years ago

So the module handler would do something like... take in an IRCMessage, pass it through modules, and the modules would add stuff to a response dict inside the IRCMessage.

And then the handler would send the stuff in this response dict. Which would contain the lines to be sent, where to send them, stuff like that.

I'm liking this idea.

Heufneutje commented 10 years ago

That's the idea. Data can also be modified or cleared. An ignore module for example could clear the data and the API would know to stop passing it.

HubbeKing commented 10 years ago

Having modules return a dict, and then adding that dict to the IRCMessage object that caused it in the module handler would make for a neater implementation on the module side, though. The ignore module would then fire first, clearing out all the IRCMessage data. The API would then know "oh okay, this is no longer a thing." and not try creating more responses.

HubbeKing commented 10 years ago

For now, though, just returning a dict with what's being sent, who/what caused it, how it's being sent, and where it's being sent, should be enough for the modules.

The handler can then just send it, or we can go for some kind of linking them to the IRCMessages that caused them later.

HubbeKing commented 10 years ago

Okay so we decided to go for a strict Response class, for the added checks and stuff that an init method gives us over a dict.

ElementalAlchemist commented 10 years ago

If you're using a class for messages, then depending on what it's used for after module processing, it may be neater to have an ignore flag the module handler checks.

To keep your data consistent, though, you should probably ignore the ignore flag unless it's a PRIVMSG or NOTICE command. I'm not sure how you'd disallow clearing for other commands.

HubbeKing commented 10 years ago

Would perhaps be neater to have a flag, yeah. I don't see an occurance other than ignore where we would need to completely clear out a message, so either way we go this is a modification fairly central to the core that will only really be used by one module.

On the topic of modules, should be allow modules to have multiple triggers? Right now it's set up so that trigger is a string, that supposedly supports regex. I think it would be better to let it be a list, so that you could have multiple simple words, multiple small regexes or just one large regex.

Becomes more flexible while retaining the functionality we've already planned for.

Heufneutje commented 10 years ago

Sounds good to me. I think a list of regex triggers is indeed neater than a single one.

Heufneutje commented 10 years ago

From IRC: http://paste.ee/p/z4TVY

To summarize: live reloading is supported. Unloading modules is as simple as removing them from the list of loaded modules and it will automatically be cleaned up. Twisted's getPlugins will always give us a list of the latest versions of our modules.

HubbeKing commented 10 years ago

How are we gonna handle PostProcess modules? PyMoronBot uses a separate interface, since PostProcess modules work on IRCResponse objects rather than IRCMessage objects, and we would have to do something similar here.

HubbeKing commented 10 years ago

Most of the API issues have been resolved and I think we're almost ready to start documenting and porting modules over, but there is still the question of whether or not IRCResponse objects should have a user.

StarlitGhost commented 10 years ago

I believe all of the issues we were discussing in here have been decided on, but feel free to re-open if you see something we missed.

DesertBot / DesertBot-old

Module API #8