online memory - GitHub Issues

rlemon commented 11 years ago

Currently (as you are all well aware) the bot's memory is stored in localStorage. To move her online, the only thing I see needing to happen is to have her memory stored online (somewhere, and I am offering to host this), along with a thin API to update and retrieve it. This API can be extended to do any of the additional server functions we are looking for. For the time being, the bot can still live in any of the maintainers' browsers and just call into the cloud to get her information. I think this would also be the most future-proof way to keep the bot up and running (correct me if I am wrong, but haven't the previous OpenID methods all been broken by site updates?)

What does everyone think?
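A minimal sketch of the "thin API" idea, in TypeScript: the server keeps the bot's whole memory as one JSON string, with GET to retrieve it and PUT to update it. The `/memory` route, the status codes, and the `MemoryApi` class are hypothetical; a real server would wrap this in an HTTP framework and add authentication.

```typescript
// Hypothetical thin API for the bot's memory: one JSON document, one read
// route, one write route. Route name and status codes are illustrative.
interface ApiResponse {
  status: number;
  body: string;
}

class MemoryApi {
  // The whole memory is kept as a single serialized JSON string.
  private memory = "{}";

  handle(method: string, path: string, body = ""): ApiResponse {
    if (path !== "/memory") {
      return { status: 404, body: "not found" };
    }
    if (method === "GET") {
      // Retrieve: hand the stored memory back to the bot unchanged.
      return { status: 200, body: this.memory };
    }
    if (method === "PUT") {
      // Update: only accept a well-formed JSON object.
      try {
        const parsed: unknown = JSON.parse(body);
        if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
          return { status: 400, body: "memory must be a JSON object" };
        }
      } catch {
        return { status: 400, body: "invalid JSON" };
      }
      this.memory = body;
      return { status: 204, body: "" };
    }
    return { status: 405, body: "method not allowed" };
  }
}
```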

eternalruler commented 11 years ago

Host MongoDB and store the JSON in the database.

SomeKittens commented 11 years ago

You can see a first attempt at that here: https://github.com/SomeKittens/SO-ChatBot/tree/cheesyNode

ralt commented 11 years ago

The most important question: how are you going to handle the authentication? A simple CouchDB server is easy enough to open to the public, but authentication is probably the most important aspect.

Shmiddty commented 11 years ago

To start, it should be limited by IP: only approved IP addresses will be able to access the API. Obviously this comes with the inherent issue of dynamic IPs, but it should require very little maintenance.
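An IP allowlist like the one suggested here could look roughly like this sketch. The addresses are placeholders from the documentation ranges, and `isApprovedIp` is a hypothetical helper; the list itself is the part that needs updating whenever a maintainer's dynamic IP changes.

```typescript
// Hypothetical allowlist: only requests from these addresses may reach the
// memory API. The entries are RFC 5737 documentation addresses, not real ones.
const APPROVED_IPS: ReadonlySet<string> = new Set([
  "192.0.2.10",
  "198.51.100.7",
]);

function isApprovedIp(remoteAddress: string): boolean {
  // Dual-stack servers may report IPv4 clients as IPv4-mapped IPv6
  // ("::ffff:192.0.2.10"), so strip that prefix before comparing.
  const prefix = "::ffff:";
  const normalized = remoteAddress.startsWith(prefix)
    ? remoteAddress.slice(prefix.length)
    : remoteAddress;
  return APPROVED_IPS.has(normalized);
}
```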

rlemon commented 11 years ago

@Ralt the auth will still be done by hand. This isn't 'the bot' on the server; this is the bot's memory. She will make AJAX calls to the API on the server from a client browser.

@Shmiddty My idea was to have a security key passed along by the bot. This would be handed out to all maintainers so they can access the API. It would also mean that if we allowed other instances of the bot to use the API, we could track and facilitate them as well.

SomeKittens commented 11 years ago

I like the idea of a security key, allowing for many different bots to be run off of one API memory. We can also switch features (don't want welcome? We can configure that!)
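A sketch of how one API could serve several bots, assuming each issued key maps to a bot instance with its own feature switches. The key strings, the `BotInstance` shape, and the `"welcome"` feature name are illustrative assumptions.

```typescript
// Hypothetical registry: each issued security key identifies one bot
// instance, which can switch individual features off (e.g. the welcome).
interface BotInstance {
  name: string;
  disabledFeatures: string[];
}

const registry = new Map<string, BotInstance>([
  ["key-caprica-example", { name: "Caprica", disabledFeatures: [] }],
  ["key-other-room-example", { name: "OtherRoomBot", disabledFeatures: ["welcome"] }],
]);

// Resolve a key to its bot instance, or undefined if the key is unknown.
function lookupInstance(key: string): BotInstance | undefined {
  return registry.get(key);
}

// A feature is enabled only for a known key whose instance has not
// switched that feature off.
function isFeatureEnabled(key: string, feature: string): boolean {
  const instance = lookupInstance(key);
  return instance !== undefined && !instance.disabledFeatures.includes(feature);
}
```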

Shmiddty commented 11 years ago

What happens if the security key gets into the wrong hands?

Zirak commented 11 years ago

Where do you store the security key? How do you keep it private?

And even assuming all's fine and dandy, we should have a fallback to localStorage or something of the sort.
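One way to get that fallback, sketched under the assumption that the remote call is any promise-returning function and that the local copy lives under a hypothetical `bot-memory` key: try the server first, cache what it returns, and fall back to the cached copy (or an empty memory) if the server is unreachable.

```typescript
// Anything with getItem/setItem fits, so the browser's localStorage does.
interface LocalStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key under which the local copy of the memory is kept.
const LOCAL_KEY = "bot-memory";

async function loadMemory(
  fetchRemote: () => Promise<string>,
  local: LocalStore,
): Promise<Record<string, unknown>> {
  try {
    const remote = await fetchRemote();
    // Parse before caching so a garbled response never overwrites the copy.
    const parsed = JSON.parse(remote) as Record<string, unknown>;
    local.setItem(LOCAL_KEY, remote);
    return parsed;
  } catch {
    // Server unreachable or returned garbage: use the local copy,
    // or start with an empty memory if there is none.
    const cached = local.getItem(LOCAL_KEY);
    return cached === null ? {} : (JSON.parse(cached) as Record<string, unknown>);
  }
}
```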

SomeKittens commented 11 years ago

This is basically cloud services for Caprica. Only memory and node-specific plugins would be stored there. Anything important (say, Twitter) would be locked down to our key only. Others could use their keys to store memory. Key could be stored in DB/textfile/wherever.

If it gets into 'the wrong hands,' so what? They have a malicious memory store? It's not like you can store arbitrary JSON.

ralt commented 11 years ago

Here is my proposal:

The authentication is done through a key that will be sent to us by mail. The key is sent as a header, and the server checks it.
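A sketch of the server-side header check, assuming a hypothetical `X-Bot-Key` header and a list of issued keys. Header names are compared case-insensitively, as HTTP requires, and keys of equal length are compared without stopping at the first differing character.

```typescript
type RequestHeaders = Record<string, string | undefined>;

// HTTP header names are case-insensitive, so normalize before matching.
function headerValue(headers: RequestHeaders, wanted: string): string | undefined {
  const target = wanted.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === target) return value;
  }
  return undefined;
}

// For equal-length inputs, check every character instead of returning at the
// first mismatch, so timing does not reveal how much of a guess was right.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// The hypothetical "x-bot-key" header must carry one of the issued keys.
function isAuthorized(headers: RequestHeaders, validKeys: readonly string[]): boolean {
  const supplied = headerValue(headers, "x-bot-key");
  if (supplied === undefined) return false;
  return validKeys.some((key) => constantTimeEqual(supplied, key));
}
```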

On the bot side:

When the bot connects with an empty localStorage, it gets the data from the server memory.
Then the bot regularly (every hour?) overwrites the server memory with its current localStorage.

This way, the bot doesn't suffer from network latency, and not much code changes on its side.
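The bot-side behaviour described above could be sketched as follows. `KeyValueStore` and `RemoteMemory` are hypothetical adapters (one would wrap `window.localStorage`, the other the authenticated HTTP calls), and the whole store is uploaded as a single JSON object, matching the "one JSON string" model.

```typescript
interface KeyValueStore {
  keys(): string[];
  get(key: string): string | null;
  set(key: string, value: string): void;
}

interface RemoteMemory {
  download(): Promise<string>; // the JSON string held by the server
  upload(json: string): Promise<void>;
}

// Serialize every entry of the local store into one JSON string.
function snapshot(store: KeyValueStore): string {
  const entries: Record<string, string> = {};
  for (const key of store.keys()) {
    const value = store.get(key);
    if (value !== null) entries[key] = value;
  }
  return JSON.stringify(entries);
}

// On start-up, restore from the server only if nothing is stored locally.
// Resolves to true if the local store was filled from the server.
async function bootstrap(store: KeyValueStore, remote: RemoteMemory): Promise<boolean> {
  if (store.keys().length > 0) return false;
  const entries = JSON.parse(await remote.download()) as Record<string, string>;
  for (const [key, value] of Object.entries(entries)) store.set(key, value);
  return true;
}

// Meant to run on a timer: overwrite the server copy with the local state.
function pushSnapshot(store: KeyValueStore, remote: RemoteMemory): Promise<void> {
  return remote.upload(snapshot(store));
}
```

In the browser, the hourly push could be scheduled with something like `setInterval(() => pushSnapshot(store, remote), 60 * 60 * 1000)`; reads and writes during normal operation stay local, so network latency never sits in the bot's path.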

On the server side:

A simple JSON string (mapping the localStorage) is stored. Only modification of this string is possible.
Before overwriting its memory, it saves a copy of the current memory. This serves as a backup, and since our authentication is bare, it allows us to easily roll back if something malicious/stupid happened.

It allows us to keep a simple authentication scheme, and to easily roll back if something bad happens, since we'll have hourly snapshots. It's not like we're Google; we can easily afford to lose a few hours of localStorage if a bad boy finds our key.
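The server half, sketched as an in-memory class: every overwrite first saves the previous JSON string, and `rollback` restores an earlier one. The class name and the default retention of 48 snapshots (two days at one upload per hour) are assumptions; a real server would persist the snapshots to disk.

```typescript
interface Snapshot {
  savedAt: Date;
  json: string;
}

class SnapshotMemoryStore {
  private current = "{}";
  private readonly snapshots: Snapshot[] = [];
  private readonly maxSnapshots: number;

  // Hypothetical default: keep 48 backups (two days of hourly uploads).
  constructor(maxSnapshots = 48) {
    this.maxSnapshots = maxSnapshots;
  }

  read(): string {
    return this.current;
  }

  // Replace the memory, saving the old value as a backup first.
  overwrite(json: string, now: Date = new Date()): void {
    JSON.parse(json); // throws on malformed input, so garbage is never stored
    this.snapshots.push({ savedAt: now, json: this.current });
    if (this.snapshots.length > this.maxSnapshots) this.snapshots.shift();
    this.current = json;
  }

  // Restore the value saved `stepsBack` overwrites ago (1 = the previous one)
  // and discard that snapshot and every newer one.
  rollback(stepsBack = 1): void {
    if (stepsBack < 1 || stepsBack > this.snapshots.length) {
      throw new RangeError("no snapshot that far back");
    }
    const target = this.snapshots[this.snapshots.length - stepsBack];
    this.snapshots.length -= stepsBack;
    this.current = target.json;
  }

  snapshotCount(): number {
    return this.snapshots.length;
  }
}
```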

Thoughts?

SomeKittens commented 11 years ago

That'll be nice for memory backup, but it misses out on a lot of the power Node gives us (namely being able to use npm packages, say for the bot tweeting starred items).

rlemon commented 11 years ago

Well, for starters we can just get a memory backup online. That was my initial intent. Once we get the bot's code modified to support online storage, we can pretty much modify how we store the memory however we want, so long as we return it in the same way. The server-side API can be expanded from there. Again, I really have no preference on how we do this; I am willing to offer up my CentOS VPS for the job, and we can install pretty much anything you want.

shea-sollars commented 11 years ago

Would there be any harm in making it read-only for everyone?

allquixotic commented 10 years ago

If we allowed the entire bot to run in NodeJS, wouldn't it be possible to just use a database API for Node on the local box to serve as memory? People really shouldn't be hosting their bots on a desktop somewhere in a browser, anyway; the Root Access bot is hosted on my dedicated server (physical server rental in a datacenter), which is a perfectly suitable environment to keep the memory. Right now the memory is essentially the contents of a Firefox profile. There are no AJAX / cross-domain / public internet / security / authentication problems if the box hosting the bot is the box that the memory is stored on.

If you guys are going to have some centralized server acting as memory for the JavaScript chatroom bot (or multiple bots from different rooms), please don't turn this on by default in the SO-ChatBot tree, as I don't want the Root Access bot's memory to be stored on any server besides my own. Thanks.

allquixotic commented 10 years ago

Never mind RE: NodeJS. I didn't realize that was basically a pipe dream that is nowhere near being done. I could actually make use of the Mongo JSON thing if I hosted it on localhost and made it listen only on localhost. Wouldn't even need any authentication :)

caub commented 10 years ago

A free MongoLab DB could handle it, if it's not that big. For the API, I have done a small project for direct binding with Mongo queries. It looks unsafe, but normally it's not. The idea is to have the DB accessible from the browser, with the necessary security rules; I've used OpenID, and I'd say you must use it, since the DB is open. Here are some demos: http://jsbin.com/epudowe/11/edit http://jsbin.com/UmUbipa/6

The project is done in Java, but it's fewer than 100 lines of code, so doing it in Node.js would also be fine. I hope it's relevant.

FirstWhack commented 10 years ago

This needs to be revisited. I personally don't think this requires or justifies using a DB, the entire storage space will be what, <2,000 chars? Even if it expands it doesn't need to go in a database, it's one line. Databases are best for separation of data IMHO, not for storing data in general.

Secondly, some rooms don't want the memory persisted? @allquixotic could you please give us reasoning behind that? It sounds like someone with a car getting 10mpg saying "but I don't want to get 20mpg" (only in the way that the clear improvement is being rejected without reason). I'm sure you have a reason, I don't doubt that; I would just like to know why, and whether or not it affects any other versions of the bot (AFAIK there's no sensitive information (and I know for sure there shouldn't be, if there is!)).

Going a step further: Who wants the bot to run on its own server? I don't think that idea makes much sense, honestly, but maybe there are some good points for it?

allquixotic commented 10 years ago

@Jhawins

I personally don't think this requires or justifies using a DB, the entire storage space will be what, <2,000 chars? Even if it expands it doesn't need to go in a database, it's one line. Databases are best for separation of data IMHO, not for storing data in general.

I agree that this does not "require or justify using a DB".

I agree that databases are best for separation of data, not for storing data in general. Our HTML5 storage of JSON is more than adequate for the scale of data the bot uses.

Secondly, some rooms don't want the memory persisted? @allquixotic could you please give us reasoning behind that?

I never said that I don't want the memory persisted. I said that I'd rather have it be persisted on my own infrastructure, rather than on some random stranger's infrastructure. I just think it's a very bad design to hard-code a specific server's domain name/IP/URL into the bot code, because if that server goes offline for any reason, poof, the bot either breaks, or some significant part of its functionality suddenly becomes unavailable. I would no sooner hard-code my own server's URLs into the code than I would want someone else's. For command plugins (like !!meme) it is probably fine to make external requests, because it's very easy to disable a plugin, and even if a plugin's code is broken, it won't bring down the bot when users run it. Online memory is the kind of feature that would belong in core, and would likely break the bot completely if the online memory server were not available.

I believe that the bot's code on GitHub should be orthogonal to, and decoupled from, any particular implementation of the bot or its infrastructure, to the greatest extent possible. This is my primary argument against hard-coding, e.g., rlemon's server into the bot code as "the online memory repository for all bots". We should simply make the feature available; then make the server-side code that exposes an "Online Memory Server" available (in a separate GitHub repository, if needed); and then allow bot implementors (that is, people who wish to start up a new bot) to decide which server they want to connect to for online persistence, or even to opt out of it entirely.

(AFAIK there's no sensitive information (and I know for sure there shouldn't be, if there is!)).

No, there is no sensitive information in the bot, in any incarnation of the bot I'm aware of -- not mine, not Danny's, and not rlemon's. "There's sensitive information in there!" has never been a justification espoused by me or (I think) anyone, for why not to use online memory, or why to not hard-code a URL so it's hosted on some random guy's central server for all people who download the SO-Chatbot source code.

It sounds like someone with a car getting 10mpg saying "but I don't want to get 20mpg" (only in the way that the clear improvement is being rejected without reason). I'm sure you have a reason, I don't doubt that,

I am not against the concept of online memory. I just think that the code should require the bot administrator (the guy installing and running the bot) to manually specify the server host/domain/URL where the online memory repository resides.
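A sketch of that opt-in configuration: nothing is hard-coded, and the bot only talks to an online memory server when its administrator supplies a URL. The `BotConfig` shape, the field names, and `chooseMemoryBackend` are hypothetical.

```typescript
interface BotConfig {
  onlineMemory?: {
    url: string; // chosen by whoever runs the bot; never hard-coded in the repo
    key: string;
  };
}

type MemoryBackend =
  | { kind: "local" }
  | { kind: "online"; url: string; key: string };

function chooseMemoryBackend(config: BotConfig): MemoryBackend {
  const online = config.onlineMemory;
  // No URL configured means plain localStorage, exactly as today.
  if (online === undefined || online.url.trim() === "") {
    return { kind: "local" };
  }
  return { kind: "online", url: online.url, key: online.key };
}
```

Keeping the default at `local` means a fresh checkout behaves like the current localStorage-only bot, and an unreachable server cannot break a bot that never opted in.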

Going a step further: Who wants the bot to run on its own server? I don't think that idea makes much sense honestly, but maybe there are some good points for it?

What do you mean by "its own"? Do you mean running it on a server all by itself with no other programs running on it? Or do you mean running it on dedicated infrastructure?

If you mean running it on a server all by itself: I agree that this is not necessary, because the bot consumes so few resources that dedicating an entire physical computer (or even a VM) is a bit overkill.

If you mean running it on dedicated infrastructure (which is defined as a server sitting in an enterprise-grade datacenter connected to enterprise-grade hosting with a very high uptime percentage, as opposed to the alternative of sitting on some guy's laptop on a wifi network in their basement on a flaky ADSL connection), I very strongly believe that any 24/7 channel bot should run on some kind of dedicated infrastructure. That doesn't mean it has to be on a physical OS install; a VM would be fine. But it should be a VM on a physical box that's somewhere in a datacenter, with redundant power systems, multi-homed tier 1 connections, and all that other lovely stuff that comes with it. It comes down to how reliable you want your channel's bot to be: something that's reliable is lower maintenance for the system administrator, and becomes more of a "fixture" in the channel, as opposed to a bot running on some guy's flaky ADSL, which might have unpredictable and/or extended outages. But, importantly, if we decouple the code from specific implementations of the bot, it will be up to individual bot administrators to decide whether they want the bot to be hosted on anything from an HA cluster, to a low-end enterprise box, to a Pentium 2 laptop in Antarctica connected to the internet by dialup modem. The issue of where each bot implementation is hosted is, and should remain, completely orthogonal to discussions related to code changes in the SO-ChatBot repository.

It seems like, up until recently, or perhaps even still, the code for /Zirak/SO-ChatBot was written largely, or entirely, with the intention of directly supporting, specifically, Caprica Six in the JavaScript chatroom on chat.SO. This is fine, and to be expected, as this was the first bot, and the natural tendency of the developers of this repository has been to support the needs of this bot.

But this bot is now popular, and has several separate implementations. It is up to the core maintainers (Zirak, Jhawins, SomeKittens, and anyone else who has more than a handful of commits) to decide whether they want to keep the bot more "generally useful" so that other people can deploy their own bots on their own computers, or if they want to specifically direct the code to satisfy the immediate needs of "Caprica Six".

The two are not necessarily mutually exclusive goals, though. You can still support every single need of Caprica Six while not excluding or making life difficult for other bot implementors. It takes a bit more design thought, perhaps the introduction of an (optional) configuration file, and perhaps some more generalized code to do it, but in the end, the bot will be better designed and more future-proof by taking a farsighted view, as opposed to going for the shortest path to the goal of enhancing Caprica Six specifically. That is the philosophy I am trying to impart to you.

Of course, since I have very few commits and issues to my name in this repo, and assuming the default GitHub decision-making structure of meritocracy, my contributions are not meritorious enough for me to dictate to you guys to do anything. So if you decide to disregard my comments, that is fine with me. I already maintain a fork of this repo with about 30 diffs from Zirak master.

I was mostly trying to get out of some extra work for myself by arguing in favor of not hard-coding the online memory server URL, because otherwise I'd have to have one more diff against master for my own server's online memory URL.

And please, please, whatever you do, open source the code behind the online memory server, even if you decide to hard-code the URL to it in the Zirak/SO-ChatBot codebase.

Zirak commented 10 years ago

@allquixotic Don't worry, I've been thinking about these things ever since you said you use merge diffs. I'm playing a bit with some things, decoupling will come.

rlemon commented 6 years ago

Now that we have online backups, is this still worth the effort?

Zirak commented 6 years ago

@rlemon Probably not. In headless mode "local" means "remote" anyway, and with backups it's gucci.

Zirak / SO-ChatBot

online memory #118