LunarWatcher / NN-chatbot

A chatbot with a neural network, which can be used in the console, on Discord or in the Stack Exchange network chat.
Apache License 2.0

Make it so that users can create their own vocab for the bot. #4

Closed FreezePhoenix closed 4 years ago

FreezePhoenix commented 6 years ago

I think it would be useful if users could make their own vocab the first time before they make saves for the bot. This would require 2 things:

The first one can be solved by SO-chatbot by Zirak, combined with the modification of the console.log function. The second one can be done by opening a new window and using document.write.

FreezePhoenix commented 6 years ago

@LunarWatcher do you think it would be better written in JS or Python? I think JS because it would run directly in the browser.

LunarWatcher commented 6 years ago

ATM, JavaScript, like the JVM languages, PHP, and a bunch of other languages, doesn't have ML libraries that come near TensorFlow. The Python port is the most extensive one. Lua would be another option (it has a library I can't remember the name of), but since I don't know Lua and TF is currently the most comprehensive base ML lib, Python and TF is the best option. Plus, Python and TF have other libraries (e.g. Keras, Tensorlayer (this project uses Tensorlayer), etc.) that build on top of TF to add another layer of abstraction and save time by not having to create the layers. Whether or not the bot can run in the browser is of 0 interest. I could create a Flask backend for web events, but there would still be Python running in the background, which kinda defeats the purpose. I'm not going to rewrite the code to allow bookmarklet use. Create a batch script that executes a command that runs the script, i.e.

python <DRIVE>:/<path to>/bot.py

I can add the necessary input as CLI arguments in addition to stdin (as it currently is), but that's as far as I'm gonna go. And finally, custom vocabs are going to be supported with a currently unpushed edit, where a vocab size of 45000 is used, which allows slightly above 7000 custom words in the vocab. It requires training data, which I will write a scraper for later (scraper here refers to collecting conversations, filtering them manually and making it possible to use that to train the bot). Owners are hard-coded atm, ranks were added last time I debugged it in chat, and chatroom posts are sent through ChatExchange, so there's no reason to have console input to send messages. (Unrelated: a system that handles which site to send to is really hard to create. Since there's the site, room and message, there are three vars, and Discord is more or less impossible due to the channel IDs being defined by long. And there's not really any benefit to manually sending messages from the account anyways, so it's not a priority.)
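A minimal sketch of what "CLI arguments in addition to stdin" could look like, assuming a single setting. The option name `--mode` and the helper `resolve_mode` are hypothetical and only illustrate the fallback order; the bot's real inputs may differ.

```python
import argparse
import io
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Hypothetical launcher options")
    # --mode is an illustrative option name, not one the bot is known to define.
    parser.add_argument("--mode", default=None, help="for example: train or chat")
    return parser.parse_args(argv)


def resolve_mode(argv, stream=sys.stdin):
    """Use the CLI value when given; otherwise read one line from the stream."""
    args = parse_args(argv)
    if args.mode is not None:
        return args.mode
    return stream.readline().strip()


# The CLI value wins; without it, the value is read from the stream instead.
print(resolve_mode(["--mode", "train"]))
print(resolve_mode([], io.StringIO("chat\n")))
```

With something like this, a batch script could pass the value on the command line, while running the script by hand would still prompt through stdin as before.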

TL;DR: Python will be used because JS has no good ML libraries that come close to TF. Bookmarklets aren't going to be supported ATM due to the lack of a server callback (Flask) for a potential web frontend. Either way, it runs on Python in the background (Flask is basically a Python server), so it just adds more memory usage. Custom vocabs are prepared to be pushed atm, but I'm working on figuring out memory issues for low-memory GPUs (optimizing for 2 gigs of VRAM, but maintaining full GPU support is hard).

FreezePhoenix commented 6 years ago

I don't think you understand. What I am saying is that JS could create a file that creates output similar to the original Dialog files. It would not run a bot under JS, just create a file ready for download.

LunarWatcher commented 6 years ago

No it can't. The original input has a changed format, and so does the output. #1 fixes the issues with output once it's fixed, which is going to be soon. The (programming) language doesn't matter

FreezePhoenix commented 6 years ago

I have an idea. Modify the save before it loads to include new vocab.

LunarWatcher commented 6 years ago

The vocab alone is USELESS. You can have a vocab that includes every single word in English (or any other language) and it still won't be enough. The network has to be trained to use them, meaning the issue is importing custom training data for the bot to train on. It'll add words if they're not found, assuming there's still space in the dict. The bot wouldn't be able to use the other words in the vocab because it wouldn't know how to use them.
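To make the "add words if there's still space" behaviour concrete, here is a rough sketch of a size-limited vocab. The `Vocab` class, its method names and the `<unk>` token are assumptions for illustration, not the project's actual code.

```python
class Vocab:
    """Word-to-id map with a hard size limit (hypothetical helper)."""

    def __init__(self, capacity, unk_token="<unk>"):
        self.capacity = capacity
        self.unk_id = 0
        # Id 0 is reserved for unknown words.
        self.word_to_id = {unk_token: self.unk_id}

    def add(self, word):
        """Return the word's id, adding it only if there is still space."""
        if word in self.word_to_id:
            return self.word_to_id[word]
        if len(self.word_to_id) >= self.capacity:
            return self.unk_id  # vocab is full: fall back to the unknown id
        new_id = len(self.word_to_id)
        self.word_to_id[word] = new_id
        return new_id

    def encode(self, sentence):
        """Map a sentence to ids; words missing from the vocab become unk_id."""
        return [self.word_to_id.get(w, self.unk_id) for w in sentence.lower().split()]
```

Note that a word having an id only means it can be represented. The network still needs training data containing that word before it can use it sensibly.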

FreezePhoenix commented 6 years ago

Won't know how to use them as in it doesn't have experience with the word? Is there any way to run a speed-training? As in no connection to the network + only in console?

LunarWatcher commented 6 years ago

The net doesn't need network for anything but installs. So yeah, you can disconnect from the network when you have the necessary packages.

And yes, it won't know how and when to use the word. Introducing new training data will most likely increase loss for a few epochs too, but it'll be able to adapt a lot

LunarWatcher commented 6 years ago

Grammar checking exists

FreezePhoenix commented 6 years ago

cough I'm impressed. You made a language parser with regex.

And these folks said it would be impossible: https://meta.stackoverflow.com/questions/362943/auto-capitalization-of-the-noun-i-on-post-submit

Or at least insanely difficult.

LunarWatcher commented 6 years ago

It is insanely difficult. My system will not get everything right. Nor will the Magic Editor (unrelated userscript for SE). But it works enough to get some level of sensibility in the output
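As an illustration of why regex-based fixes only get part of the way, here is a small sketch of one such rule, capitalizing the standalone pronoun "i". This is not the bot's actual grammar system; the pattern and the function name `capitalize_i` are made up for the example.

```python
import re

# Matches a lowercase "i" standing alone as a word, including the "i" in "i'm".
STANDALONE_I = re.compile(r"\bi\b")


def capitalize_i(text):
    """Replace every standalone lowercase "i" with "I"."""
    return STANDALONE_I.sub("I", text)
```

The rule handles "i think i'm right" correctly, but it also turns "i.e." into "I.e.", which is the kind of case a simple pattern will get wrong.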

FreezePhoenix commented 6 years ago

So, could a method be used to set it to a random training and then remove that random training after the first epoch? So it would start out with usage, and then get feedback, and then remove the fake usage and replace it with that feedback.

LunarWatcher commented 6 years ago

There's no feedback on output atm, the closest you get is filtering future training data before teaching it

FreezePhoenix commented 6 years ago

So how does it change and adapt? How does it know what's right or wrong? I'm sorry for the incorrect terminology. I mean the results or the save that is produced by shutting down or by an epoch.

LunarWatcher commented 6 years ago

It doesn't train while it's being chatted with at the moment

FreezePhoenix commented 6 years ago

So you have to train it in the console? Well, I have yet to actually manage to install Tensorflow. It's actually quite difficult.

LunarWatcher commented 6 years ago

For now, yes. I'm trying to figure out continuous training, but it's not very well documented

FreezePhoenix commented 6 years ago

All you should need to do is use the appropriate methods: import os, then use the os interface to run the appropriate commands. In fact, this could be used to make a firstinstall.py that would install all dependencies of the chatbot. 😄

FreezePhoenix commented 6 years ago

import os

# "command" is a string containing the shell command to run (placeholder value here)
command = "echo hello"
os.system(command)

documentation: https://docs.python.org/2/library/os.html#os.system

Better yet: It works on Linux & Windows
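A possible shape for the firstinstall.py idea, sketched with subprocess and the current interpreter's pip rather than os.system. The package list is an assumption based on the libraries named in this thread; the real dependency list should come from the project itself.

```python
import subprocess
import sys

# Assumed dependency list, taken from the libraries mentioned in this thread.
PACKAGES = ["tensorflow", "tensorlayer"]


def build_install_command(packages):
    """Return the pip command as an argument list, using the current interpreter."""
    return [sys.executable, "-m", "pip", "install"] + list(packages)


def install(packages, dry_run=True):
    """Print the command; run it only when dry_run is False. Returns the exit code."""
    command = build_install_command(packages)
    print(" ".join(command))
    if dry_run:
        return 0
    return subprocess.call(command)


if __name__ == "__main__":
    # Pass --run to actually install; the default only prints the command.
    install(PACKAGES, dry_run="--run" not in sys.argv[1:])
```

Calling pip through sys.executable keeps the install tied to the same Python that will run the bot, and passing an argument list avoids shell quoting differences between Linux and Windows.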

FreezePhoenix commented 6 years ago

https://github.com/LunarWatcher/NN-chatbot#known-issues

Is there any way to change the dimensions of a save?