bot not auto-creating markov.json.zlib and structure-model.h5 in weights directory

veanome commented 5 years ago

When I start the bot I get it to the RUNNING state, however, I don't believe it is storing anything from my discord channel.

I noticed that when I check the weights directory it is empty. Whenever I place the test trump data that you provided in the weights folder, I get responses from the bot in discord. Should the python program be creating the markov.json.zlib and structure-model.h5 in the weights directory on start-up?

Images for reference:

[screenshot: capture]

[screenshot: weightss]

Thanks!

csvance commented 5 years ago

The current behavior is that the weights are saved on each startup after training (if and only if there was new data to train on).

Question: in your discord.py configuration file, have you set DISCORD_LEARN_FROM_ALL to True?

By default it is False, and the bot won't learn from Discord:

# Learn from all servers and channels
DISCORD_LEARN_FROM_ALL = True

veanome commented 5 years ago

Yes, I have DISCORD_LEARN_FROM_ALL set to "True".

[screenshot: dpy]

Side question: Should I be running the --retrain-structure flag on every start-up? How often should I be restarting the bot as well?

Thanks again for responding!

csvance commented 5 years ago

If the bot is responding, the weights files must be saving somewhere. Which working directory are you running the bot from? I am guessing you will find the weights files in a path relative to that.

No, you don't need to use --retrain-structure on every startup. I found that data from Discord often did not have a good impact on the structure the bot would output anyway (messages are typically only one or a few words, such as "lol"). So the way I ran it, I trained the structure model once on imported data (say, from a book or tweets) and then never trained it again. That way the bot retains the structure of the original dataset and does not degrade into one-word responses, but it still learns vocabulary from Discord.

veanome commented 5 years ago

The bot is not responding. However, the bot did respond when I placed your trump weights into the weights folder. Since I removed your trump weights, nothing seems to be stored in the folder. I haven't changed anything within the bot besides adding the correct discord credentials and renaming "armchair_expert.py" to "arm.py".

csvance commented 5 years ago

I wouldn't worry about that exception for now (I will take a look to see why it's not shutting down cleanly, though).

If you are missing the structure-model.h5 weights file, you should start the bot with --retrain-structure. For the bot to respond, it needs both the structure-model.h5 and markov.json.zlib files in the weights folder. So do you have the markov.json.zlib file there?

You could also run with the --retrain-markov flag, which forces retraining of the markov chain and should output it to the weights folder relative to your working directory (which should be the root of the git repository).
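For anyone following along, here is a minimal sketch of how to check where relative weight paths would land for a given working directory. It only assumes the "weights" folder and the two file names mentioned in this thread; the helper name `expected_weight_paths` is made up for illustration and is not part of the bot.

```python
from pathlib import Path

# Folder and file names as mentioned in this thread.
WEIGHT_FILES = ("markov.json.zlib", "structure-model.h5")


def expected_weight_paths(working_dir):
    """Return where relative 'weights/...' paths point for a working directory."""
    base = Path(working_dir) / "weights"
    return [base / name for name in WEIGHT_FILES]


if __name__ == "__main__":
    # Run this from the same directory you start the bot from.
    for path in expected_weight_paths(Path.cwd()):
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```

If both files show as missing from the git root, the bot most likely never wrote them, rather than writing them somewhere unexpected.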

veanome commented 5 years ago

Yeah, when running or restarting the bot, neither structure-model.h5 nor markov.json.zlib shows up in the weights folder, or anywhere else. The only time I had structure-model.h5 and markov.json.zlib in the weights folder was when I used the Trump tweets dropbox link you provided in an earlier ticket.

Should I be using those trump tweets structure-model.h5 and markov.json.zlib files in my main weights folder all the time? When I have those two in my weights folder the bot does respond (only to trump related statements though, understandably lol).

csvance commented 5 years ago

Can you paste the full output of the bot starting up (where I can see the command and current working directory as well)?

veanome commented 5 years ago

Sure thing!

[screenshot: running]

csvance commented 5 years ago

Ok, that looks good. Can you do the same thing but with the --retrain-markov argument?

I am looking to see if there is any data being stored in the discord training database.

veanome commented 5 years ago

I also included a side shot of my weights folder

[screenshot: runningmarkov]

csvance commented 5 years ago

Ok, it looks like there may be some sort of bug where either Discord messages are not being saved to the training database, or the training procedure is not training with them for whatever reason.

If you are familiar with SQLite, you could open up the discord.db file and take a look at the discordmessage table and see if there is anything there.
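If you would rather not use a SQLite client, a rough Python equivalent of that check is sketched below. The file name discord.db and the table name discordmessage come from this thread; the location of the file on disk and the helper name `count_rows` are assumptions.

```python
import sqlite3


def count_rows(db_path, table="discordmessage"):
    """Count the rows in one table of a SQLite database file."""
    # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return count
    finally:
        conn.close()
```

Calling `count_rows("discord.db")` from the directory that holds the file returns 0 for an empty table. Note that `sqlite3.connect` creates an empty file if the path is wrong, in which case the query fails with an `sqlite3.OperationalError` about a missing table.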

I should have time to take a look at it tomorrow, but I suspect some upstream change in the discord.py or sqlalchemy libraries has somehow broken how this worked (or I broke it, because it's been a long time since I used Discord as a learning data source TBH).

veanome commented 5 years ago

I was figuring it might have been some upstream or storage issue, because when looking at the databases last night, I noticed nothing was being populated - but I know just very basic SQL and wasn't sure which end the issue was on lol.

I went ahead and took another look at the discord.db and specifically the discordmessage table: results are empty.

[screenshot: discorddb]

I do appreciate you helping me through this today and taking time tomorrow to take a closer look!

csvance commented 5 years ago

I can confirm I see the behavior on my install as well. Debugging now.

csvance commented 5 years ago

Ok, that was embarrassing. Somewhere in refactoring a while back I forgot to commit the messages to the database after inserting them!

Should be fixed in 38734163acbf4f4e2bef99df741e77d2b90c7454, let me know if it works for you!
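To illustrate this class of bug (not the actual fix in that commit), here is a small sketch using Python's standard sqlite3 module rather than sqlalchemy. The table schema and the function names are hypothetical; only the table name discordmessage is from this thread. Inserts made inside a transaction are discarded if the connection is closed before commit() is called.

```python
import sqlite3

# Hypothetical schema for illustration; the bot's real table has its own columns.
SCHEMA = "CREATE TABLE IF NOT EXISTS discordmessage (id INTEGER PRIMARY KEY, text TEXT)"


def store_messages(db_path, messages, commit):
    """Insert messages, optionally skipping the commit to mimic the bug."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)
        conn.commit()  # make sure the table itself is persisted
        conn.executemany(
            "INSERT INTO discordmessage (text) VALUES (?)",
            [(m,) for m in messages],
        )
        if commit:
            conn.commit()
    finally:
        # Closing with an open transaction discards the uncommitted inserts.
        conn.close()


def count_messages(db_path):
    """Count stored messages using a fresh connection."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM discordmessage").fetchone()[0]
    finally:
        conn.close()
```

With `commit=False` the table stays empty when read from a new connection, which matches the symptom seen earlier in this thread: the bot runs, but discordmessage has no rows and there is nothing to train on.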

veanome commented 5 years ago

Haha great! I will go ahead and give it a shot. Should I run it without any flags?

csvance commented 5 years ago

No flags needed, but you will need to collect data for a while if you are only going to use Discord as a source (joining the bot to a bunch of servers works well).

If you want to speed things along, using a script like https://github.com/csvance/armchair-expert/blob/master/scripts/import_text_file.py will work wonders. Just feed it a newline separated text file, such as a book.

veanome commented 5 years ago

Went ahead and ran it without any flags, sent it some messages, checked discord.db, and I do now see it populating! I then restarted the bot with the --retrain-markov flag, and I do now see the weights folder being populated as well! Thank you so much!

I will also attempt to use import_text_file.py again. I had some issues with it last night, but I think that may have been just (L)user error lol. When I want to import a text file using "import_text_file.py", do I place the txt file in the scripts directory or in a different directory? (Do you have any recommendations for a file I should use?)

csvance commented 5 years ago

I need to improve the documentation / fix the script to run in the scripts directory.

Copy the python script to the git root and then invoke it, i.e.

cd armchair-expert
cp scripts/import_text_file.py .
python import_text_file.py "Principia Discordia.txt"

[attachment: Principia Discordia.txt]

csvance commented 5 years ago

Here's some example output when trained with that:

veanome commented 5 years ago

Went ahead and tried to run import_text_file.py. It returned that I was missing the "storage" module, so I went ahead and pip'd that. Ran it again and it said I was missing the "storage.imported" module, so I went to pip that and it errored out.

[screenshot: error]

csvance commented 5 years ago

You have to copy it to the base directory and run it from there (stupid, I know; I need to fix it). See the commands I posted above.
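Some background on why the copy is needed: Python puts the directory of the script being run at the front of sys.path, so a script started from scripts/ does not see packages that live in the git root. The sketch below shows one possible alternative to copying, assuming `storage` is a package at the repository root (which the fix above implies but this thread does not show). The helper names are hypothetical and this is not how the script currently works.

```python
import sys
from pathlib import Path


def repo_root_for(script_path):
    """Return the directory one level above the folder that holds the script."""
    return Path(script_path).resolve().parent.parent


def add_repo_root_to_path(script_path):
    """Put the repo root at the front of sys.path so root-level packages import."""
    root = str(repo_root_for(script_path))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

A script in scripts/ could call `add_repo_root_to_path(__file__)` before its project imports. Installing a package named "storage" with pip does not help here, since the missing module is the project's own.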

veanome commented 5 years ago

Oh yup! I was literally just about to edit that saying this was user error! I forgot to move it down to the git root. That seemed to make it work!

csvance commented 5 years ago

Honestly, there are quite a few things I need to fix and document to make the bot more user friendly, so I wouldn't really call it user error.

I am going to get back to studying for my Computer Organization & Architecture final now, hopefully you should be on your way to getting some initial results! I should be able to check up on this again tomorrow.

veanome commented 5 years ago

So far so good! Thanks again for taking the time to debug this today. I know how finals go... It's always bittersweet around this time of the year lol. Good luck in your studying!

csvance commented 5 years ago

Hey, just a quick update on some improvements I made today.

Learns new words in real time again. You can say words it has not learned yet and ask the bot what it thinks about them, for instance. Previously you had to restart the bot for the training to take effect; now you don't.
Implemented an easily tunable "temperature" system to get much better results for reply generation. You can control how "crazy" the bot is, so to speak, in vocabulary, sentence structure, or both.
Guesses how much time it needs to spend on structure training based on the amount of data you have (although I still question how well it will work on very small datasets, say < 10-20k lines or so).

You will need to add this to your config/ml.py:

# Lower values make things more predictable, higher ones more random
STRUCTURE_MODEL_TEMPERATURE = 0.7
MARKOV_MODEL_TEMPERATURE = 0.7

I have gotten good results with 0.7 and 1.0, but you can experiment with other values if you want.
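For readers unfamiliar with the idea, here is a generic sketch of what a sampling temperature usually does to a probability distribution. This is the common textbook technique, not necessarily how the bot applies STRUCTURE_MODEL_TEMPERATURE or MARKOV_MODEL_TEMPERATURE internally; the function name `apply_temperature` is made up for illustration.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution: T < 1 sharpens it, T > 1 flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Equivalent to softmax(log(p) / T); zero-probability entries stay at zero.
    scaled = [math.exp(math.log(p) / temperature) if p > 0 else 0.0 for p in probs]
    total = sum(scaled)
    if total == 0:
        raise ValueError("at least one probability must be positive")
    return [s / total for s in scaled]
```

With a temperature of 1.0 the distribution is unchanged. At 0.7 the most likely choice gains probability and the output becomes more predictable, while values above 1.0 give rarer choices more weight.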

veanome commented 5 years ago

Awesome! I went ahead and made the same changes last night, so I'll be interested in seeing how it does now.

csvance / armchair-expert

bot not auto-creating markov.json.zlib and structure-model.h5 in weights directory #42