veanome closed this issue 1 year ago
The current behavior is that the weights are saved on each startup after training (if and only if there was new data to train on).
Question: in your discord.py configuration file, have you set DISCORD_LEARN_FROM_ALL to True?
By default it is False, so the bot won't learn from Discord:
# Learn from all servers and channels
DISCORD_LEARN_FROM_ALL = True
Yes, I have DISCORD_LEARN_FROM_ALL set to "True".
Side question: Should I be running the --retrain-structure flag on every start-up? How often should I be restarting the bot as well?
Thanks again for responding!
If the bot is responding, the weights files must be saving somewhere. Which working directory are you running the bot from? I am guessing you will find the weights files in a path relative to that.
No, you don't need to run --retrain-structure on every startup. I found that data from Discord often did not have a good impact on the structure of the bot's output anyway (messages are typically only one or a few words, such as "lol"). The way I ran it, I trained the structure model on imported data (say, from a book or tweets) and then never trained it again, so the bot would retain the structure of the original dataset and not degrade into one-word responses, while still learning vocabulary from Discord.
The bot is not responding. However, the bot did respond when I placed your trump weights into the weights folder. Since I removed your trump weights, nothing seems to be stored in the folder. I haven't changed anything within the bot besides adding the correct Discord credentials and renaming "armchair_expert.py" to "arm.py".
I wouldn't worry about that exception for now (I will take a look to see why it's not shutting down cleanly, though).
If you are missing the structure.h5 weights file, you should start the bot with --retrain-structure. For the bot to respond it needs both the structure.h5 and markov.json.zlib files in the weights folder. So do you have the markov.json.zlib file there?
You could also run with the --retrain-markov flag, which forces the Markov chain to be retrained. That should write it to the weights folder relative to your working directory (which should be the root of the git repository).
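A quick way to check the working directory and the weights folder is a small script like the one below. This is only a sketch: `missing_weight_files` is a hypothetical helper, not part of the bot, and the file names are passed in by the caller because this thread mentions both structure.h5 and structure-model.h5.

```python
from pathlib import Path


def missing_weight_files(weights_dir, expected_names):
    """Return the expected file names that are not present in weights_dir.

    Hypothetical helper for troubleshooting; the bot itself does not ship it.
    """
    weights_dir = Path(weights_dir)
    return [name for name in expected_names if not (weights_dir / name).is_file()]


if __name__ == "__main__":
    # Run this from the same directory you start the bot from.
    print("Working directory:", Path.cwd())
    # File names taken from this thread; adjust to match your configuration.
    names = ["markov.json.zlib", "structure.h5"]
    print("Missing:", missing_weight_files(Path.cwd() / "weights", names))
```

If the script reports both files as missing even after a training run, the weights are probably being written relative to a different working directory, or no training data was available.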
Yeah, when running the bot or restarting the bot, both structure.h5 and markov.json.zlib are not showing in the weights folder, or anywhere. The only time I had structure-model.h5 and markov.json.zlib in the weights folder is from when I used the Trump tweets dropbox link you provided in an earlier ticket.
Should I be using those trump tweets structure-model.h5 and markov.json.zlib files in my main weights folder all the time? When I have those two in my weights folder the bot does respond (only to trump related statements though, understandably lol).
Can you paste the full output of the bot starting up (where I can see the command and current working directory as well)
Sure thing!
OK, that looks good. Can you do the same thing but with the --retrain-markov argument?
I am looking to see if there is any data being stored in the discord training database.
I also included a side shot of my weights folder
OK, it looks like there may be a bug where either Discord messages are not being saved to the training database, or the training procedure is not using them for whatever reason.
If you are familiar with SQLite, you could open up the discord.db file and take a look at the discordmessage table and see if there is anything there.
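If you would rather not use a SQLite client, a few lines of Python can do the same check. This is a minimal sketch: `count_rows` is a hypothetical helper, and the path to discord.db is an assumption you will need to adjust to wherever the file lives in your checkout. The database is opened read-only so the check cannot create or modify anything.

```python
import sqlite3
from pathlib import Path


def count_rows(db_path, table="discordmessage"):
    """Count the rows in one table of a SQLite file, opened read-only."""
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # mode=ro makes SQLite fail instead of creating an empty file
    # when the path is wrong.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed location; change this to the real path of your discord.db.
    print(count_rows("discord.db"))
```

A count of 0 after the bot has seen messages would point to the messages never reaching the database, rather than to a problem in training.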
I should have time to take a look at it tomorrow, but I suspect some upstream change in the discord.py or sqlalchemy libraries has broken how this worked (or I broke it, because it's been a long time since I used Discord as a learning data source, TBH).
I was figuring it might have been an upstream or storage issue, because when looking at the databases last night, I noticed nothing was being populated - but I only know very basic SQL and wasn't sure which end the issue was on lol.
I went ahead and took another look at the discord.db and specifically the discordmessage table: results are empty.
I do appreciate you helping me through this today and taking time tomorrow to take a closer look!
I can confirm I see the behavior on my install as well. Debugging now.
OK, that was embarrassing. Somewhere in a refactoring a while back I forgot to commit the messages to the database after inserting them!
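For anyone curious why a missing commit leaves the table empty, here is a small illustration of this class of bug. It uses Python's built-in sqlite3 module rather than the bot's actual SQLAlchemy code, and the table schema is made up for the demo, so treat it as a sketch of the general behavior and not as the fix itself.

```python
import os
import sqlite3
import tempfile


def insert_message(db_path, text, commit):
    """Insert one row, optionally committing before the connection closes."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("INSERT INTO discordmessage (text) VALUES (?)", (text,))
        if commit:
            conn.commit()
    finally:
        # Closing with an open transaction discards the uncommitted insert.
        conn.close()


def count_messages(db_path):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM discordmessage").fetchone()[0]
    finally:
        conn.close()


def demo():
    """Return the row counts seen after an uncommitted and a committed insert."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(db_path)
        # Hypothetical schema, for illustration only.
        conn.execute("CREATE TABLE discordmessage (id INTEGER PRIMARY KEY, text TEXT)")
        conn.commit()
        conn.close()

        insert_message(db_path, "lol", commit=False)
        without_commit = count_messages(db_path)
        insert_message(db_path, "lol", commit=True)
        with_commit = count_messages(db_path)
        return without_commit, with_commit


if __name__ == "__main__":
    print(demo())
```

The insert itself raises no error in either case, which is why the bot appeared to run normally while the training table stayed empty.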
Should be fixed in 38734163acbf4f4e2bef99df741e77d2b90c7454, let me know if it works for you!
Haha great! I will go ahead and give it a shot. Should I run it without any flags?
No flags needed, but you will need to collect data for a while if you are only going to use Discord as a source (joining the bot to a bunch of servers works well).
If you want to speed things along, using a script like https://github.com/csvance/armchair-expert/blob/master/scripts/import_text_file.py will work wonders. Just feed it a newline-separated text file, such as a book.
Went ahead and ran it without any flags, sent it some messages, checked discord.db, and I do now see it populating! I then restarted the bot with the --retrain-markov flag and I do now see the weights folder being populated as well! Thank you so much!
I will also attempt to use import_text_file.py again - I had some issues with it last night but I think that may have been just (L)user error lol. When I want to import a text file using "import_text_file.py", do I place the txt file in the scripts directory or in a different directory? (Have any recommendations for a file I should use?)
I need to improve the documentation and fix the script so it can run from the scripts directory.
For now, copy the Python script to the git root and then invoke it from there, i.e.
cd armchair-expert
cp scripts/import_text_file.py .
python import_text_file.py "Principia Discordia.txt"
Here's some example output when trained with that:
Went ahead and tried to run import_text_file.py. It returned that I was missing the "storage" module, so I went ahead and pip'd that. Ran it again and it said I was missing the "storage.imported" module, so I went to pip that and it error'd out.
You have to copy it to the base directory and run it from there (stupid, I know; I need to fix it). See the commands I posted above.
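For background: judging by the fix, "storage" appears to be a package inside the repository and not something to install with pip, and Python only finds it when the repository root is on the import path. Until the script is fixed, one possible workaround is sketched below. It is an assumption on my part, not what the repository currently does, and `add_repo_root_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file):
    """Put the parent of the script's directory on sys.path.

    For a script at <repo>/scripts/import_text_file.py this adds <repo>,
    so that packages living in the repository root can be imported.
    """
    repo_root = Path(script_file).resolve().parent.parent
    root = str(repo_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return repo_root


# Hypothetical usage at the very top of a script inside scripts/,
# before any imports from the repository's own packages:
#     add_repo_root_to_path(__file__)
```

Copying the script to the git root, as described above, achieves the same thing without editing any code.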
Oh yup! I was literally just about to edit that saying this was user error! I forgot to move it down to the git root. That seemed to make it work!
Honestly, there are quite a few things I need to fix and document to make the bot more user-friendly, so I wouldn't really call it user error.
I am going to get back to studying for my Computer Organization & Architecture final now, hopefully you should be on your way to getting some initial results! I should be able to check up on this again tomorrow.
So far so good! Thanks again for taking the time to debug this today. I know how finals go... It's always bittersweet around this time of the year lol. Good luck in your studying!
Hey, just a quick update on some improvements I made today.
You will need to add this to your config/ml.py:
# Lower values make things more predictable, higher ones more random
STRUCTURE_MODEL_TEMPERATURE = 0.7
MARKOV_MODEL_TEMPERATURE = 0.7
I have gotten good results with 0.7 and 1.0, but you can experiment with other values if you want.
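To give a feel for what a temperature setting does, here is a generic sketch of temperature scaling applied to a probability distribution. It illustrates the common technique only; it is not taken from the bot's code, and `apply_temperature` is a hypothetical name.

```python
def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a sampling temperature.

    Raising each probability to the power 1/T and renormalizing is
    equivalent to dividing the log-probabilities by T before a softmax.
    T < 1 sharpens the distribution (more predictable), T > 1 flattens
    it (more random), and T == 1 leaves it unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) if p > 0 else 0.0 for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    word_probs = [0.6, 0.3, 0.1]  # made-up example distribution
    for t in (0.7, 1.0, 2.0):
        print(t, [round(p, 3) for p in apply_temperature(word_probs, t)])
```

With a value of 0.7 the most likely choice gets a larger share than it had originally, which matches the "lower values make things more predictable" comment in the config.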
Awesome! I went ahead and made the same changes last night so I'll be interested in seeing how it does now
When I start the bot I get it to the RUNNING state, however, I don't believe it is storing anything from my discord channel.
I noticed that when I check the weights directory it is empty. Whenever I place the test trump data that you provided in the weights folder, I get responses from the bot in discord. Should the python program be creating the markov.json.zlib and structure-model.h5 in the weights directory on start-up?
Images for reference:
Thanks!