Open apakrash opened 5 years ago
I'm pretty certain sentdex trained a model on around 50,000,000 pairs. One month of data is definitely not enough: the output has some coherence, but it could be better.
Do you recall how many months that was?
I got 54,000,000 pairs from 9 months of Reddit comment data. The monthly files from recent years (2017 & 2018) are quite dense, so downloading 7-9 files should get you a decent number of pairs.
It also depends on how strict your filter is, though. I removed all comments containing any links and kept only comments of no more than 500 characters.
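For anyone who wants a starting point, the two rules described above (no links, at most 500 characters) could be sketched roughly as follows. This is not the exact filter used for the 54,000,000 pairs: the function name `keep_comment`, the constant `MAX_CHARS`, and the regular expression used to detect links are all assumptions made for illustration.

```python
import re

# Length cap taken from the description above.
MAX_CHARS = 500

# Assumed heuristic for "contains a link": a bare URL, a www. prefix,
# or a markdown-style [text](target) link. The original filter may differ.
LINK_RE = re.compile(r"https?://|www\.|\[[^\]]*\]\([^)]*\)", re.IGNORECASE)


def keep_comment(body: str) -> bool:
    """Return True if a comment body passes the length and link rules."""
    text = body.strip()
    # Drop empty comments and anything longer than the cap.
    if not text or len(text) > MAX_CHARS:
        return False
    # Drop anything that looks like it contains a link.
    if LINK_RE.search(text):
        return False
    return True
```

You would call `keep_comment` on the body of both the parent and the reply before storing them as a pair. A stricter or looser link pattern will change how many pairs survive, so treat the counts mentioned in this thread as a rough guide only.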
thanks
Could you share your filter? It would probably save others a lot of time writing their own.
I tried training the bot on the 2015-05 conversation data. The answers were less than satisfactory. How many months of data were used for Charles v1/v2?