ewuin / stockCrawl

sentiment analysis of financial news articles from web scrapers
https://18.188.177.97

How to generate pickledModel_MNB.pkl file #1

Open sbnair opened 6 years ago

sbnair commented 6 years ago

Hi

I was looking through your stockCrawl code and was wondering how to regenerate the pickledModel file, for example to update it?

Thanks, Shailesh Nair.

ewuin commented 6 years ago

Hello Shailesh,

If you look in the file /stockBot/data_gathering/sentiment_analysis.py, at the bottom there are two sets of lines (one for the SVC model pickle, the other for the MNB model pickle) like this:

    with open('pickledModel_SVC_three_cat.pkl','wb') as fout:
        pickle.dump((vectorizer,model1),fout)

These store the vectorizer and model so they can be reused to analyze other texts.

If you want to switch to a two-category (or other number) system, you have to remake the tables (CSV files) produced by lines 57-63 of article_cleaner.py, then match the numpy arrays in sentiment_analysis.py to an n x n array.

Thank you for taking a look at my project. Let me know if you have any other questions. You are welcome to suggest improvements or other ideas. Or perhaps we can work together and make it much better? May I ask where you are writing from? I am in the United States.

Also, I have one version using scrapy web crawlers, and another, better one written with beautifulsoup4 web crawlers; it is in the stockSoup repo. You can see this app in action at the IP address 18.188.177.97. I must say that if you are pulling text from the internet, beautifulsoup4 does a better job of isolating the HTML elements with the relevant text.

Regards, Ewuin
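For anyone following along, here is a minimal sketch of what regenerating the MNB pickle could look like. It is not the code from sentiment_analysis.py: the use of scikit-learn's CountVectorizer, the helper names train_and_pickle and load_and_predict, and the six toy headlines are all assumptions made for illustration. Only the (vectorizer, model) tuple layout and the pickle.dump call mirror the lines quoted above. The number of categories simply follows from the distinct labels in the training data.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def train_and_pickle(texts, labels, path):
    """Fit a bag-of-words vectorizer and an MNB model, then pickle both."""
    vectorizer = CountVectorizer()  # assumed; the repo may use another vectorizer
    features = vectorizer.fit_transform(texts)
    model = MultinomialNB()
    model.fit(features, labels)
    # Same tuple layout as the quoted snippet: (vectorizer, model)
    with open(path, "wb") as fout:
        pickle.dump((vectorizer, model), fout)
    return path


def load_and_predict(texts, path):
    """Reload the pickled pair and classify new texts."""
    with open(path, "rb") as fin:
        vectorizer, model = pickle.load(fin)
    # transform (not fit_transform) reuses the vocabulary learned in training
    return model.predict(vectorizer.transform(texts))


# Hypothetical toy data standing in for the CSV tables from article_cleaner.py
train_texts = [
    "profits surge record growth",
    "shares rally strong earnings",
    "losses widen bankruptcy fears",
    "stock plunges weak sales",
    "company holds annual meeting",
    "board schedules quarterly report",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Written to the temp directory here so the sketch does not overwrite a real pickle
pickle_path = os.path.join(tempfile.gettempdir(), "pickledModel_MNB.pkl")
train_and_pickle(train_texts, train_labels, pickle_path)

predictions = load_and_predict(
    ["profits surge strong growth", "losses widen weak sales"], pickle_path
)
print(list(predictions))
```

Dropping the "neutral" rows from the training data would give a two-category model with no other change to this sketch; in the actual project the CSV tables and arrays would need to be rebuilt as described above.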

Sent in reply to sbnair's message of Mon, Jun 11, 2018 at 12:33 PM.