caiobran / mstables

MorningStar.com scraper that consolidates tens of thousands of financial records into a SQLite relational database. The 'dataframes' class easily converts the SQLite data into pandas DataFrames (see the Jupyter notebook for examples).
MIT License
177 stars, 45 forks

API issues #2

Status: Open. chasealanbrown opened this issue 4 years ago.

chasealanbrown commented 4 years ago

After realizing that a simple mkdir db command is required (a simple fix that could be added to the package), I ran into some trouble with the API.
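A minimal sketch of how the package could handle this itself, assuming it expects its SQLite file inside a `db` folder relative to the working directory (the folder name comes from the `mkdir db` step above; the default file name comes from the menu below; `open_database` is a hypothetical helper, not part of mstables):

```python
import os
import sqlite3


def open_database(db_dir="db", db_name="mstables.sqlite"):
    """Open (or create) the SQLite file, creating its folder first if needed.

    Hypothetical helper: 'db' and 'mstables.sqlite' are assumed defaults.
    """
    # exist_ok=True makes this a no-op when the folder already exists
    os.makedirs(db_dir, exist_ok=True)
    return sqlite3.connect(os.path.join(db_dir, db_name))
```

Calling this wherever the package first connects would remove the manual `mkdir db` step on a fresh checkout.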

The input/api.json file shows ..."morningstar.com/api/v2/search/"... being used for the REST API, which doesn't appear to be functional anymore.

Here is the output when running now:

=================================================================
====================== Welcome to msTables ======================

Available actions:

0 - Change database file name (current name = 'mstables.sqlite')
1 - Create database tables and import latest symbols
2 - Download Morningstar data into database
3 - Erase all records from database tables
4 - Delete all database tables
5 - Erase all downloaded history from 'Fetched_urls' table
6 - Create a database back-up file

=================================================================
Enter action no.:
2

Qty. of records to be updated:
1000000

Qty. of records pending update per API no.:

      Pending
API          
1      88,701
2      88,701
3      88,701
4     138,312
5     138,312
6     138,312
7     138,312
8     138,312
9      34,032
10    138,312
11    138,312
12    138,312
13    138,312
14    138,312
15    138,312
16    138,312

Total URL requests pending =    1,959,879
Total URL requests planned =    1,959,879

Run 1 / 817 (150 requests per API per run = 2400 requests per run)
 - Success rate:    0 out of 2,400 (0.0%)
 - Fetch Duration:  48.57 sec
 - Total Duration:  48.59 sec
 - Speed:       0.00 records/sec

Run 2 / 817
 - Success rate:    0 out of 2,400 (0.0%)
 - Fetch Duration:  41.45 sec
 - Total Duration:  41.47 sec
 - Speed:       0.00 records/sec

Run 3 / 817
 - Success rate:    0 out of 2,400 (0.0%)
 - Fetch Duration:  41.82 sec
 - Total Duration:  41.84 sec
 - Speed:       0.00 records/sec
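The figures in this output are internally consistent: 3 × 88,701 + 12 × 138,312 + 34,032 = 1,959,879 pending requests, and 150 requests × 16 APIs = 2,400 requests per run, so ceiling division gives 817 runs. The sketch below is not mstables' actual code (`plan_runs` and `run_until_blocked` are hypothetical names); it reproduces that arithmetic and adds a fail-fast guard, so a dead endpoint stops the loop after a couple of fully failed runs instead of spending roughly 40 seconds on each of 817.

```python
import math

# Pending requests per API no., copied from the console output above
PENDING = {1: 88_701, 2: 88_701, 3: 88_701, 9: 34_032}
PENDING.update({n: 138_312 for n in (4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16)})


def plan_runs(pending, requests_per_api):
    """Return (total requests, requests per run, number of runs)."""
    total = sum(pending.values())
    per_run = requests_per_api * len(pending)
    return total, per_run, math.ceil(total / per_run)


def run_until_blocked(fetch_run, n_runs, max_empty_runs=2):
    """Call fetch_run(run_no), which returns its number of successes, and
    stop after max_empty_runs consecutive runs with zero successes.

    Hypothetical helper: returns the number of runs actually executed.
    """
    empty_streak = 0
    for run_no in range(1, n_runs + 1):
        if fetch_run(run_no) == 0:
            empty_streak += 1
            if empty_streak >= max_empty_runs:
                return run_no
        else:
            empty_streak = 0
    return n_runs


total, per_run, n_runs = plan_runs(PENDING, requests_per_api=150)
print(f"{total:,} requests, {per_run:,} per run, {n_runs} runs")
```

The threshold of two empty runs is an arbitrary choice; a single empty run could also be a transient network failure, which is why the sketch does not abort on the first one.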

Fixing this seems to require a large rewrite of the package. That would likely be fruitful: authentication now appears to be required for Morningstar API access, and using something such as https://github.com/aaaccell/morningstar to make the calls would also make the code far simpler and more readable.

One simple and fast suggestion is to provide a link to the resulting sqlite3 file via Dropbox or a torrent at set yearly or monthly intervals, in case some users are unable to create the database due to changes in the API or other issues.
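One way to produce such a shareable file is sketched below, assuming the database is a single SQLite file and that a dated name like `mstables_YYYY-MM.sqlite` is acceptable (both the naming scheme and `snapshot_database` are hypothetical). It uses Python's `sqlite3.Connection.backup`, which wraps SQLite's online backup API and gives a consistent copy without closing other connections to the source.

```python
import os
import sqlite3
from datetime import date


def snapshot_database(src_path, dest_dir=".", when=None):
    """Copy a SQLite database to a dated file suitable for sharing.

    Hypothetical helper; the 'mstables_YYYY-MM.sqlite' name is an assumption.
    """
    when = when or date.today()
    dest_path = os.path.join(dest_dir, f"mstables_{when:%Y-%m}.sqlite")
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source database into the destination
        src.backup(dst)
    finally:
        src.close()
        dst.close()
    return dest_path
```

Running this monthly (for example from cron) and uploading the result would give users a fallback when the scraper cannot reach the API.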

joe-wojniak commented 3 years ago

Web scraping isn't necessary. Here's a repo that provides a locally saved dataset: https://github.com/joe-wojniak/PythonForFinance

datatalking commented 3 years ago

What if we split the difference: one function that lets the user load your data from a file, and a separate function for scraping? I have one like this that I use for Yahoo Finance and could supply.
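A minimal sketch of that split, assuming a simple CSV layout with `symbol,date,close` columns and a `prices` table (both hypothetical, not the mstables schema). Each source returns the same row shape, so one `store` function serves both; the scraper is left as a placeholder.

```python
import csv
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical row layout: (symbol, date, close)
Row = Tuple[str, str, float]


def load_from_file(path: str) -> List[Row]:
    """Read previously saved data from a CSV with symbol,date,close columns."""
    with open(path, newline="") as f:
        return [(r["symbol"], r["date"], float(r["close"])) for r in csv.DictReader(f)]


def load_from_scraper(symbols: Iterable[str]) -> List[Row]:
    """Placeholder for a scraper or API client (Morningstar, Yahoo, ...)."""
    raise NotImplementedError("plug a scraper or API client in here")


def store(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Write rows from either source into the same table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "symbol TEXT, date TEXT, close REAL, PRIMARY KEY (symbol, date))"
    )
    # INSERT OR REPLACE lets a later load overwrite an earlier one for the same day
    conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()
```

Because both loaders return the same shape, the menu could offer "load from file" and "scrape" as separate actions without duplicating the database code.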

datatalking commented 3 years ago

@joe-wojniak That is a great repo; I wonder why it was voted down. The link seems to point to an empty folder inside https://github.com/caiobran/mstables, which is deprecated.

@chasealanbrown As well written as mstables is, I would like to continue working on it, since I like the format it is written in. @joe-wojniak, if this repo is archived, would you consider forking it and adding me as a contributor so we can add these functions and others in future PRs?

joe-wojniak commented 3 years ago

It doesn't appear to be empty when I go to the repo. I think you can fork a copy.

The source code is the *.py files.

-Joe W.


-- -Joe Wojniak

CONFIDENTIALITY NOTICE: The contents of this email message and any attachments are intended solely for the addressee(s) and may contain confidential and/or privileged information and may be legally protected from disclosure.

datatalking commented 3 years ago

@joe-wojniak Perhaps I misspoke.

In your Nov 10, 2020 reply, you posted a URL, https://github.com/joe-wojniak/PythonForFinance, which, when I click it, actually opens https://github.com/caiobran/mstables/issues/url.

Is that a mistype in the HTML link?

Markdown is a new language format for me, so I didn't know if it was a trick or a bug.

Screenshot below: [Image: Screen Shot 2021-08-21 at 10 51 16 AM]

joe-wojniak commented 3 years ago

OK, the URL is redirecting to the wrong location. If you copy and paste this into the address bar, it will take you to the repo:

https://github.com/joe-wojniak/PythonForFinance

-Joe W.


joe-wojniak commented 3 years ago

Here's a GitHub Markdown cheat sheet; GitHub Markdown is a little different from standard Markdown:

https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf

-Joe W.


caiobran commented 3 years ago

Hi all, my apologies for being MIA here. I haven't worked on this project in a couple of years but I am happy to help continue it if you guys are still interested.

datatalking commented 3 years ago

@caiobran I am interested, and I wanted to say again that your repo is awesome. For the last few months, data_overview.ipynb has been fun to work with and has inspired other ideas.

In general, and roughly in order, I want to:

  1. Debug a few errors that data_overview.ipynb has been giving me.
  2. Document your current code a bit more so I can follow all 1,500+ lines.
  3. Figure out how much functionality we lost with the deprecation of the Morningstar API.
  4. Identify functionality we can add, for example integrating the open-source backtrader or zipline libraries.
  5. Expand on what you built so it accepts API, CSV, or scraped data feeds such as IEX, Yahoo, and Trading Economics.
  6. Add me as a contributor, as I've been working on this all summer and learning so much.
  7. @joe-wojniak seems to have a solid understanding, and although I haven't reviewed all his code, there seems to be some overlap that we can all benefit from.
  8. Add historical trend regression.
  9. Lots of potential.

Some of these functions are available elsewhere, so we don't need to start from scratch (we could integrate pandas DataFrames, for example), but your menu is such a simple and helpful tool that I want to build on your work.
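For item 8, a starting point could be an ordinary least-squares trend line over a price series. The sketch below is pure Python and not part of mstables; `linear_trend` is a hypothetical name, and it regresses each value on its observation index (0, 1, 2, ...). With numpy available, `numpy.polyfit(x, y, 1)` computes the same fit.

```python
from typing import Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values[i] ~ slope * i + intercept.

    Returns (slope, intercept), where i is the observation index.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2  # mean of the indices 0..n-1
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The slope is in price units per observation, so on daily closes it reads as the average daily drift over the window.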

caiobran commented 3 years ago

Hi @datatalking, I'll be honest: this was a project I worked on to learn Python, and there are a lot of better ways to have built it. I am thinking it might be worth making it into a class object. I got a new job after that where we only use R, so I kind of stopped developing in Python, but I would like to pick it back up again. I have also never collaborated with others on GitHub outside of work, so it might be worth having a call to discuss a plan for how we can work together.

With all that said, I totally agree with your points and would be more than happy to add you as a collaborator. Let me know if you are open to having a chat. I am in San Diego, so I'm on Pacific time.
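One possible shape for the class-based refactor is sketched below, mapping the numbered menu actions shown in the console output to methods. Only the `Fetched_urls` table name comes from the menu; `MsTables`, the `Tickers` table, and all column definitions are hypothetical placeholders, and action 1 here only creates tables (the symbol import is not shown).

```python
import sqlite3


class MsTables:
    """Hypothetical object wrapper around the msTables menu actions."""

    # Fixed, trusted names, so interpolating them into SQL below is safe
    TABLES = ("Tickers", "Fetched_urls")

    def __init__(self, db_path="mstables.sqlite"):
        self.conn = sqlite3.connect(db_path)

    def create_tables(self):  # menu action 1 (symbol import not shown)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Tickers (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Fetched_urls (url TEXT PRIMARY KEY, fetched_on TEXT)"
        )
        self.conn.commit()

    def erase_records(self):  # menu action 3
        for table in self.TABLES:
            self.conn.execute(f"DELETE FROM {table}")
        self.conn.commit()

    def delete_tables(self):  # menu action 4
        for table in self.TABLES:
            self.conn.execute(f"DROP TABLE IF EXISTS {table}")
        self.conn.commit()

    def erase_fetch_history(self):  # menu action 5
        self.conn.execute("DELETE FROM Fetched_urls")
        self.conn.commit()

    def run_action(self, number):
        """Dispatch a menu number to the matching method."""
        actions = {
            1: self.create_tables,
            3: self.erase_records,
            4: self.delete_tables,
            5: self.erase_fetch_history,
        }
        if number not in actions:
            raise ValueError(f"unsupported action: {number}")
        actions[number]()
```

Keeping the menu as a thin dispatch layer over methods would let the same object be used from the interactive prompt, from scripts, and from the Jupyter notebook.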

joe-wojniak commented 3 years ago

If you're interested in learning web scraping, scraping Morningstar is a good exercise, because Morningstar has gone to great lengths to prevent it (so it's a challenge).

I ended up using data available through pandas-datareader and Stooq for free stock data. Once you have a strategy that you're interested in trading, you get into OAuth2 and accessing data through a broker API.

-Joe W.


datatalking commented 3 years ago

Is Morningstar worth scraping, though? I've been helping out with issues for pandas-datareader and really like that setup. Would it make sense to keep the scraping feature, or change this to accept pandas? I'll dig into the options; I need the widest net of detailed data, but intraday numbers are my current goal.

I really like how this is saved to a database. I'm building a similar securities database in Postgres for practice, so there is lots to build.

I just upgraded my main machine tonight to a 12-core processor with 64 GB of RAM, and the GPU I'm integrating has about 2,000 cores, so I'm excited to work on intensive processing.

Seattle here. I'm usually up from 5 am for the market until around 10 pm; would this Saturday or Friday work for you? Paintstone@gmail.com is a good email.

joe-wojniak commented 3 years ago

Most of the free sources I've been able to find are daily. Intraday is a bit harder; you may have to set up a brokerage account that has an API. I like TD Ameritrade, but there are others. I'm in the middle of another project right now, so sorry, I won't be able to help. @Andrew, that sounds like an awesome machine setup! Did you buy it from Dell?


datatalking commented 3 years ago

@chasealanbrown I (sort of) solved the API issue: we can now scrape a few of the Morningstar data points. I will work with @caiobran to open a PR soon. @joe-wojniak, let's move the hardware thread to a separate issue.