caiobran / mstables

MorningStar.com scraper that consolidates tens of thousands of financial records into a SQLite relational database. Class 'dataframes' easily converts the SQLite data into pandas DataFrames (see Jupyter notebook for examples)
MIT License
180 stars 46 forks

Fails to run on Ubuntu 16 and OS X #1

Open trevorwelch opened 5 years ago

trevorwelch commented 5 years ago

Thanks for your hard work on this project!

Currently, I'm unable to get it running. I've tried on both Ubuntu 16.04 and my local machine.

On both Ubuntu 16.04 and OS X:

=================================================================
====================== Welcome to msTables ======================

Available actions:

0 - Change database file name (current name = 'mstables.sqlite')
1 - Create database tables and import latest symbols
2 - Download Morningstar data into database
3 - Erase all records from database tables
4 - Delete all database tables
5 - Erase all downloaded history from 'Fetched_urls' table
6 - Create a database back-up file

=================================================================
Enter action no.:
1

Please wait, database tables are being created ...

Traceback (most recent call last):
  File "main.py", line 161, in <module>
    main(db_file)
  File "main.py", line 100, in main
    msg = fetch.create_tables(db_file['path'])
  File "/Users/tw/Github/msTables/fetch.py", line 64, in create_tables
    conn = sqlite3.connect(db_file)
sqlite3.OperationalError: unable to open database file

Confirmed that sqlite and all other requirements are installed.

trevorwelch commented 5 years ago

I got a bit further by just creating the default mstables db as apparently specified in main.py:

mkdir db
cd db
python3
import sqlite3
conn = sqlite3.connect('mstables.db')

Then, when running main.py and selecting 1 this succeeds!
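The workaround above is consistent with how SQLite behaves: `sqlite3.connect` creates a missing database file but not a missing parent directory, so the `mkdir db` step is likely what made option 1 succeed. A minimal sketch of doing that in code, assuming a hypothetical helper named `connect_with_parent_dir` (not part of msTables) and a path such as `db/mstables.sqlite`:

```python
import os
import sqlite3


def connect_with_parent_dir(db_path):
    """Open a SQLite database, creating its parent directory first if needed.

    Hypothetical helper for illustration; not part of msTables' fetch.py.
    """
    parent = os.path.dirname(db_path)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists
        os.makedirs(parent, exist_ok=True)
    return sqlite3.connect(db_path)
```

Calling `connect_with_parent_dir('db/mstables.sqlite')` from a fresh checkout would then create `db/` before opening the file, instead of raising `sqlite3.OperationalError`.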

However, running main.py again and selecting 2 fails on my Ubuntu 16 virtual machine with:

Enter action no.:
2

Qty. of records to be updated:
100

\e[KCreating URL list for API 6 ...
\e[KCreating URL list for API 8 ...
\e[KCreating URL list for API 5 ...
\e[KCreating URL list for API 15 ...
\e[KCreating URL list for API 3 ...
\e[KCreating URL list for API 13 ...
\e[KCreating URL list for API 1 ...
\e[KCreating URL list for API 7 ...
\e[KCreating URL list for API 2 ...
\e[KCreating URL list for API 11 ...
Killed

I'm assuming this is because my VM is too weak, although changing the pool_size parameter in fetch.py did not seem to make a difference.

Appears to be running locally though, so will stick to that for now 👍

trevorwelch commented 5 years ago

Running locally, it ran for a while:

\e[KFetching API 1...  15,450 /  88,701  (17.42%)
\e[KFetching API 1...  15,451 /  88,701  (17.42%)
\e[KFetching API 1...  15,452 /  88,701  (17.42%)
\e[K
 - Success rate:        1 out of 2,400 (0.0%)
\e[KStoring source data into database table 'Fetched_urls'...
\e[K - Fetch Duration:  9006.45 sec

\e[KPlease wait while the database is being queried ...
\e[KParsing results into database...         1 /      1 (100.0% )
Data = gangnam style! 14

Traceback (most recent call last):
  File "main.py", line 161, in <module>
    main(db_file)
  File "main.py", line 104, in main
    start = fetch.fetch(db_file['path'])
  File "/Users/tw/Github/msTables/fetch.py", line 366, in fetch
    parse.parse(db_file)
  File "/Users/tw/Github/msTables/parse.py", line 41, in parse
    parsing(conn, cur, fetched)
  File "/Users/tw/Github/msTables/parse.py", line 95, in parsing
    code = parse_1(cur, source_text, api)
  File "/Users/tw/Github/msTables/parse.py", line 161, in parse_1
    js = json.loads(data)
  File "/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

And ended with that error

thumble commented 4 years ago

I believe the above error is due to the fact that the Morningstar urls (sitemaps) are no longer valid.
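The traceback above ends in `json.loads` being handed text that is not JSON; an error at `char 0` typically means an empty body or something like an HTML error page. A minimal sketch of a guard, assuming a hypothetical helper named `load_json_or_none` that is not part of msTables' `parse.py`:

```python
import json


def load_json_or_none(text):
    """Return the decoded JSON value, or None if text is not valid JSON.

    Hypothetical helper for illustration; not part of msTables.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Empty strings and HTML error pages end up here
        return None
```

A caller could skip records where this returns `None` rather than aborting the whole parse run. Note that a literal JSON `null` also decodes to `None`, so this sketch cannot tell the two cases apart.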

sven128 commented 4 years ago

I just learned about this module today and I actually really like the idea of it. I checked the Morningstar website and couldn't find any information about an API anymore. Am I right in assuming that the API is no longer accessible (or at least not for free)?

datatalking commented 3 years ago

The script runs just fine in my terminal on macOS Catalina 10.15.7.

The Morningstar API is available for us industry professionals at a premium, but the new version (whatever it does) is only in beta at this point. I just found this repo yesterday and there is potential. Can we update main.py so it creates the database? The other problems reported here would then seem better served as separate issues.

datatalking commented 3 years ago

@sven128 @trevorwelch this repo might be Archived by @caiobran.

If we can get in touch and @caiobran could make me a contributor, I could update the code with a PR to potentially address these issues. I do hope we can continue this here; msTables is such a well-written chunk of code.

Else we could fork this from my repo @datatalking

@sven128 the Morningstar API is essentially deprecated. We could set this to use a scraper, finance or IEX data.

@trevorwelch we could look at creating a PR to add your Aug 6, 2019 notes. Thanks for posting a fix!!

datatalking commented 3 years ago

OK, so I had some time today and got this working well enough to integrate a few of the features we wanted, but I've got a loop bug that I can't see and my eyes are tired.

This expands the menu for additional data sources like IEX, which does have intraday prices. Adding IEX is doable, as I already have it working in a different script.

I also added in a feature to create the db if it wasn't already there.

Lastly, I added more descriptions (docstrings) for parameters, which is more for my own needs to follow between functions, but it's technically pythonic. Anyway, I got this working, but it's stuck in a loop with the bottom Goodbye that I added after main, so Goodbye_main keeps being printed.

`

#!/usr/bin/env python

from shutil import copyfile
from datetime import datetime
from importlib import reload
import fetch, time, os, re, sqlite3

__author__ = "Caio Brandao"
__copyright__ = "Copyright 2019+, Caio Brandao"
__license__ = "MIT"
__version__ = "0.0"
__maintainer__ = "Caio Brandao"
__email__ = "caiobran88@gmail.com"


# Create database upon install of msTables
def create_db():
    """
    :return:
    """
    import sqlite3
    conn = sqlite3.connect('mstables.db')

# TODO enhancement allow user to customize location of sqlitedbase, ports etc.


# Create back-up file under /db/backup
def backup_db(file):
    """
    Menu option 6 for mstables
    :param file:
    :return:
    """
    today = datetime.today().strftime('%Y%m%d%H')

    new_file = db_file['db_backup'].format(
        input('Enter back-up file name:\n'))
    fetch.print_('Please wait while the database file is backed-up ...')
    copyfile(db_file['path'], new_file)
    return '\n~ Back-up file saved\t{}'.format(new_file)


def create_schedule():
    """
    Menu option 7 for mstables
    Run Schedule_Problem_Test2.ipynb from sbox/test
    :return:
    """
    print("Planned function to automate trading schedule")

    pass

    # schedule_path = '/Users/vanessawilson/sbox/test/Task_Scheduler_Problem/'
    # file_name = 'Schedule_Problem_Test2.ipynb'
    # print('create_schedule function created but empty')


def customizable_user_function():
    """
    Menu option 8 for mstables
    A function of your choice added here
    :return:
    """
    return '\n~ You would need to create the function in fetch.administrative_options.'


def administrative_options():
    """
    Menu option 10 for mstables
    A function if you want to provide access to others for this and retain admin privilege
    :return:
    """
    return '\n~ You would need to create the function here.'


# Change variable for .sqlite file name based on user input
def change_name(old_name):
    """
    Menu option 0 for mstables
    :param old_name:
    :return:
    """
    msg = 'Existing database files in directory \'db/\': {}\n'
    msg += 'Enter new name for .sqlite file (current = \'{}\'):\n'
    fname = lambda x: re.sub('.sqlite', '', x)
    files = [fname(f) for f in os.listdir('db/') if '.sqlite' in f]
    return input(msg.format(files, old_name))


# Print options menu
def print_menu(names):
    """
    :param names:
    :return:
    """
    gap = 22
    dash = '='
    banner = ' Welcome to msTables '
    file = '\'{}.sqlite\''.format(db_file['name'])
    menu = {
        ' 0' : 'Change database file name: (current name = {})'.format(file),
        ' 1' : 'Create database tables and import latest symbols:',
        ' 2' : 'Download Morningstar data into database:', # TODO could change to Yfinance,
        ' 3' : 'Erase all records from database tables:',
        ' 4' : 'Delete all database tables:',
        ' 5' : 'Erase all downloaded history from \'Fetched_urls\' table:',
        # 'X' : 'Parse (FOR TESTING PURPOSES)',
        ' 6' : 'Create a database back-up file:',
        ' 7' : 'Create a schedule:',
        ' 8' : 'Customizable User Function:',
        ' 9' : 'Exit:',
        '10': 'Administrative Options:'
    }

    print(dash * (len(banner) + gap * 2))
    print('{}{}{}'.format(dash * gap, banner, dash * gap))
    print('\nAvailable actions:\n')
    for k, v in menu.items():
        print(k, '-', v)
    print('\n' + dash * (len(banner) + gap * 2))

    return menu


# Print command line menu for user input
def main(file):
    """
    :param file:
    :return:
    """
    db_path = 'db/backup/'
    create_db()
    if not os.path.exists(db_path):
        os.mkdir(db_path)
        print("Directory ", db_path, " Created ")
        os.chdir(db_path)
    else:
        print("Directory ", db_path, " already exists")
    while True:

        # Print menu and capture user selection
        ops = print_menu(file)
        while True:
            try:
                inp0 = input('Enter action no.:\n').strip()
                break
            except KeyboardInterrupt:
                print('\nGoodbye_ki!')
                exit()
        if inp0 not in ops.keys(): break
        reload(fetch) #Comment out after development
        start = time.time()
        inp = int(inp0)
        ans = 'y'

        # Ask user to confirm selection if input > 2
        if inp > 2:
            msg = '\nAre you sure you would like to {}? (Y/n):\n'
            ans = input(msg.format(ops[inp0].upper())).lower()

        # Call function according to user input
        if ans == 'y':
            print()
            try:
                # Change db file name
                if inp == 0:
                    db_file['name'] = change_name(db_file['name'])
                    start = time.time()
                    db_file['path'] = db_file['npath'].format(db_file['name'])
                    msg = ('~ Database file \'{}\' selected'
                        .format(db_file['name']))

                # Create database tables
                elif inp == 1:
                    msg = fetch.create_tables(db_file['path'])

                # Download data from urls listed in api.json
                elif inp == 2:
                    start = fetch.fetch(db_file['path'])
                    msg = '\n~ Database updated successfully'

                # Erase records from all tables
                elif inp == 3:
                    msg = fetch.erase_tables(db_file['path'])

                # Delete all tables
                elif inp == 4:
                    msg = fetch.delete_tables(db_file['path'])

                # Delete Fetched_urls table records
                elif inp == 5:
                    msg = fetch.del_fetch_history(db_file['path'])

                elif inp == 6:
                    msg = backup_db(db_file['path'])

                elif inp == 7:
                    msg = create_schedule()

                elif inp == 8:
                    msg = customizable_user_function()

                elif inp == 9:
                    pass
                    # msg = backup_db(db_file)
                    # msg = exit()

                # Back-up database file
                elif inp == int(list(ops.keys())[-1]):
                    msg = backup_db(db_file)

                # TESTING
                elif inp == 99:
                    fetch.parse.parse(db_file['path'])
                    msg = 'FINISHED'
            # except sqlite3.OperationalError as S:
            #     msg = '### Error message - {}'.format(S) + \
            #         '\n### Scroll up for more details. If table does not ' + \
            #         'exist, make sure to execute action 1 before choosing' + \
            #         ' other actions.'
            #     pass
            # except KeyboardInterrupt:
            #     print('\nGoodbye!')
            #     exit()
            except Exception as e:
                print('\a')
                #print('\n\n### Error @ main.py:\n {}\n'.format(e))
                raise

            # Print output message
            #os.system('clear')
            print(msg)

            # Calculate and print execution time
            end = time.time()
            print('\n~ Execution Time\t{:.2f} sec\n'.format(end - start))
        else:
            os.system('clear')


# Define database (db) file and menu text variables
db_file = dict()
db_file['npath'] = 'db/{}.sqlite'
db_file['name'] = 'mstables'
db_file['path'] = db_file['npath'].format(db_file['name'])
db_file['db_backup'] = 'db/backup/{}.sqlite'

if __name__ == '__main__':
    os.system('clear')

    # Create target Directory if don't exist

    main(db_file)
    print('Goodbye_main!\n\n')

`
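One possible cause of the repeated Goodbye_main in the script above, offered as a guess from the posted code rather than a confirmed diagnosis: `main()` strips the input, while the menu pads single-digit keys with a leading space, so `'1'` is never found among keys like `' 1'` and the `break` runs straight away. A minimal sketch of the mismatch and one way around it, using a shortened menu and a hypothetical `format_menu` helper that is not in the posted script:

```python
# Keys as posted: single digits carry a leading space, '10' does not.
padded_menu = {
    ' 0': 'Change database file name',
    ' 1': 'Create database tables',
    '10': 'Administrative Options',
}

user_input = ' 1 '.strip()                   # main() strips the input, giving '1'
matches_padded = user_input in padded_menu   # '1' is not the same key as ' 1'

# One option: store unpadded keys and pad only when printing.
menu = {key.strip(): label for key, label in padded_menu.items()}
matches_unpadded = user_input in menu


def format_menu(menu):
    """Return display lines with keys right-aligned to the widest key.

    Hypothetical helper for illustration only.
    """
    width = max(len(key) for key in menu)
    return ['{} - {}'.format(key.rjust(width), label)
            for key, label in menu.items()]


for line in format_menu(menu):
    print(line)
```

With unpadded keys the lookup `ops[inp0]` in the confirmation prompt would also work on the stripped input, and the alignment is kept purely as a display concern.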