franklingu / comp-match

Company names matching: match company names to legal names and stock symbols
https://comp-match.readthedocs.io/en/latest/
MIT License

import broken #12

Open realimpat opened 4 years ago

realimpat commented 4 years ago

Hi,

I'm unable to import this package to try it out. Reproduce steps:

  1. pip install comp-match

  2. python -c 'import comp_match'

Result: ModuleNotFoundError: No module named 'comp_match'
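
For anyone else hitting this, a quick way to confirm whether the interpreter can locate the module at all (as opposed to the import failing partway through) is a check like the sketch below. The helper name `is_importable` is made up for illustration and is not part of comp-match; it only uses the standard library.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name.

    Hypothetical helper for diagnosis only; not part of comp-match.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints False when the installed distribution did not ship the module.
    print("comp_match importable:", is_importable("comp_match"))
```

If this prints False right after a successful pip install, the published distribution most likely does not contain the package directory, which would be consistent with the error above.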

franklingu commented 4 years ago

let me double check and get back to you. thanks for reporting

realimpat commented 4 years ago

Importing does work if I clone the repo and run setup.py install.

Then I tried the example in the README, but it hung for 5 minutes and then returned an error: line 31 in merge_match_results, score = weight * float(extra.get('score', 1)), ValueError: could not convert string to float
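
One possible way to guard that conversion is sketched below. It assumes only what the quoted line shows: `extra` is a dict and `score` may arrive as a string that is not numeric. The function name `safe_score` and the fallback behaviour are assumptions for illustration, not the package's actual code.

```python
def safe_score(extra, weight=1.0, default=1.0):
    """Return weight * score, falling back to `default` for unusable scores.

    Hypothetical helper: mirrors the quoted expression
    weight * float(extra.get('score', 1)) but tolerates values such as
    '' or 'N/A' that float() cannot convert.
    """
    raw = extra.get("score", default)
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Non-numeric or missing-like value: use the default score instead.
        value = default
    return weight * value


if __name__ == "__main__":
    print(safe_score({"score": "2.5"}, weight=2.0))
    print(safe_score({"score": "N/A"}))
```

Silently falling back hides the fact that the source changed, so logging the bad value before substituting the default might be the better choice in practice.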

franklingu commented 4 years ago

I tried pip install; it did not work for me either.

To install locally:

pip install -r requirements.txt
pip install -e .

It seems Thomson Reuters changed their page format. I need to fix this. For now, you can try removing this source:

# google_finance, yahoo_finance can still work
>>> comp_match.match(['Apple', 'Google', 'Facebook', 'CitiBank'], ['google_finance'])
{'Facebook': [['Facebook, Inc. Common Stock', CompanyUnderline: [FB2A@XETRA@DE], {'score': 2.0}],
              ['Facebook, Inc. Common Stock', CompanyUnderline: [FB@NASDAQ OMX PHLX@US], {'score': 1.0}],
              ['FACEBOOKI8JS/UnSBDR QI', CompanyUnderline: [FBOK34@Bovespa Stock Exchange@BR], {'score': 1.0}],
              ['Facebook, Inc. Common Stock', CompanyUnderline: [FB@@UN], {'score': 1.0}],
              ['Facebook, Inc. Common Stock', CompanyUnderline: [FB@Vienna Stock Exchange@AT], {'score': 1.0}]],
 'Apple': [['Apple Inc.', CompanyUnderline: [APC@XETRA@DE], {'score': 2.0}],
           ['Apple Inc.', CompanyUnderline: [AAPL@NASDAQ OMX PHLX@US], {'score': 1.0}],
           ['Apple Hospitality REIT Inc', CompanyUnderline: [APLE@New York Stock Exchange@US], {'score': 1.0}],
           ['APPLE RUSH CO I/SH NEW', CompanyUnderline: [APRU@@UN], {'score': 1.0}],
           ['Apple Inc.', CompanyUnderline: [AAPL@@UN], {'score': 1.0}]],
 'Google': [['CBOE Equity VIX ON Google', CompanyUnderline: [VXGOG@@UN], {'score': 1.0}]],
 'CitiBank': []}

franklingu commented 4 years ago

And I welcome contributions to this package :)

To be honest, I have not maintained it since the first implementation; development stopped after 0.0.0.dev because I no longer work in this industry. Any help, such as code, feature requests, or moral support, would be good.

realimpat commented 4 years ago

Hey Franklin,

I've taken a closer look at the code this week and I do have an interest in this. I am open to contributing/forking as I work on scratching my own itch.

I noticed that you have a lot of time.sleep statements, some sleeping for as long as 30 seconds. I imagine this is the main cause of the slowness?

Before I try playing with that, do you remember what prompted such long sleeps? And/or are there any other 'institutional memory' notes/tricks you would advise me to keep in mind?

I'm happy to switch to email or something if 'issues' isn't the right space for this

franklingu commented 4 years ago

I am actually going to rewrite this package myself as well. I have been doing other things and did not do any maintenance. I am planning to make the core async-based.

The main cause of the "slowness" is indeed the sleep that runs when an exception happens. The sleep is intentional: when crawling websites like Google, Yahoo, or TR, you do not want to hit them too frequently.

franklingu commented 4 years ago

After my initial check: TR (Thomson Reuters) changed their format, which caused the parsing to throw an exception. That triggered the retry logic, which waits longer and longer between attempts. Currently, Google Finance and Yahoo Finance work as usual.

But a good point you raised is that it currently waits too long before telling users that there is a problem.
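
A rough sketch of one way to keep the polite delay between requests while surfacing format changes quickly: cap the backoff, limit the number of attempts, and do not retry at all when the failure is a parse error. Everything here (`ParseError`, `fetch_with_retry`, the default delays) is hypothetical and not taken from the current code.

```python
import time


class ParseError(Exception):
    """Hypothetical: the source's response no longer matches the parser."""


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0, max_delay=8.0,
                     sleep=time.sleep):
    """Call fetch() with capped exponential backoff between failed attempts.

    Parse errors are re-raised immediately, since waiting will not fix a
    changed page format. Other errors are retried up to max_attempts times.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ParseError:
            raise  # fail fast so the user learns about the format change
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Delays grow as 1s, 2s, 4s, ... but never exceed max_delay.
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.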