cmallwitz / Financials-Extension

Extension for LibreOffice Calc to access stock market data
Other
137 stars 17 forks source link

Extension works, but experiencing performance issues #38

Closed vijaysaxena99 closed 2 years ago

vijaysaxena99 commented 2 years ago

After successfully installing the extension some time ago, I wrote a spreadsheet to track investments in multiple accounts and visualize in charts. This extension works as designed-- in terms of pulling data from Yahoo. The issue is performance: I have 50 or so ticker symbols and 5 or 6 data points for each. I have a worksheet ("Prices") to populate all data pulled from Yahoo. Other worksheets (one per account) look up data that they need from this one Prices worksheet. Since multiple accounts may own same stock, I thought it best to make the call to Yahoo only once and then all accounts use this. Everything works as designed. The issue is that anytime I do any edit in any spreadsheet tab-- in tabs for individual account tabs, not in the "Prices" tab (for example, change annotation in Pie charts), the spreadsheet updates the whole "Prices" worksheet. It takes more than 8 or 10 minutes! I sit and watch in log file individual calls to Yahoo go by in real time! For 50 symbols x 6 datapoint each=300 calls to Yahoo!

Perhaps a solution is to provide for specifying multiple datapoints for one ticker in one call and let the extension populate adjacent columns with all datapoints? Something like GETREALTIME(ticker, 104, 5, 6....., "YAHOO")?

cmallwitz commented 2 years ago

The extension will pull the data for all data points for a symbol in one step. subsequent requests for same or different data point will reuse the cached data for up to 60 seconds. So for n symbols you are looking at n network calls which could take up to a second each depending on your location and network speed.

Have a look at your home/user directory, directory .financials-extension, file trace.log - this has list of all calls to extension and the time it took to do them.

Note: please look out for failed/unknown symbols: they are not cached and if you pull multiple data points for an unknown symbol it will do a network call each time (I should probably change that...)

There is an file extension.log in same location that may have additional error information helping you to find occurrences of you pulling data for unknown symbols.

vijaysaxena99 commented 2 years ago

Thanks for explaining so quickly! How to make use of all the data cached from one call to Yahoo for a particular ticker symbol? Currently I have one ticker symbol per row with required data points in columns. Will LibreCalc execute all columns along a row first (so cached data for a ticker is used) before moving to the next row for the next symbol? Or does it work on a column first? In the latter case, obviously, cached data for one symbol will be immediately written over when call to the next symbol in the column is made...

cmallwitz commented 2 years ago

The extension has no control over the order of calls from Libre Calc. It will although, cache data for all the symbols requested at least once for any data point (for 60 seconds). So whether you have your logic left/right, top/bottom or in different tabs should not make any difference (at least I'm not aware of any reports)

vijaysaxena99 commented 2 years ago

So let me understand: cache in the extension is large enough to hold all Yahoo data for all 60-70 symbols at a time, so I can access it any order? That is good.

BTW looking at trace file I notice that calls to get data for stocks & ETF's take less time (1000-1500ms each) than calls for Mutual Funds (2000-3000ms). Have others seen it too?

Thirdly, is it possible to pull Expense ratios (Gross & Net) for Mutual Funds & ETF's? Also is DIV-YIELD (code 71) for trailing 12 months or forward-looking? (Trailing is preferable).

Thank you for your help; I do appreciate it!

cmallwitz commented 2 years ago

yes - it keeps all required data - well, the supported data points per symbol * number of symbols (60-70 in your case)

in my trace.log mutual funds don't seem to be noticeable slower than regular stock but there is a fair amount of variation and latency over the network or on the provider web site is totally unpredictable. For instance I use the Financial Times as source for most of my stuff because it is quicker/closer for me (from London)

Dividend yield by both Yahoo/FT is forward yield - Yahoo says it gets this data from Morningstar.

Net expense ratio for mutuals/ETFs seem to be available from Yahoo and FT so I could take a look and add it.

vijaysaxena99 commented 2 years ago

re Div Yield (as well as P/E ratio): my watchlist on Yahoo Finance website shows both trailing annual and forward annual values for stocks and funds. May be trailing values are not available to be pulled? (I prefer trailing values)

Thank you for looking into adding expense ratios. Could you also add Morningstar Ratings for funds & ETF's that are available in Yahoo?

vijaysaxena99 commented 2 years ago

In terms of using already-pulled data from cache (rather than re-visiting the website for each data point for a symbol): Looks like the cache can handle 28 symbols but not 96 symbols. In the attached trace file, we can see that at 12.43PM calls to website were made for 28 symbols when referenced the first time (for code 104, NAME). After that, for codes 21 & 11 (Price and Price change) it referred to the cache as seen in very fast response times. In total, to access 3 datapoints each for 28 symbols took about 1 minute.

However, at 17:36PM when I added more symbols (total of 96 now), the extension made 96 calls for the first datapoint (code 104) and then had to go to Yahoo website again for the next datapoint (code 21 & 11), for a total of 10 minutes. This is the performance issue I referred to in the original post of this thread.

How many symbols can the cache handle at a time? Can the cache be expanded to handle larger number of symbols?

Secondly, it looks like LOCalc executed down a column (that is why calls to all the symbols happen first for code 104, then for code 11 and finally for code 21). Do I need to rearrange symbols along columns (as opposed to along rows currently) in order to achieve shorter response time? Is there an option an Calc which will force it to update spreadsheet by rows rather than columns?

vijaysaxena99 commented 2 years ago

trace.log

cmallwitz commented 2 years ago

OK - I think I understand what you mean.

Note: most of my spreadsheets deal with up to 20-30 symbols.

I can confirm LC is calling the extension first downwards (rows) and then going to the right (next columns) where it goes down first again before moving to any further column. As I have said before, I have no control over this.

That means if you have 60 symbols downwards in column A and 60 calls to extension to get price in column B, it will start doing one after the other. If you have 60 calls to extension to get name in column C, they will processed after all prices are fetched.

The problem here is the speed of each symbol look. If it takes 1 seconds to get data per symbol, it will take 60 seconds to get all these prices. Because the cache is only used for up to 60 seconds this means if getting data takes 1.1 seconds on average it will not be used to get names and instead a fresh network call will be made.

I could address this by changing the time for when the cache is used to e.g. 10 minutes.

But in general you are correct, if the calls to Yahoo take 1.5-2.5 seconds each fetching all prices for the S&P 500 will take more than 15 minutes.

You could try whether FT is faster for you on average (it is for me because I sit in the UK) but in general the question is how long do you want to wait for the data to appear if you have a larger number of symbols.

vijaysaxena99 commented 2 years ago

yes, holding cached values for a longer period should do the trick. (If cache is large enough to hold all values for, say 100 symbols?). Thank you.

vijaysaxena99 commented 2 years ago

Actually, don't bother about holding cache for longer: I transposed my symbols from rows to columns. This way I make use of all cached values for a symbol in a column before moving on to calling data for the next symbol in the next column. (You may want to include this as a caution in README-- for newbies like me).

If you could please add Expense Ratios and Morningstar ratings of funds & ETF's that would be great! Also, trailing P/E and Dividend Yields if possible. All of these are available in Yahoo Finance website. Thank you.