ghostfolio / ghostfolio

Open Source Wealth Management Software. Angular + NestJS + Prisma + Nx + TypeScript 🤍
https://Ghostfol.io
GNU Affero General Public License v3.0

Optimize data gathering #569

Closed fly-man- closed 2 years ago

fly-man- commented 2 years ago

Each time data-gathering-7d runs, it requests data that's already in the database.

It might be good to add a check for whether the data has already been fetched, so the API isn't called more often than necessary and further downloads don't get restricted.

With 400 stocks, within 30 minutes Ghostfolio told me that the data couldn't be fetched because of API overuse.

dtslvr commented 2 years ago

Thank you for pointing to this problem @fly-man-.

At the moment, the service loops over all symbols and makes one API call for every symbol! Do you have an idea how to check which symbols are already available and skip them?

Talking about performance, another optimization could be to store market data in batches rather than item by item.
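The batching idea could be sketched roughly as follows. This is only an illustration, not Ghostfolio's code: the helper name `chunk` is hypothetical, and how each batch is then written to the database is left open.

```typescript
// Split a list into consecutive batches of at most `size` items.
// Hypothetical helper for illustration; not part of Ghostfolio.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    // slice() copies the range [i, i + size) and stops at the end of the array.
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch could then be handed to a single bulk write instead of issuing one insert per item.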

fly-man- commented 2 years ago

Well, I don't know much about how the backend works. I'd think that before gatherStocks runs, a check could be made against the database to see if it already has recent data. My idea would be to have a lastCheck field: if it's older than 12 hours, rerun gatherStocks so the data is up to date.
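A minimal sketch of such a staleness check, assuming a `lastCheck` timestamp is stored per symbol. The function name `needsRefresh` and the constant are hypothetical; only the `lastCheck` field and the 12-hour threshold come from the comment above.

```typescript
// 12 hours in milliseconds, the threshold suggested in the comment above.
const TWELVE_HOURS_MS = 12 * 60 * 60 * 1000;

// Returns true when a symbol has never been checked, or when its last
// check is older than `maxAgeMs`. Hypothetical helper, not Ghostfolio code.
function needsRefresh(
  lastCheck: Date | null,
  now: Date,
  maxAgeMs: number = TWELVE_HOURS_MS
): boolean {
  if (lastCheck === null) {
    return true;
  }
  return now.getTime() - lastCheck.getTime() > maxAgeMs;
}
```

Passing `now` in explicitly, instead of calling `new Date()` inside, keeps the check easy to test.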

Perhaps a better way to store market data is to download the full set first and then hand it off to a file reader that loads it into the database. That way, the main thread can keep fetching data while the file importer thread works in parallel.

dtslvr commented 2 years ago

It's currently checking every hour. I think that gives a better chance of having high-quality data. Otherwise, if data gathering failed, it would wait another 11 hours.

I have now added a simple check by counting the historical data to determine which symbols need to be gathered (#576).
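The details of #576 are not shown in this thread, so the following is only a guess at what a count-based check might look like. The function name `symbolsToGather`, the `storedCounts` map, and the `expectedCount` threshold are all assumptions made for illustration.

```typescript
// Return the symbols that have fewer stored data points than expected.
// `storedCounts` maps a symbol to the number of rows already in the database
// for the gathering window; symbols missing from the map count as zero.
// Hypothetical sketch; the real check in #576 may work differently.
function symbolsToGather(
  symbols: string[],
  storedCounts: Map<string, number>,
  expectedCount: number
): string[] {
  return symbols.filter(
    (symbol) => (storedCounts.get(symbol) ?? 0) < expectedCount
  );
}
```

Only the returned symbols would then trigger an API call, which is what keeps the request volume down for large portfolios.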

I'm very interested to know if this optimization is sufficient for your setup with 400+ stocks.

fly-man- commented 2 years ago

It def. looks promising. The only errors I still get are these for the currency exchange:

ghostfolio_1 | [Nest] 1 - 12/25/2021, 4:15:00 PM LOG 7d data gathering has been started.
postgres_1 | 2021-12-25 16:15:00.202 UTC [34] ERROR: duplicate key value violates unique constraint "MarketData_date_symbol_key"
postgres_1 | 2021-12-25 16:15:00.202 UTC [34] DETAIL: Key (date, symbol)=(2021-12-20 00:00:00, USDEUR) already exists.
postgres_1 | 2021-12-25 16:15:00.202 UTC [34] STATEMENT: INSERT INTO "public"."MarketData" ("createdAt","dataSource","date","id","symbol","marketPrice") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "public"."MarketData"."date", "public"."MarketData"."symbol"
postgres_1 | 2021-12-25 16:15:00.224 UTC [32] ERROR: duplicate key value violates unique constraint "MarketData_date_symbol_key"
postgres_1 | 2021-12-25 16:15:00.224 UTC [32] DETAIL: Key (date, symbol)=(2021-12-21 00:00:00, USDEUR) already exists.
postgres_1 | 2021-12-25 16:15:00.224 UTC [32] STATEMENT: INSERT INTO "public"."MarketData" ("createdAt","dataSource","date","id","symbol","marketPrice") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "public"."MarketData"."date", "public"."MarketData"."symbol"
postgres_1 | 2021-12-25 16:15:00.238 UTC [34] ERROR: duplicate key value violates unique constraint "MarketData_date_symbol_key"
postgres_1 | 2021-12-25 16:15:00.238 UTC [34] DETAIL: Key (date, symbol)=(2021-12-22 00:00:00, USDEUR) already exists.
postgres_1 | 2021-12-25 16:15:00.238 UTC [34] STATEMENT: INSERT INTO "public"."MarketData" ("createdAt","dataSource","date","id","symbol","marketPrice") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "public"."MarketData"."date", "public"."MarketData"."symbol"
postgres_1 | 2021-12-25 16:15:00.252 UTC [32] ERROR: duplicate key value violates unique constraint "MarketData_date_symbol_key"
postgres_1 | 2021-12-25 16:15:00.252 UTC [32] DETAIL: Key (date, symbol)=(2021-12-23 00:00:00, USDEUR) already exists.
postgres_1 | 2021-12-25 16:15:00.252 UTC [32] STATEMENT: INSERT INTO "public"."MarketData" ("createdAt","dataSource","date","id","symbol","marketPrice") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "public"."MarketData"."date", "public"."MarketData"."symbol"
postgres_1 | 2021-12-25 16:15:00.267 UTC [34] ERROR: duplicate key value violates unique constraint "MarketData_date_symbol_key"
postgres_1 | 2021-12-25 16:15:00.267 UTC [34] DETAIL: Key (date, symbol)=(2021-12-24 00:00:00, USDEUR) already exists.
postgres_1 | 2021-12-25 16:15:00.267 UTC [34] STATEMENT: INSERT INTO "public"."MarketData" ("createdAt","dataSource","date","id","symbol","marketPrice") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "public"."MarketData"."date", "public"."MarketData"."symbol"
ghostfolio_1 | [Nest] 1 - 12/25/2021, 4:15:00 PM LOG 7d data gathering has been completed.

dtslvr commented 2 years ago

> It def. looks promising. The only errors I still get are these for the currency exchange

Thanks a lot for your feedback! I realized that I forgot to filter out the existing currencies. I have now added this in #581.
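To illustrate the kind of filtering that avoids the duplicate key errors on (date, symbol), here is a small sketch. It is not the change made in #581; the `MarketDataRow` shape and the helpers `keyOf` and `filterNewRows` are hypothetical.

```typescript
// Hypothetical, simplified row shape (the real MarketData model has more fields).
interface MarketDataRow {
  date: string; // e.g. "2021-12-20"
  symbol: string; // e.g. "USDEUR"
  marketPrice: number;
}

// Build a key from the same columns as the unique constraint: (date, symbol).
function keyOf(row: { date: string; symbol: string }): string {
  return `${row.date}|${row.symbol}`;
}

// Keep only rows whose (date, symbol) pair is not stored yet.
// Duplicates within `incoming` itself are dropped as well.
function filterNewRows(
  incoming: MarketDataRow[],
  existing: { date: string; symbol: string }[]
): MarketDataRow[] {
  const seen = new Set(existing.map((row) => keyOf(row)));
  return incoming.filter((row) => {
    const key = keyOf(row);
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}
```

Filtering before the insert means the database never sees a conflicting row, so the constraint violation is not logged in the first place.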

fly-man- commented 2 years ago

With 1.95.0, I indeed no longer see any errors when running the 7d data gathering.

fly-man- commented 2 years ago

When I opened Ghostfolio earlier, it showed me this in the log file. The information hasn't updated since I added the symbol to the portfolio:

ghostfolio_1  | data-gathering-7d: 54.623ms
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:14 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:14 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:14 PM    WARN Missing initial value for symbol CIM at 2021-12-28
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:20 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:20 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:20 PM    WARN Missing initial value for symbol CIM at 2021-12-28
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:25 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:25 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:25 PM    WARN Missing initial value for symbol CIM at 2021-12-28
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:25 PM   ERROR [ExceptionsHandler] Cannot read property 'toNumber' of null
ghostfolio_1  | TypeError: Cannot read property 'toNumber' of null
ghostfolio_1  |     at Zr.<anonymous> (/ghostfolio/apps/api/main.js:1:89626)
ghostfolio_1  |     at Generator.next (<anonymous>)
ghostfolio_1  |     at fulfilled (/ghostfolio/apps/api/node_modules/tslib/tslib.js:114:62)
ghostfolio_1  |     at runNextTicks (internal/process/task_queues.js:60:5)
ghostfolio_1  |     at processImmediate (internal/timers.js:437:9)
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:31 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:31 PM    WARN Missing value for symbol CIM at 2021-12-29
ghostfolio_1  | [Nest] 1  - 12/31/2021, 3:00:31 PM    WARN Missing initial value for symbol CIM at 2021-12-28

Is this a bug, or is it just not getting any values for that stock?

dtslvr commented 2 years ago

> Is this a bug, or is it just not getting any values for that stock?

It can happen that historical data cannot be gathered completely. It usually recovers once the data is finally available from the data source.

I will improve the exception handling, since it should not throw an error. Thanks for reporting.
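One way to avoid a "Cannot read property 'toNumber' of null" error is to guard the conversion. The sketch below is an assumption about the general pattern, not the actual fix: `HasToNumber` is a minimal stand-in for a decimal type with a `toNumber()` method, and `toNumberOrFallback` is a hypothetical helper.

```typescript
// Minimal stand-in for a decimal value exposing toNumber() (hypothetical).
interface HasToNumber {
  toNumber(): number;
}

// Convert a possibly missing value to a number. Instead of throwing on
// null or undefined, return the given fallback (null by default).
function toNumberOrFallback(
  value: HasToNumber | null | undefined,
  fallback: number | null = null
): number | null {
  // `== null` matches both null and undefined.
  return value == null ? fallback : value.toNumber();
}
```

A caller can then treat a `null` result as "no value available yet" rather than failing the whole request.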

dtslvr commented 2 years ago

The experience with the optimized data gathering has been quite positive over the last couple of days. I will close this issue now.