Ivareh / pathofmodifiersapp

Website application for checking prices of items with specific explicit modifiers (affixes) in Path of Exile

Optimize data retrieval #269

Open bogadisa opened 1 month ago

bogadisa commented 1 month ago

Ideas:

bogadisa commented 1 month ago

Multithreading: We have 4 cores to work with, and we want to dedicate two of them to the API and the frontend. That leaves two for optimizing data retrieval.

How do we split the work? The main obstacle to keeping up is retrieving data from the API data stream, so we want retrieval to run continuously and to interrupt that thread as rarely as possible. We need a balance between how much data we process at a time and how often we interrupt retrieval. If we interrupt too often, we will be limited by the number of requests we make, and the overhead of managing two threads won't be worth it. If we interrupt too rarely, the amount of data that builds up will be too large to handle.
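One way to strike this balance is a producer/consumer queue (a swapped-in technique, not necessarily what the project uses). Below is a minimal sketch under that assumption: the retrieval thread never stops to hand data over, it just puts items on a `queue.Queue`, and `batch_size` becomes the knob that trades how often the processing step runs against how much it handles each time. `fetch_next` and `process_batch` are hypothetical stand-ins for a single API request and the transform step. Because retrieval mostly waits on the network, a thread suits it even under CPython's GIL, which is released during blocking I/O.

```python
import queue
import threading
from typing import Callable


def run_pipeline(
    fetch_next: Callable[[], list],         # hypothetical: items from one API request
    process_batch: Callable[[list], None],  # hypothetical: e.g. the transform step
    batch_size: int,
    n_requests: int,
) -> None:
    """Retrieve on a worker thread; process in batches on the calling thread."""
    items: "queue.Queue[object]" = queue.Queue()
    done = object()  # sentinel: retrieval has finished

    def producer() -> None:
        for _ in range(n_requests):
            for item in fetch_next():
                items.put(item)  # unbounded queue, so this never blocks retrieval
        items.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    batch: list = []
    while True:
        item = items.get()
        if item is done:
            break
        batch.append(item)
        if len(batch) >= batch_size:  # larger batch_size: fewer, bigger processing runs
            process_batch(batch)
            batch = []
    if batch:
        process_batch(batch)  # flush whatever is left over
    worker.join()
```

For example, three requests returning `[1, 2, 3]`, `[4, 5]` and `[6]` with `batch_size=4` are processed as `[1, 2, 3, 4]` followed by `[5, 6]`.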

bogadisa commented 1 month ago

Current structure: The data retrieval process can currently be divided into three stages, spread over two main scripts (not counting the script that binds them together). poe_api.py is in charge of retrieving data, converting it to dataframes, and filtering until it has collected a specified number of items. transform_poe_api_data.py takes the output of poe_api.py and processes the data further.
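The sequential flow described above can be sketched roughly as follows. This is a simplified illustration, not the project's code: all function names are hypothetical, and plain lists of dicts stand in for the dataframes so the sketch needs only the standard library. The point it shows is that no further responses are consumed once the target is reached and `transform` is running.

```python
from typing import Callable, Iterator


def collect_items(
    responses: Iterator[list[dict]],  # stage 1: raw API responses (hypothetical source)
    keep: Callable[[dict], bool],     # stage 3: filter predicate (hypothetical)
    target: int,                      # the "specified number of items"
) -> list[dict]:
    collected: list[dict] = []
    for response in responses:
        rows = [dict(item) for item in response]  # stage 2: stand-in for building a dataframe
        collected.extend(row for row in rows if keep(row))
        if len(collected) >= target:
            break  # stop retrieving once enough items are collected
    return collected


def transform(items: list[dict]) -> list[dict]:
    # Hypothetical stand-in for transform_poe_api_data.py.
    return [{**item, "processed": True} for item in items]


def run_once(responses: Iterator[list[dict]], keep: Callable[[dict], bool], target: int) -> list[dict]:
    # Sequential: nothing new is retrieved while transform() runs.
    return transform(collect_items(responses, keep, target))
```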

The problem: We want to continue retrieving data while transform_poe_api_data.py is working. This may lead to one of two issues:

  1. We cannot retrieve new data while transforming
  2. We continue retrieving new data while transforming, leading to a big pile-up

How big of a problem is this? If (1) happens, it will create a serious bottleneck, and the benefit of multithreading drops even further. If (2) happens, however, it won't necessarily be a big problem and may even be a benefit, since it gives the main thread (which is not in charge of retrieving new data) something to do as soon as we return to poe_api.py.
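The difference between (1) and (2) comes down to whether the buffer between the two threads is bounded. A minimal sketch, assuming a `queue.Queue` is used as that buffer (the function names are hypothetical): `fill_without_consumer` simulates retrieval while nobody is reading because the transform step is busy, and `drain` is the main thread taking the whole backlog when it returns to poe_api.py.

```python
import queue


def fill_without_consumer(q: queue.Queue, items: list, timeout: float = 0.01) -> int:
    """Simulate retrieval while the transform step is busy (no one calls q.get()).

    Returns how many items were accepted before put() gave up.
    """
    accepted = 0
    for item in items:
        try:
            q.put(item, timeout=timeout)
        except queue.Full:
            break  # issue (1): with a bounded queue, retrieval stalls here
        accepted += 1
    return accepted


def drain(q: queue.Queue) -> list:
    """Main thread returning to poe_api.py: take the whole backlog at once."""
    backlog = []
    while True:
        try:
            backlog.append(q.get_nowait())
        except queue.Empty:
            return backlog
```

With `queue.Queue(maxsize=3)`, only three items are accepted before the simulated retrieval stalls (issue 1). With an unbounded `queue.Queue()`, every item is accepted and `drain` hands the whole pile-up to the main thread (issue 2).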

bogadisa commented 1 month ago

A third benefit is that we only really need to handle threading inside poe_api.py. This means we can start the second thread when the module is initialized and manage joining the threads inside the module.
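Keeping the threading inside poe_api.py could look roughly like this hypothetical wrapper class (`BackgroundRetriever` and `fetch_next` are illustrative names, not existing code). The retrieval thread starts when the object is created, the main thread only calls `take_all`, and `close` joins the thread, so nothing outside the module touches `threading`. Note that `close` waits for the current `fetch_next` call to return before the thread can be joined.

```python
import queue
import threading
from typing import Callable


class BackgroundRetriever:
    """Hypothetical wrapper that owns the retrieval thread."""

    def __init__(self, fetch_next: Callable[[], list]) -> None:
        self._fetch_next = fetch_next  # hypothetical: items from one API request
        self._items: "queue.Queue[object]" = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()  # thread starts as part of initialization

    def _run(self) -> None:
        while not self._stop.is_set():
            for item in self._fetch_next():
                self._items.put(item)

    def take_all(self) -> list:
        """Called from the main thread: return everything retrieved so far."""
        out = []
        while True:
            try:
                out.append(self._items.get_nowait())
            except queue.Empty:
                return out

    def close(self) -> None:
        self._stop.set()
        self._thread.join()  # joining stays inside the module

    def __enter__(self) -> "BackgroundRetriever":
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```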

bogadisa commented 4 weeks ago

Try using two listeners.

Ping-pong the next change id between them: send a GET request, load only the headers, and pass the next change id to the other thread, which repeats the same pattern. This means we are constantly loading two requests at once. Does this mean we no longer need async? Maybe not, because we still need control over what is loaded when.
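A minimal sketch of the two-listener idea, using two threads with one inbox queue each. It assumes, as the comment suggests, that the next change id can be read from the response headers before the body is downloaded. `send_request` is a hypothetical callable that sends the GET, returns once the headers are in, and gives back `(next_change_id, read_body)`. With the `requests` library, `requests.get(url, stream=True)` returns once the headers have arrived and defers downloading the body until it is accessed, so it could be wrapped to fit this shape; which header carries the change id is left as an assumption. There is no error handling: if `send_request` raises, the other listener waits forever.

```python
import queue
import threading
from typing import Callable, Tuple

# Hypothetical: send GET for a change id, return after the headers arrive with
# (next_change_id, function that reads the remaining body).
SendRequest = Callable[[str], Tuple[str, Callable[[], object]]]


def ping_pong(first_id: str, send_request: SendRequest, n_requests: int) -> list:
    """Two listeners alternate requests by passing the next change id back and forth."""
    if n_requests <= 0:
        return []
    inboxes: list = [queue.Queue(), queue.Queue()]
    bodies: list = [None] * n_requests  # each index is written by exactly one thread
    STOP = None

    def listener(me: int) -> None:
        other = inboxes[1 - me]
        while True:
            msg = inboxes[me].get()
            if msg is STOP:
                other.put(STOP)  # propagate so the other listener exits too
                return
            index, change_id = msg
            next_id, read_body = send_request(change_id)  # headers only so far
            # Hand the next id over *before* reading the body, so the other
            # listener's request overlaps with this body download.
            other.put((index + 1, next_id) if index + 1 < n_requests else STOP)
            bodies[index] = read_body()

    threads = [threading.Thread(target=listener, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    inboxes[0].put((0, first_id))
    for t in threads:
        t.join()
    return bodies
```

Even-numbered requests run on the first listener and odd-numbered ones on the second, so while one listener is downloading a body, the other already has its request in flight.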