Armory8854 / yapm

Yet Another (python) Podcast Manager
GNU General Public License v3.0

Concurrent Downloads - #8

Closed Armory8854 closed 1 year ago

Armory8854 commented 1 year ago

While researching ways to speed up the download process, I came to the realization that I could be running multiple instances of newPodcastDownload() at the same time. This would involve a few new additions and concepts:

  1. Threading
  2. Understanding gunicorn workers better
  3. Removing calls to the database during the actual download functions.

The hardest one for me to conceptualize here is number 3. My first thought is to query the database up front, store all of the key/value pairs in a list of dictionaries, and then pass that list to the download function. This way, we don't have to call the database every single time we want to run a download.
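A minimal sketch of that idea, assuming sqlite3 and a hypothetical `episodes` table (the real yapm schema may differ): query once, turn the rows into plain dicts, and hand those to the download workers so they never touch the database themselves.

```python
import sqlite3

def load_podcast_rows(db_path="podcasts.db"):
    """Query the database once up front and return plain dicts,
    so the download workers never call the database directly."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become dict-like
    try:
        rows = conn.execute(
            "SELECT id, title, url FROM episodes WHERE downloaded = 0"
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

The table and column names here are placeholders; the point is just that everything the download function needs lives in the returned list.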

This does introduce another problem for me - how do I pass which podcasts were downloaded (or not) to the final database call? My first idea, before doing any research, is to update the dictionary with a downloaded 0/1 key/value pair. I would just update each dictionary as I download.
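One way that update-the-dict idea could look, as a sketch (`fake_download` and `save_results` are stand-ins, not the real yapm functions, and the `episodes` table is assumed): flip the flag on the in-memory dict while downloading, then write everything back in one final `executemany` call.

```python
import sqlite3

def fake_download(episode):
    """Stand-in for the real download: it only touches the dict,
    never the database, while the download is running."""
    episode["downloaded"] = 1  # flip the flag in memory as we go

def save_results(episodes, db_path):
    """Single final database call once all downloads have finished."""
    conn = sqlite3.connect(db_path)
    conn.executemany(
        "UPDATE episodes SET downloaded = ? WHERE id = ?",
        [(ep["downloaded"], ep["id"]) for ep in episodes],
    )
    conn.commit()
    conn.close()
```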

Final question: How do I ensure the downloads don't pull the same values? That is, if the process starts running 3 times, will it try to download podcast 1 three times at once? Setting each entry's downloaded status to 0/1 as it's pulled should remedy this. I could possibly even add a new database field called attempts that counts up every time a podcast download is attempted - some way to know we should skip a failed download this run, but retry it on run 2.
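A `queue.Queue` sketch of the no-duplicates question: each dict is put on the queue exactly once, so even with three workers running, no two can pull the same podcast. An `attempts` counter is bumped per try (the retry handling here is deliberately simplified, and the download itself is just a placeholder assignment).

```python
import queue
import threading

def run_workers(episodes, num_workers=3, max_attempts=2):
    """Each episode enters the queue exactly once, so concurrent
    workers can never grab the same episode twice."""
    work = queue.Queue()
    for ep in episodes:
        work.put(ep)

    def worker():
        while True:
            try:
                ep = work.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            ep["attempts"] = ep.get("attempts", 0) + 1
            try:
                # the real download call would go here
                ep["downloaded"] = 1
            except Exception:
                # failed: requeue for a later pass, up to max_attempts
                if ep["attempts"] < max_attempts:
                    work.put(ep)
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```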

Armory8854 commented 1 year ago

DOWNLOADS should use threading but CONVERSIONS should probably use either multiprocessing or async. Determine which one is best?

To elaborate, I have tested threading on the opus conversion step, and quickly deduced that I need to either:

A) Limit max threads
B) Run downloads concurrently but run conversions in a more linear fashion

The CPU load from converting files is very heavy, especially with multiple files at once. Conversion may even be something I remove from the program entirely, depending on how I end up settling this.
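Option B could look something like this sketch (`fake_download` and `fake_convert` are stand-ins, not the real functions): a `ThreadPoolExecutor` with a capped `max_workers` handles the I/O-bound downloads concurrently, and the CPU-heavy conversions then run in a plain sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    # stand-in for the real download (I/O-bound, so threads help)
    return url.rsplit("/", 1)[-1]

def fake_convert(filename):
    # stand-in for the opus conversion (CPU-heavy, kept sequential)
    return filename.rsplit(".", 1)[0] + ".opus"

def run(urls, max_threads=4):
    # concurrent downloads with a hard cap on thread count...
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        files = list(pool.map(fake_download, urls))
    # ...then linear conversions so only one CPU-heavy job runs at a time
    return [fake_convert(f) for f in files]
```

For option A instead, the same `max_workers` knob on a second executor would cap how many conversions run at once; a `ProcessPoolExecutor` sized to the CPU count is the usual choice for CPU-bound work, since threads don't parallelize it well under the GIL.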