Closed toshkov closed 2 years ago
Hi! Thank you for your interest in the project. I understand you are trying to update the bash script to download the data, and integrate it into the library. A few remarks:
you can already retrieve the latest data by providing an argument to `CurrencyConverter`, either a local path or a URL:

```python
c = CurrencyConverter("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip")
```
If you want to avoid repeating the download, you can have a cron job run `update_data.sh` and load that updated file with `CurrencyConverter("/path/to/latest/eurofxref-hist.zip")`.
If we wanted to add logic to update the embedded data, I would rather have it directly in Python than in a subprocess calling the shell script: it would be easier to unit test and maintain.
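Roughly, such a Python-native updater could be very small; this is just a sketch (the function name and the atomic-replace detail are my own assumptions, not the library's API):

```python
import os
import urllib.request


def update_embedded_data(url, target_path):
    """Download the data file at `url` and atomically replace `target_path`.

    Keeping this in plain Python (instead of a subprocess running a shell
    script) makes it easy to unit test: both the URL and the destination
    can be swapped out.
    """
    tmp_path = target_path + ".tmp"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, target_path)  # atomic rename, no half-written file
    return target_path
```

In a test, `url` can even point at a local fixture via a `file://` URI, so no network is needed.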
I am unsure whether it is a good idea to modify the embedded data "silently" when importing the module, because it would violate the immutability of a currencyconverter release, which is properly tested. A library update means an embedded data update; importing the module would then silently update the embedded data, and subsequent uses of `CurrencyConverter` would be affected by this (unlike the approach in the first point).
I understand. But I also don't think that manually updating the repo with new data every 5 days, for 5 years straight, is a solution either. I think it is a very inefficient way of solving the problem.
I also saw numerous issues regarding this, so I am not the only one who finds it inconvenient.
About the remarks:
By providing an argument before the initialization of `CurrencyConverter`, so that it downloads the data when the program is run? This creates unnecessary network traffic every time it is started, which also slows down the program itself. Running a background cron job would mean downloading the data every day, regardless of whether it is a TARGET closing day (130 days a year are closed).
The script I wrote only downloads the data if it sees that a new version is available. Also, having the script check whether there is an existing database in the first place removes the need for you to manually update the repo every few days, as it will simply download a recent database the first time the user runs it.
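The same "only fetch when there is something new" check could be done in Python with a conditional HTTP request; a rough sketch (whether the ECB server honors `If-Modified-Since` is an assumption here):

```python
import email.utils
import os
import urllib.error
import urllib.request


def download_if_newer(url, path):
    """Fetch `url` only when the remote copy is newer than the local file.

    Sends an If-Modified-Since header based on the local file's mtime;
    a 304 response means the local copy is already current.
    """
    request = urllib.request.Request(url)
    if os.path.isfile(path):
        local_mtime = os.path.getmtime(path)
        request.add_header(
            "If-Modified-Since", email.utils.formatdate(local_mtime, usegmt=True)
        )
    try:
        with urllib.request.urlopen(request) as response:
            data = response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return False  # nothing new on the server
        raise
    with open(path, "wb") as f:
        f.write(data)
    return True  # fresh data written
```

If the server ignores conditional requests, this degrades gracefully into a plain download.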
I am not sure I understand exactly what you mean by "silently updating the embedded data". Isn't running a cron job in the background also silently updating the embedded data?
The main problem is that the application does not work out-of-the-box and needs tweaking to make it update the data automatically. The program is more or less useless if its data is not up-to-date, and having the user update the data manually, or set up cron jobs, is inconvenient.
Finally, I understand that you would like to have it directly integrated with Python. I could rewrite the script in the future in Python if I have the spare time. But I think this script works well enough in the meantime.
I understand that lots of issues relate to the data, but I feel this is more of a documentation issue, and I have tried to improve that.
Regarding "unnecessary network traffic": we are talking about 500K of data, updated once a day (except weekends). So any logic to limit network traffic is unnecessary, in my opinion.
Regarding the embedded data, I see it differently. I would rather have an explicit cron job that updates the data and an explicit `CurrencyConverter("latest.zip")` than a hidden piece of code doing downloads at import time followed by a bare `CurrencyConverter()` call. A better design might be an explicit entry point, something like:
```python
from currency_converter import CurrencyConverter

CurrencyConverter.update_embedded_data()  # runs the update script
c = CurrencyConverter()
```
But I think I still prefer something more explicit, like:
```python
from datetime import date
import os.path
import urllib.request

from currency_converter import ECB_URL, CurrencyConverter

# Simple way to always get the latest data, without downloading the source at every import
ecb_file = f"ecb_{date.today():%Y%m%d}.zip"
if not os.path.isfile(ecb_file):  # will not download more than once a day
    urllib.request.urlretrieve(ECB_URL, ecb_file)
c = CurrencyConverter(ecb_file)
```
Obviously, if you do not care about the download, you can just run `CurrencyConverter(ECB_URL)`.
The overall spirit of the library is to process the ECB data; the embedded version is a convenience. I expect users to set something up if they want the latest data, or to download the 500K every time. But I agree we may improve things and provide a helper. To give you another example, I use this at work with completely different data sources (not open): we just process those sources to fit the ECB data layout, then load them with `CurrencyConverter("alt.zip")`. We actually do not use the ECB data due to the limited currencies available.
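Fitting a custom source to the ECB layout can be sketched with the standard library alone. The column order and the inner CSV name below mirror the public eurofxref-hist file, but treat the exact layout as an assumption and compare against a real download:

```python
import csv
import io
import zipfile


def write_ecb_layout_zip(path, currencies, rows):
    """Write rates as a zip holding one CSV in the eurofxref-hist layout:
    a 'Date' column followed by one rate column per currency.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Date"] + list(currencies))
    for day, rates in rows:  # rows: iterable of (date_string, {currency: rate})
        writer.writerow([day] + [rates[c] for c in currencies])
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("eurofxref-hist.csv", buf.getvalue())


# The result then loads like any other source:
# c = CurrencyConverter("alt.zip")
```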
Okay, I understand. I wrote the script for myself and wanted to share it. I think the very least that could be done is an automatic check in Python that downloads the database at first run, so you no longer need to update the repo (and PyPI) every 5 days. I am going to close this pull request.
I created a simple bash script to automatically update the exchange rates. The database is updated only once a day, after 15:00 UTC (excluding weekends and holidays), thus keeping HTTP requests to a minimum.