eprbell / dali-rp2

DaLI (Data Loader Interface) is a data loader and input generator for RP2 (https://pypi.org/project/rp2), the privacy-focused, free, open-source cryptocurrency tax calculator: DaLI removes the need to manually prepare RP2 input files. Just like RP2, DaLI is also free, open-source and it prioritizes user privacy.
https://pypi.org/project/dali-rp2/
Apache License 2.0

Troubleshoot Issues Downloading Kraken CSV Master File #252

Open rapus95 opened 1 month ago

rapus95 commented 1 month ago
D:\Tax\Crypto\dali>env CURRENCY_CODE=EUR LONG_TERM_CAPITAL_GAINS=365 dali_generic -s -o output -c daliconfig.ini
INFO: Country: generic
INFO: Initialized input plugin 'dali.plugin.input.rest.binance_com'
INFO: Initialized pair converter plugin 'dali.plugin.pair_converter.ccxt'
INFO: Reading crypto data for plugin 'dali.plugin.input.rest.binance_com' from cache
INFO: Building manifest to optimize price calculation with the pair converters.
INFO: Resolving transactions
 48% |####################################                                       | Elapsed Time: 0:00:00 ETA:   0:00:00Do you want to download the file now?[yn]y
INFO: Attempting to retrieve ETHWUSD pair from the unified Kraken CSV file.
INFO: Corrupt unified CSV file found, deleting and trying again.
INFO: .dali_cache/kraken/csv/Kraken_OHLCVT.zip has been safely deleted.
INFO:
In order to provide accurate pricing from Kraken, a large (4.1+ gb) zipfile needs to be downloaded.
INFO: Downloading the unified CSV from https://drive.usercontent.google.com/download?id=11WtjXA9kvVYV9KDoebGV5U75dmcA3bJa&export=download&confirm=t&uuid=851b430d-779c-4fe8-abf5-5ee344b6d8b5
Downloading: |                                                                             #    |  36.0 MiB  46.7 KiB/s

ERROR: Fatal exception occurred:
Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\dali\plugin\pair_converter\csv\kraken.py", line 475, in _unzip_and_chunk
    with ZipFile(self.__UNIFIED_CSV_FILE, "r") as zip_ref:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\zipfile\__init__.py", line 1349, in __init__
    self._RealGetContents()
  File "C:\Python312\Lib\zipfile\__init__.py", line 1416, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 737, in _error_catcher
    yield
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 862, in _raw_read
    data = self._fp_read(amt, read1=read1) if not fp_closed else b""
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 845, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
           ^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\http\client.py", line 479, in read
    s = self.fp.read(amt)
        ^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\socket.py", line 708, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\ssl.py", line 1252, in recv_into
    return self.read(nbytes, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\ssl.py", line 1104, in read
    return self._sslobj.read(len, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\requests\models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 1043, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 935, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 861, in _raw_read
    with self._error_catcher():
  File "C:\Python312\Lib\contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "C:\Python312\Lib\site-packages\urllib3\response.py", line 742, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.") from e  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='drive.usercontent.google.com', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\dali\dali_main.py", line 193, in _dali_main_internal
    resolved_transactions: List[AbstractTransaction] = resolve_transactions(transactions, dali_configuration, args.read_spot_price_from_web)
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\transaction_resolver.py", line 286, in resolve_transactions
    transaction = _update_spot_price_from_web(transaction, global_configuration)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\transaction_resolver.py", line 138, in _update_spot_price_from_web
    conversion: RateAndPairConverter = _get_pair_conversion_rate(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\transaction_resolver.py", line 110, in _get_pair_conversion_rate
    rate = cast(AbstractPairConverterPlugin, pair_converter).get_conversion_rate(timestamp, from_asset, to_asset, exchange)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\abstract_pair_converter_plugin.py", line 88, in get_conversion_rate
    historical_bar = self.get_historic_bar_from_native_source(timestamp, from_asset, to_asset, exchange)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\abstract_ccxt_pair_converter_plugin.py", line 371, in get_historic_bar_from_native_source
    self._cache_graph_snapshots(exchange)
  File "C:\Python312\Lib\site-packages\dali\abstract_ccxt_pair_converter_plugin.py", line 752, in _cache_graph_snapshots
    optimizations = self._optimize_assets_for_exchange(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\abstract_ccxt_pair_converter_plugin.py", line 898, in _optimize_assets_for_exchange
    bar_check = self.find_historical_bars(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\abstract_ccxt_pair_converter_plugin.py", line 530, in find_historical_bars
    csv_bar = csv_reader.find_historical_bars(from_asset, to_asset, timestamp, True, _ONE_WEEK)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\plugin\pair_converter\csv\kraken.py", line 460, in find_historical_bars
    if self._unzip_and_chunk(base_asset, quote_asset, all_bars):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\dali\plugin\pair_converter\csv\kraken.py", line 500, in _unzip_and_chunk
    self.__download_unified_csv()
  File "C:\Python312\Lib\site-packages\dali\plugin\pair_converter\csv\kraken.py", line 248, in __download_unified_csv
    for chunk in response.iter_content(_CHUNK_SIZE_BYTES):
  File "C:\Python312\Lib\site-packages\requests\models.py", line 826, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='drive.usercontent.google.com', port=443): Read timed out.
INFO: Log file: ./log/rp2_2024_07_26_22_34_27_061032.log
INFO: Generated output directory: output
INFO: Done
eprbell commented 1 month ago

This looks like a timeout error. A couple of questions:

CC: @macanudo527, who worked on the Kraken CSV pair converter.

macanudo527 commented 1 month ago

@eprbell Judging by the URL, it is the latest version.

It looks like a connection error; perhaps you got disconnected partway through the download.

Did you try deleting the file and retrying?

You can also manually download the file from the URL: https://drive.usercontent.google.com/download?id=11WtjXA9kvVYV9KDoebGV5U75dmcA3bJa&export=download&confirm=t&uuid=851b430d-779c-4fe8-abf5-5ee344b6d8b5 and place it in .dali_cache/kraken/csv
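After a manual download it can be worth checking the file before rerunning DaLI, since a truncated download produces the same `BadZipFile` error seen in the traceback. Here is a minimal sketch using only the standard library. The file name `Kraken_OHLCVT.zip` is taken from the log output above; the helper `check_unified_csv` is a hypothetical name and not part of DaLI.

```python
import zipfile
from pathlib import Path

# Path as printed in the log above; adjust it if your cache lives elsewhere.
UNIFIED_CSV_FILE = Path(".dali_cache/kraken/csv/Kraken_OHLCVT.zip")


def check_unified_csv(path: Path) -> str:
    """Return a short status string for the zip file at `path` (hypothetical helper)."""
    if not path.exists():
        return "missing"
    # is_zipfile only looks for the zip end-of-central-directory record, so it is fast.
    if not zipfile.is_zipfile(path):
        return "not a zip file"
    # testzip reads every member and checks its CRC; on a multi-GB file this takes a while.
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()
    return "ok" if bad_member is None else f"corrupt member: {bad_member}"


if __name__ == "__main__":
    print(check_unified_csv(UNIFIED_CSV_FILE))
```

If this reports anything other than `ok`, the download was probably interrupted and the file should be fetched again.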

rapus95 commented 1 month ago

The manual download circumvented the problem.

macanudo527 commented 2 weeks ago

We will probably have to add a dedicated test of some sort to check that this mechanism works, since it is extremely brittle and will have to be updated regularly.

We will also need to check whether the prompt is visible, as per #240.

rapus95 commented 2 weeks ago

What about offering two ways: try the automatic download first and, if it fails, ask for a manual download, providing the link and the target location? That way we get the best of both worlds: trying to do everything for the user while also providing an alternative in case it doesn't work. Think of it like a text adventure without dead ends, because dead ends where you have no clue what to do are dumb. 😐😂
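A rough sketch of what that two-way flow could look like, not how DaLI currently does it. All names here (`ensure_unified_csv`, `download_with_urllib`, the `download` parameter) are hypothetical; the downloader is passed in so the fallback logic can be tested without hitting Google Drive. Catching `OSError` is an assumption that should cover the failure in the traceback, since `requests.exceptions.ConnectionError` and the built-in `TimeoutError` both derive from it.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import Callable


def download_with_urllib(url: str, target: Path, timeout_seconds: float = 60.0) -> None:
    """Stream `url` into `target` using only the standard library (hypothetical helper)."""
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)


def ensure_unified_csv(
    url: str,
    target: Path,
    download: Callable[[str, Path], None] = download_with_urllib,
    max_attempts: int = 2,
) -> bool:
    """Try the automatic download; on failure, tell the user how to do it by hand."""
    for attempt in range(1, max_attempts + 1):
        try:
            download(url, target)
        except OSError as error:
            print(f"Download attempt {attempt} of {max_attempts} failed: {error}")
        # Accept the file only if it is a readable zip; otherwise discard the partial file.
        if target.exists() and zipfile.is_zipfile(target):
            return True
        target.unlink(missing_ok=True)
    # No dead end: give the user the link and the exact target location.
    print("Automatic download did not succeed.")
    print(f"Please download the file manually from:\n  {url}")
    print(f"and save it as:\n  {target}")
    return False
```

The caller can then stop cleanly with the printed instructions when this returns `False`, instead of surfacing the raw traceback.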