giuse88 / duka

duka - Dukascopy historical data downloader
http://giuse88.github.io/duka
MIT License

Local time issue investigation #29

Open evianzhow opened 6 years ago

evianzhow commented 6 years ago

I’ve rebased the release-2.1 branch onto master; all of my investigation is based on that state. Replicate that step if you need to.

In duka/core/processor.py, if the user passes --local-time on the command line, the program simply replaces the ticks’ timezone, i.e. it calls .replace() with tzinfo=tz.tzlocal(). But this is done the wrong way: .replace() does not modify the object in place; it returns a new, modified object, so the statement as written has no effect. Simply turning the line into an assignment still doesn’t work, though, because the problem is much more complex. Let me explain.
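A minimal illustration of both points, with a fixed UTC+8 offset standing in for tz.tzlocal() (an assumption, so the result does not depend on the machine running it): the bare .replace() call leaves the object untouched, and the assigned version merely relabels the wall-clock time as local instead of converting it from GMT.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 zone standing in for tz.tzlocal() on a UTC+8 machine (assumption).
UTC8 = timezone(timedelta(hours=8))

# A naive tick timestamp that is really GMT (Dukascopy data is in GMT).
tick = datetime(2017, 9, 6, 0, 0)

# Pitfall 1: replace() returns a new object; calling it bare changes nothing.
tick.replace(tzinfo=UTC8)
assert tick.tzinfo is None

# Pitfall 2: assigning the result only relabels 00:00 as 00:00+08:00,
# which is a different instant (16:00 GMT on the previous day).
relabeled = tick.replace(tzinfo=UTC8)

# What a local-time display needs instead: declare GMT, then convert.
converted = tick.replace(tzinfo=timezone.utc).astimezone(UTC8)

print(relabeled.isoformat())  # 2017-09-06T00:00:00+08:00
print(converted.isoformat())  # 2017-09-06T08:00:00+08:00
```

Even the properly converted timestamps only shift the GMT rows by eight hours; matching the official tool also means fetching a different window of data.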

The Dukascopy Historical Data Feed, for which duka is intended to be a command-line alternative, treats the local-time option differently from our current implementation. Our goal should be to produce the same output as the official tool.

Here’s the scenario: I live in a UTC+8 country and want to download XAU/USD candlestick data from 2017-09-06 to 2017-09-07 with a 4-hour timeframe.

If I choose the Local option in the official tool, I get something like this:

Local time  Open  High  Low Close Volume
06.09.2017 00:00:00.000 1335.251  1344.328  1335.151  1340.958  18168971.97
06.09.2017 04:00:00.000 1340.898  1342.172  1338.062  1341.571  4586580.001
06.09.2017 08:00:00.000 1341.569  1342.429  1337.639  1337.761  8711390.002
06.09.2017 12:00:00.000 1337.762  1339.498  1336.299  1337.438  9259679.997
06.09.2017 16:00:00.000 1337.418  1341.828  1337.261  1339.292  13025489.99
06.09.2017 20:00:00.000 1339.292  1340.892  1335.888  1339.868  19017017.99
07.09.2017 00:00:00.000 1339.868  1340.238  1331.598  1334.172  14472906.98
07.09.2017 04:00:00.000 1334.171  1334.572  1332.501  1333.401  3560730.003
07.09.2017 08:00:00.000 1333.388  1335.878  1332.591  1334.808  8341879.984
07.09.2017 12:00:00.000 1334.828  1338.179  1333.978  1337.429  10472150
07.09.2017 16:00:00.000 1337.438  1340.361  1336.971  1338.631  11213249.98
07.09.2017 20:00:00.000 1338.632  1349.328  1338.632  1346.978  24777360.03

If I choose GMT:

Gmt time  Open  High  Low Close Volume
06.09.2017 00:00:00.000 1341.569  1342.429  1337.639  1337.761  8711390.002
06.09.2017 04:00:00.000 1337.762  1339.498  1336.299  1337.438  9259679.997
06.09.2017 08:00:00.000 1337.418  1341.828  1337.261  1339.292  13025489.99
06.09.2017 12:00:00.000 1339.292  1340.892  1335.888  1339.868  19017017.99
06.09.2017 16:00:00.000 1339.868  1340.238  1331.598  1334.172  14472906.98
06.09.2017 20:00:00.000 1334.171  1334.572  1332.501  1333.401  3560730.003
07.09.2017 00:00:00.000 1333.388  1335.878  1332.591  1334.808  8341879.984
07.09.2017 04:00:00.000 1334.828  1338.179  1333.978  1337.429  10472150
07.09.2017 08:00:00.000 1337.438  1340.361  1336.971  1338.631  11213249.98
07.09.2017 12:00:00.000 1338.632  1349.328  1338.632  1346.978  24777360.03
07.09.2017 16:00:00.000 1346.981  1349.522  1342.378  1348.218  13315830.04
07.09.2017 20:00:00.000 1348.118  1349.839  1347.309  1348.838  3759689.994

The current implementation outputs something like this:

06.09.2017 08:00:00.000 1341.569  1342.429  1337.639  1337.761  8711390.002
06.09.2017 12:00:00.000 1337.762  1339.498  1336.299  1337.438  9259679.997
06.09.2017 16:00:00.000 1337.418  1341.828  1337.261  1339.292  13025489.99
06.09.2017 20:00:00.000 1339.292  1340.892  1335.888  1339.868  19017017.99
07.09.2017 00:00:00.000 1339.868  1340.238  1331.598  1334.172  14472906.98
07.09.2017 04:00:00.000 1334.171  1334.572  1332.501  1333.401  3560730.003
07.09.2017 08:00:00.000 1333.388  1335.878  1332.591  1334.808  8341879.984
07.09.2017 12:00:00.000 1334.828  1338.179  1333.978  1337.429  10472150
07.09.2017 16:00:00.000 1337.438  1340.361  1336.971  1338.631  11213249.98
07.09.2017 20:00:00.000 1338.632  1349.328  1338.632  1346.978  24777360.03
07.10.2017 00:00:00.000 1346.981  1349.522  1342.378  1348.218  13315830.04
07.10.2017 04:00:00.000 1348.118  1349.839  1347.309  1348.838  3759689.994

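It’s clear that row 1 of the GMT output corresponds to row 3 of the Local output, row 2 to row 4, and so on: with local time at GMT+8, the GMT 00:00 candle is the local 08:00 candle. The current implementation, by contrast, just shifts the GMT rows by eight hours, so it still covers the GMT window, only with relabeled timestamps. We can conclude that a user who asks for local-time data actually wants data that starts at midnight of the start date and ends at midnight at the end of the end date, both in their own locale.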
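The window arithmetic behind that conclusion can be sketched as follows, again assuming a fixed UTC+8 offset instead of the system zone (so DST is ignored). The helpers gmt_window and candle_starts are hypothetical names for illustration, not existing duka functions.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed UTC+8 zone standing in for the user's locale (assumption; a real
# implementation would use the system zone and handle DST).
LOCAL = timezone(timedelta(hours=8))


def gmt_window(start: date, end: date, tz=LOCAL):
    """Return the [start, stop) GMT interval covering local start..end inclusive."""
    local_start = datetime.combine(start, time.min, tzinfo=tz)
    # The end date is inclusive, so stop at the following local midnight.
    local_stop = datetime.combine(end + timedelta(days=1), time.min, tzinfo=tz)
    return local_start.astimezone(timezone.utc), local_stop.astimezone(timezone.utc)


def candle_starts(window_start: datetime, window_stop: datetime, hours: int = 4):
    """List candle open times from window_start up to (not including) window_stop."""
    step = timedelta(hours=hours)
    starts, t = [], window_start
    while t < window_stop:
        starts.append(t)
        t += step
    return starts


start, stop = gmt_window(date(2017, 9, 6), date(2017, 9, 7))
print(start.isoformat(), stop.isoformat())
# 2017-09-05T16:00:00+00:00 2017-09-07T16:00:00+00:00

candles = candle_starts(start, stop)
print(len(candles))                       # 12
print(candles[2].astimezone(LOCAL).hour)  # 8
```

The third candle (index 2) opens at 2017-09-06 00:00 GMT, which is 08:00 local, matching the row correspondence between the Local and GMT tables.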

Then I started changing some related code. In duka/core/processor.py, instead of simply fetching hours range(0, 24) of each GMT day, I switched to a more precise day-and-hour approach: first convert the user-supplied start and end dates to datetime instances with the local tzinfo, then convert those to GMT start and end datetimes.

In fetch.py:

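    import asyncio
    from datetime import datetime, timedelta

    from dateutil import tz

    # Always start from a datetime at midnight: a plain date would ignore the
    # hours in timedelta(hours=i) and has no .hour attribute.
    day = datetime.combine(day, datetime.min.time())
    if local_time:
        # Read midnight as local time, then convert to GMT, since the
        # hourly Dukascopy files are indexed by GMT date and hour.
        day = day.replace(tzinfo=tz.tzlocal()).astimezone(tz.tzutc())
    tasks = []
    for i in range(0, 24):
        delta_day = day + timedelta(hours=i)
        url_info = {
            'currency': symbol,
            'year': delta_day.year,
            'month': delta_day.month - 1,  # Dukascopy URLs use 0-based months
            'day': delta_day.day,
            'hour': delta_day.hour
        }
        tasks.append(asyncio.ensure_future(get(URL.format(**url_info))))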

I tried again after these modifications, but it still didn’t work as expected, and I found something interesting. Because we don’t preserve the return state of each fetch (we just append its serialized result to a BytesIO buffer), we need the add_hour function to reconstruct the hour part of the final result. The current implementation assumes that the market opens at the beginning of the GMT day (hour_delta = 0 on line 42 of duka/core/processor.py) and always closes at the end of the GMT day.
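To make the hour_delta = 0 assumption concrete, here is a simplified model of reconstructing hours from a concatenated tick stream: start at hour 0 and bump the hour whenever a tick’s within-hour millisecond offset drops below the previous one. This is an assumed simplification for illustration (infer_hours is a hypothetical name), not a copy of duka’s add_hour. With hours 0 and 2 populated and hour 1 empty, the ticks from hour 2 end up labeled hour 1.

```python
def infer_hours(offsets_ms, first_hour=0):
    """Assign an hour to each tick by counting wrap-arounds of the in-hour offset.

    Simplified model (assumption): ticks arrive concatenated in time order and
    each carries only its millisecond offset within its own hour.
    """
    hours, hour = [], first_hour
    for i, off in enumerate(offsets_ms):
        if i > 0 and off < offsets_ms[i - 1]:
            hour += 1  # offset went backwards, so assume the next hour began
        hours.append(hour)
    return hours


# Ticks that really came from GMT hours 0, 0, 2, 2 (hour 1 had no data).
true_hours = [0, 0, 2, 2]
offsets = [1_000, 3_500_000, 2_000, 3_000_000]

print(infer_hours(offsets))  # [0, 0, 1, 1]
```

The model also misses a rollover when the next hour’s first tick has a larger offset than the previous hour’s last tick, and any market that does not open at GMT hour 0 is mislabeled from the first tick onward.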

This function confuses me quite a bit, for several reasons:

  1. Why can we assume that hour_delta starts at zero? Isn’t it possible that some market doesn’t open at GMT midnight?
  2. The add_hour function treats ticks[0] differently if ticks[0] falls on a Saturday or on the first day of a year. Why?
  3. Since we don’t preserve the return state of each fetch, we can’t tell whether a given hour returned data or nothing; it’s all combined (reduced) into one buffer. In the scenario above, the XAU/USD market closed at 5 a.m. and reopened at 6 a.m. in my locale, but the current implementation outputs the market closing at 11 p.m. and reopening at midnight. On Mondays, it outputs the market closing at 6 p.m. This is really the same question as Question 1.
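One possible direction for the refactor, sketched under assumptions: keep each hourly download keyed by its GMT hour start instead of reducing everything into one buffer, and compute absolute timestamps directly from that key. The record layout below (big-endian, 20 bytes: millisecond offset within the hour, ask, bid, ask volume, bid volume, i.e. struct format ">3I2f") is assumed to match the decompressed tick files; decode_hours is a hypothetical helper, and download and decompression are left out.

```python
import struct
from datetime import datetime, timedelta, timezone

TICK = struct.Struct(">3I2f")  # assumed layout: ms offset, ask, bid, ask vol, bid vol


def decode_hours(chunks):
    """Decode {GMT hour start: decompressed bytes} into (time, ask, bid, ask_vol, bid_vol).

    Empty chunks contribute nothing, so market gaps stay gaps instead of
    shifting the hours of later ticks.
    """
    ticks = []
    for hour_start in sorted(chunks):
        for ms, ask, bid, ask_vol, bid_vol in TICK.iter_unpack(chunks[hour_start]):
            ticks.append((hour_start + timedelta(milliseconds=ms), ask, bid, ask_vol, bid_vol))
    return ticks


h0 = datetime(2017, 9, 6, 0, tzinfo=timezone.utc)
chunks = {
    h0: TICK.pack(1_000, 133525, 133520, 1.5, 2.0),
    h0 + timedelta(hours=1): b"",  # no data this hour (e.g. market closed)
    h0 + timedelta(hours=2): TICK.pack(2_000, 133600, 133590, 0.5, 0.75),
}

for t, *_ in decode_hours(chunks):
    print(t.isoformat())
# 2017-09-06T00:00:01+00:00
# 2017-09-06T02:00:02+00:00
```

Because each timestamp comes from the hour its chunk was fetched for, this works the same whether the window starts at GMT midnight or at a local midnight converted to GMT, and no hour_delta guessing is needed.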

Resolving these problems will, I think, require quite a large refactor, and I’d like to hear opinions from the original author and the community.

giuse88 commented 6 years ago

Thank you for your help. I don’t have much time to work on this right now, sorry.