Add model generation for Control points and Open prices

kenorb commented 7 years ago

Extend the convert_csv_to_mt.py script to support 3 modes of FXT conversion:

-m 0: Every tick (the most precise method) - this is how it works by default now,
-m 1: Control points (based on the nearest lower timeframe, i.e. 1-minute OHLC),

Historical data of the nearest lower timeframe must be available. Once that data is available, it is this data that gets interpolated; however, the actually existing OHLC prices of the lower timeframe appear as control points.
-m 2: Open prices only

In this mode, each bar is first opened (Open = High = Low = Close, Volume = 1), which allows the Expert Advisor to identify the completion of the preceding price bar.
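The control points mode refers to the nearest lower timeframe. If the converter should pick that source period itself rather than always using M1, a lookup over MT4's standard chart timeframes (the minute values of its PERIOD_* constants) could look like this sketch. `nearest_lower_timeframe` is a hypothetical helper, not something convert_csv_to_mt.py already provides.

```python
from typing import Optional

# Standard MT4 chart timeframes in minutes: M1, M5, M15, M30, H1, H4, D1, W1, MN1
# (the same values as MT4's PERIOD_* constants).
MT4_TIMEFRAMES = [1, 5, 15, 30, 60, 240, 1440, 10080, 43200]


def nearest_lower_timeframe(period: int) -> Optional[int]:
    """Return the nearest standard timeframe strictly below `period` (minutes).

    Returns None for M1 (or anything smaller), since there is no lower
    chart timeframe to take control points from.
    """
    lower = [tf for tf in MT4_TIMEFRAMES if tf < period]
    return lower[-1] if lower else None
```

For example, `nearest_lower_timeframe(30)` gives 15 and `nearest_lower_timeframe(5)` gives 1. Since the description asks for 1-minute OHLC data, always using M1 is the simpler alternative.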

Specifying multiple modes, e.g. -m 0,1,2, should be supported as well, so that 3 files are generated in a single run. The -m parameter only applies to the FXT format; for HST it should be ignored.
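A minimal sketch of how a comma-separated -m value could be parsed, assuming the script uses argparse. The long option names, the "fxt4" default and the `parse_models`/`build_parser`/`models_to_generate` helpers are illustrative, not the script's existing interface.

```python
import argparse
from typing import List

VALID_MODELS = (0, 1, 2)  # 0 = Every tick, 1 = Control points, 2 = Open prices only


def parse_models(value: str) -> List[int]:
    """Turn a -m value such as "0,1,2" into a de-duplicated list of model ids."""
    models: List[int] = []
    for part in value.split(","):
        model = int(part)  # a ValueError here is reported by argparse as a usage error
        if model not in VALID_MODELS:
            raise argparse.ArgumentTypeError(
                f"invalid model {model} (expected one of 0, 1, 2)")
        if model not in models:
            models.append(model)
    return models


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CSV to MT converter (sketch)")
    parser.add_argument("-f", "--format", default="fxt4")
    parser.add_argument("-m", "--model", type=parse_models, default=[0],
                        help="FXT testing model(s), e.g. 0 or 0,1,2 (ignored for HST)")
    return parser


def models_to_generate(args: argparse.Namespace) -> List[int]:
    # -m only applies to FXT output; for HST an empty list means
    # "no per-model files", and the HST file is written once as usual.
    return args.model if args.format.startswith("fxt") else []
```

Because the default is a list rather than a string, argparse does not pass it through `parse_models`, so omitting -m yields `[0]` (Every tick).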

Filename

The generated FXT filename must follow the pattern SSSSSSPP_M.fxt, where:

SSSSSS - the symbol under test;
PP - the value of the tested symbol period in minutes;
M - testing model (0 - "Every tick", 1 - "Control points", 2 - "Open prices only").

Currently the filename is generated correctly for the Every tick model (0); we need to add support for 1 and 2. If no model is specified, generate the Every tick model (0) by default.
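The naming rule can be expressed as a small helper. This is a sketch; `fxt_filename` and `MODEL_NAMES` are hypothetical names, and the period is written without zero-padding to match names such as EURUSD1_0.fxt used in this issue.

```python
# Testing models as listed in the filename convention.
MODEL_NAMES = {0: "Every tick", 1: "Control points", 2: "Open prices only"}


def fxt_filename(symbol: str, period_minutes: int, model: int = 0) -> str:
    """Build an FXT filename of the form SSSSSSPP_M.fxt (model defaults to Every tick)."""
    if model not in MODEL_NAMES:
        raise ValueError(f"unknown testing model: {model}")
    return f"{symbol}{period_minutes}_{model}.fxt"
```

For example, `[fxt_filename("EURUSD", tf, 1) for tf in (1, 5, 30)]` gives `['EURUSD1_1.fxt', 'EURUSD5_1.fxt', 'EURUSD30_1.fxt']`.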

Resources

For more explanation of the modelling modes, see:

Steps

In a working folder, clone the repo with the CSV files:

git clone --branch EURUSD-2014 --single-branch https://github.com/FX31337/FX-BT-Data-EURUSD-DS

Combine the CSV data from 2 days into a single file:

find FX*  \( -name "2014-02-02*" -o -name "2014-02-03*" \) -exec cat {} ';' | sort > ticks.csv

or, for 4 days:

find FX* \( -name "2014-02-02*" -o -name "2014-02-03*" -o -name "2014-02-04*" -o -name "2014-02-05*" \) -exec cat {} ';' | sort > ticks.csv
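Where `find` is not available (e.g. on Windows), the same combine step could be done in Python. This sketch assumes the wanted files are recognised by their names starting with the date, as the `-name "2014-02-02*"` patterns imply, and sorts lines by plain string order, which matches `sort` under the C locale (LC_ALL=C). `combine_tick_csvs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List


def combine_tick_csvs(root: str, day_prefixes: Iterable[str]) -> List[str]:
    """Collect the lines of every file under `root` whose name starts with one
    of `day_prefixes` and return them sorted (like `find ... -exec cat | sort`)."""
    prefixes = tuple(day_prefixes)
    lines: List[str] = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name.startswith(prefixes):
            lines.extend(path.read_text().splitlines())
    return sorted(lines)
```

To write the result, something like `Path("ticks.csv").write_text("\n".join(combine_tick_csvs("FX-BT-Data-EURUSD-DS", ["2014-02-02", "2014-02-03"])) + "\n")` would do.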

Clone this repo:

git clone https://github.com/FX-Data/FX-Data-EURUSD-DS

Convert the CSV file into FXT (for the M1, M5 and M30 timeframes), either directly or using Docker:
```
./convert_csv_to_mt.py -v -i ticks.csv -f fxt4 -t M1,M5,M30
```
The same works for the hst4 format if you need it for testing.

Read the generated files with:

./convert_mt_to_csv.py -i EURUSD1_0.fxt -f fxt4 | less

Now generate the files with control points (-m 1, to be implemented):
```
./convert_csv_to_mt.py -v -i ticks.csv -f fxt4 -t M1,M5,M30 -m 1
```
This should generate 3 files, EURUSD1_1.fxt, EURUSD5_1.fxt and EURUSD30_1.fxt, containing only the control point prices described above.

Sample 1

M30

$ ./FX-BT-Scripts/convert_mt_to_csv.py -i EURUSD30_0.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34842,1.34842,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34837,1.34837,1.34837,1.34837,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34828,1.34828,1.34828,1.34828,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34828,1.34828,1.34828,1.34828,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34832,1.34832,1.34832,1.34832,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34832,1.34832,1.34832,1.34832,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34827,1.34827,1.34827,1.34827,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34829,1.34829,1.34829,1.34829,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34835,1.34835,1.34835,1.34835,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34837,1.34837,1.34837,1.34837,1,2014.02.02 22:00:00,4
$ ./FX-BT-Scripts/convert_mt_to_csv.py -i EURUSD30_1.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34871,1.34820,1.34832,717,2014.02.02 22:29:59,0
2014.02.02 22:30:00,1.34843,1.34905,1.34818,1.34863,2299,2014.02.02 22:59:59,0
2014.02.02 23:00:00,1.34865,1.34912,1.34851,1.34871,2650,2014.02.02 23:29:59,0
2014.02.02 23:30:00,1.34867,1.34892,1.34843,1.34873,2124,2014.02.02 23:59:59,0
2014.02.03 00:00:00,1.34873,1.34878,1.34800,1.34865,3445,2014.02.03 00:29:59,0
2014.02.03 00:30:00,1.34866,1.34916,1.34836,1.34865,4396,2014.02.03 00:59:59,0
2014.02.03 01:00:00,1.34864,1.34874,1.34801,1.34855,4995,2014.02.03 01:29:59,0
2014.02.03 01:30:00,1.34855,1.34865,1.34820,1.34850,3583,2014.02.03 01:59:59,0
2014.02.03 02:00:00,1.34852,1.34853,1.34798,1.34830,3988,2014.02.03 02:29:59,0
2014.02.03 02:30:00,1.34830,1.34853,1.34809,1.34849,3820,2014.02.03 02:59:59,0
$ ./FX-BT-Scripts/convert_mt_to_csv.py -i EURUSD30_2.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34871,1.34820,1.34832,717,2014.02.02 22:29:59,0
2014.02.02 22:30:00,1.34843,1.34905,1.34818,1.34863,2299,2014.02.02 22:59:59,0
2014.02.02 23:00:00,1.34865,1.34912,1.34851,1.34871,2650,2014.02.02 23:29:59,0
2014.02.02 23:30:00,1.34867,1.34892,1.34843,1.34873,2124,2014.02.02 23:59:59,0
2014.02.03 00:00:00,1.34873,1.34878,1.34800,1.34865,3445,2014.02.03 00:29:59,0
2014.02.03 00:30:00,1.34866,1.34916,1.34836,1.34865,4396,2014.02.03 00:59:59,0
2014.02.03 01:00:00,1.34864,1.34874,1.34801,1.34855,4995,2014.02.03 01:29:59,0
2014.02.03 01:30:00,1.34855,1.34865,1.34820,1.34850,3583,2014.02.03 01:59:59,0
2014.02.03 02:00:00,1.34852,1.34853,1.34798,1.34830,3988,2014.02.03 02:29:59,0
2014.02.03 02:30:00,1.34830,1.34853,1.34809,1.34849,3820,2014.02.03 02:59:59,0

M5

$ ./convert_mt_to_csv.py -i EURUSD5_0.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34842,1.34842,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34837,1.34837,1.34837,1.34837,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34828,1.34828,1.34828,1.34828,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34828,1.34828,1.34828,1.34828,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34832,1.34832,1.34832,1.34832,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34832,1.34832,1.34832,1.34832,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34827,1.34827,1.34827,1.34827,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34829,1.34829,1.34829,1.34829,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34835,1.34835,1.34835,1.34835,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34837,1.34837,1.34837,1.34837,1,2014.02.02 22:00:00,4
$ ./convert_mt_to_csv.py -i EURUSD5_1.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34821,1.34821,60,2014.02.02 22:04:59,0
2014.02.02 22:05:00,1.34823,1.34845,1.34823,1.34833,65,2014.02.02 22:09:59,0
2014.02.02 22:10:00,1.34833,1.34855,1.34831,1.34850,142,2014.02.02 22:14:59,0
2014.02.02 22:15:00,1.34850,1.34865,1.34850,1.34865,90,2014.02.02 22:19:59,0
2014.02.02 22:20:00,1.34864,1.34866,1.34857,1.34864,67,2014.02.02 22:24:59,0
2014.02.02 22:25:00,1.34862,1.34871,1.34820,1.34832,291,2014.02.02 22:29:59,0
2014.02.02 22:30:00,1.34843,1.34863,1.34821,1.34860,595,2014.02.02 22:34:59,0
2014.02.02 22:35:00,1.34860,1.34867,1.34857,1.34862,276,2014.02.02 22:39:59,0
2014.02.02 22:40:00,1.34861,1.34867,1.34859,1.34862,270,2014.02.02 22:44:59,0
2014.02.02 22:45:00,1.34865,1.34895,1.34865,1.34889,234,2014.02.02 22:49:59,0
$ ./convert_mt_to_csv.py -i EURUSD5_2.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34821,1.34821,60,2014.02.02 22:04:59,0
2014.02.02 22:05:00,1.34823,1.34845,1.34823,1.34833,65,2014.02.02 22:09:59,0
2014.02.02 22:10:00,1.34833,1.34855,1.34831,1.34850,142,2014.02.02 22:14:59,0
2014.02.02 22:15:00,1.34850,1.34865,1.34850,1.34865,90,2014.02.02 22:19:59,0
2014.02.02 22:20:00,1.34864,1.34866,1.34857,1.34864,67,2014.02.02 22:24:59,0
2014.02.02 22:25:00,1.34862,1.34871,1.34820,1.34832,291,2014.02.02 22:29:59,0
2014.02.02 22:30:00,1.34843,1.34863,1.34821,1.34860,595,2014.02.02 22:34:59,0
2014.02.02 22:35:00,1.34860,1.34867,1.34857,1.34862,276,2014.02.02 22:39:59,0
2014.02.02 22:40:00,1.34861,1.34867,1.34859,1.34862,270,2014.02.02 22:44:59,0
2014.02.02 22:45:00,1.34865,1.34895,1.34865,1.34889,234,2014.02.02 22:49:59,0

M1

$ ./convert_mt_to_csv.py -i EURUSD1_0.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34842,1.34842,1,2014.02.02 22:00:00,4
2014.02.02 22:00:00,1.34837,1.34837,1.34837,1.34837,1,2014.02.02 22:00:03,4
2014.02.02 22:00:00,1.34828,1.34828,1.34828,1.34828,1,2014.02.02 22:00:04,4
2014.02.02 22:00:00,1.34828,1.34828,1.34828,1.34828,1,2014.02.02 22:00:09,4
2014.02.02 22:00:00,1.34832,1.34832,1.34832,1.34832,1,2014.02.02 22:00:09,4
2014.02.02 22:00:00,1.34832,1.34832,1.34832,1.34832,1,2014.02.02 22:00:10,4
2014.02.02 22:00:00,1.34827,1.34827,1.34827,1.34827,1,2014.02.02 22:00:10,4
2014.02.02 22:00:00,1.34829,1.34829,1.34829,1.34829,1,2014.02.02 22:00:12,4
2014.02.02 22:00:00,1.34835,1.34835,1.34835,1.34835,1,2014.02.02 22:00:14,4
2014.02.02 22:00:00,1.34837,1.34837,1.34837,1.34837,1,2014.02.02 22:00:15,4
$ ./convert_mt_to_csv.py -i EURUSD1_1.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34827,1.34833,19,2014.02.02 22:00:59,0
2014.02.02 22:01:11,1.34837,1.34842,1.34834,1.34839,23,2014.02.02 22:02:04,0
2014.02.02 22:02:05,1.34840,1.34841,1.34825,1.34827,11,2014.02.02 22:03:04,0
2014.02.02 22:03:11,1.34827,1.34832,1.34827,1.34827,2,2014.02.02 22:04:04,0
2014.02.02 22:04:05,1.34821,1.34821,1.34821,1.34821,3,2014.02.02 22:04:59,0
2014.02.02 22:05:00,1.34823,1.34845,1.34823,1.34845,16,2014.02.02 22:05:59,0
2014.02.02 22:06:26,1.34839,1.34840,1.34839,1.34840,6,2014.02.02 22:07:25,0
2014.02.02 22:07:27,1.34837,1.34841,1.34837,1.34841,8,2014.02.02 22:08:05,0
2014.02.02 22:08:06,1.34835,1.34835,1.34827,1.34827,26,2014.02.02 22:09:01,0
2014.02.02 22:09:02,1.34830,1.34833,1.34830,1.34833,8,2014.02.02 22:10:01,0
$ ./convert_mt_to_csv.py -i EURUSD1_2.fxt -f fxt4 | head
2014.02.02 22:00:00,1.34842,1.34842,1.34827,1.34833,19,2014.02.02 22:00:59,0
2014.02.02 22:01:11,1.34837,1.34842,1.34834,1.34839,23,2014.02.02 22:02:10,0
2014.02.02 22:02:05,1.34840,1.34841,1.34825,1.34827,11,2014.02.02 22:03:04,0
2014.02.02 22:03:11,1.34827,1.34832,1.34827,1.34827,2,2014.02.02 22:04:10,0
2014.02.02 22:04:05,1.34821,1.34821,1.34821,1.34821,3,2014.02.02 22:05:04,0
2014.02.02 22:05:00,1.34823,1.34845,1.34823,1.34845,16,2014.02.02 22:05:59,0
2014.02.02 22:06:26,1.34839,1.34840,1.34839,1.34840,6,2014.02.02 22:07:25,0
2014.02.02 22:07:27,1.34837,1.34841,1.34837,1.34841,8,2014.02.02 22:08:26,0
2014.02.02 22:08:06,1.34835,1.34835,1.34827,1.34827,26,2014.02.02 22:09:05,0
2014.02.02 22:09:02,1.34830,1.34833,1.34830,1.34833,8,2014.02.02 22:10:01,0

Check the files in EURUSD-2014/EURUSD/2014/02 for more accurate CSV data to compare against.

The above FXT files have been uploaded below.

The above rows are in the following order: bar timestamp, open, high, low, close, volume, timestamp, followed by a flag value (4 in the Every tick samples, 0 in the others).
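To compare generated files programmatically rather than by eye, the output of convert_mt_to_csv.py can be parsed back. The sample rows contain eight comma-separated fields; this sketch calls the last one `flag` (4 in the Every tick samples, 0 in the others). `FxtRow` and `parse_fxt_row` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple

TIME_FORMAT = "%Y.%m.%d %H:%M:%S"  # e.g. 2014.02.02 22:00:00


class FxtRow(NamedTuple):
    bar_time: datetime   # bar open time
    open: float
    high: float
    low: float
    close: float
    volume: int
    tick_time: datetime  # 7th field
    flag: int            # 8th field: 4 in the Every tick samples, 0 in the others


def parse_fxt_row(line: str) -> FxtRow:
    """Parse one line of convert_mt_to_csv.py output for an FXT file."""
    fields = line.strip().split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    return FxtRow(
        bar_time=datetime.strptime(fields[0], TIME_FORMAT),
        open=float(fields[1]),
        high=float(fields[2]),
        low=float(fields[3]),
        close=float(fields[4]),
        volume=int(fields[5]),
        tick_time=datetime.strptime(fields[6], TIME_FORMAT),
        flag=int(fields[7]),
    )
```

Piping `./convert_mt_to_csv.py -i EURUSD1_1.fxt -f fxt4` through `parse_fxt_row` line by line would let you compare bars between the _0, _1 and _2 files without eyeballing them.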

Sample files

These files can be read with the convert_mt_to_csv.py script as shown above. Alternatively, you can inspect them visually in the MT4 platform via File, Open Offline (the files need to be placed in the tester/history folder of the platform directory).

See also related issue: #86

Est. 16h

kenorb commented 6 years ago

Basically, it should work the same way the MT4 platform does by default when backtesting. MT4 has its own built-in converter, so we can compare the resulting data.

ghost commented 6 years ago

Hello again. I have been going through the data to understand how to solve this issue, but I have some questions.

First, what exactly is a price bar? From what I can see, it appears to be a collection of ticks for a given timeframe. The examples you provided appear to simply take all the available ticks and group them into the 1-, 5- or 30-minute intervals that occur in the input ticks.

Second, do you have any examples of what the other models would generate for a given set of ticks? I am having some difficulty understanding the control points model in particular, but the open price one appears to be very simple. It sounds like it uses only the opening price to generate the tick for a given price bar.

Thanks.

Edit: I also notice that you say the "every tick" model is already implemented, but it does not appear that any interpolation is performed. Is interpolation necessary for any of this?

kenorb commented 6 years ago

A price bar is a combination of OHLC values (open-high-low-close) for the given timeframe (e.g. 1 minute).
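To make the definition concrete, here is a minimal sketch that groups time-sorted ticks into OHLC bars for an intraday timeframe. It assumes each tick is a (timestamp, price) pair with a single price (e.g. the bid) and counts ticks as the bar volume; both are assumptions for illustration, not necessarily what convert_csv_to_mt.py does. `Bar`, `bar_start` and `ticks_to_bars` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class Bar(NamedTuple):
    time: datetime  # bar open time
    open: float
    high: float
    low: float
    close: float
    volume: int     # assumed: number of ticks in the bar


def bar_start(t: datetime, minutes: int) -> datetime:
    """Floor `t` to the start of its bar; intraday timeframes only."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return t - ((t - midnight) % timedelta(minutes=minutes))


def ticks_to_bars(ticks: Iterable[Tuple[datetime, float]], minutes: int) -> List[Bar]:
    """Group time-sorted (timestamp, price) ticks into OHLC bars."""
    bars: List[Bar] = []
    for t, price in ticks:
        start = bar_start(t, minutes)
        if bars and bars[-1].time == start:
            # Same bar: extend high/low, move the close, count the tick.
            b = bars[-1]
            bars[-1] = b._replace(high=max(b.high, price), low=min(b.low, price),
                                  close=price, volume=b.volume + 1)
        else:
            # First tick of a new bar: O = H = L = C.
            bars.append(Bar(start, price, price, price, price, 1))
    return bars
```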
There are some samples provided in the description.
The real conversion can be seen in the MT4 platform: copy EURUSD1_0.fxt into the platform (or generate a small one from CSV using the CSV-to-FXT script), select Control points and run a backtest. This generates EURUSD1_1.fxt, which can be read with the Python scripts mentioned above. Backtesting steps can be found here.

kenorb commented 6 years ago

"every tick" model is already implemented

That means every tick in the CSV is present in the FXT (the downside is slowness and a big file). When you convert to control points, you're filtering out the less useful ticks. So basically, instead of having 100-200 ticks per minute, you only have 4 major ticks per minute (the OHLC values). In open price mode, you only have the open value (a single tick) per timeframe (e.g. M1).
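One reading of "4 major ticks per minute" is to keep, from the real ticks of each minute, only the first tick, the first tick reaching the high, the first tick reaching the low and the last tick. The sketch below does that, assuming (timestamp, price) ticks sorted by time; it leaves volume handling aside, and `control_point_ticks` is a hypothetical name, not MT4's actual algorithm.

```python
from datetime import datetime
from itertools import groupby
from typing import List, Tuple

Tick = Tuple[datetime, float]  # (timestamp, price); a single price is assumed


def minute_of(tick: Tick) -> datetime:
    """Start of the 1-minute bar a tick belongs to."""
    return tick[0].replace(second=0, microsecond=0)


def control_point_ticks(ticks: List[Tick]) -> List[Tick]:
    """Keep at most 4 real ticks per minute: open, high, low and close.

    `ticks` must be sorted by time so that groupby sees each minute once.
    """
    kept: List[Tick] = []
    for _, group in groupby(ticks, key=minute_of):
        minute_ticks = list(group)
        prices = [price for _, price in minute_ticks]
        keep = {
            0,                          # open: first tick of the minute
            prices.index(max(prices)),  # first tick that reaches the high
            prices.index(min(prices)),  # first tick that reaches the low
            len(minute_ticks) - 1,      # close: last tick of the minute
        }
        # Emit the selected ticks in their original time order.
        kept.extend(minute_ticks[i] for i in sorted(keep))
    return kept
```

Using a set means a tick that is both the open and the high (for example) is emitted only once, so a minute yields between 1 and 4 ticks.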

kenorb commented 6 years ago

Let me know if anything needs further clarification.

ghost commented 6 years ago

I've been reviewing the open prices model, and it appears MT4 does some weird things. For some timeframes it outputs one tick; for others it outputs two. When it generates two ticks, it appears to generate one for the start and another for the end of the timeframe.

Any ideas how I should handle this? I can't seem to find any correlations to explain why and when it chooses to act in this manner. I had thought the open price model might be as simple as just keeping the first tick of a timeframe and discarding the rest, but there's evidently more to it. In particular, it appears I am expected to modify the tick's volume to be the sum of all the tick volumes for the timeframe.
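The simple approach described above (keep the first tick of each bar, discard the rest, and give the kept tick the bar's summed volume) can be sketched as follows. It does not reproduce MT4's occasional second tick at the end of a bar, and it assumes ticks sorted by time and intraday timeframes; `Tick` and `open_price_ticks` are illustrative names.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Tick(NamedTuple):
    time: datetime
    price: float
    volume: int


def bar_start(t: datetime, minutes: int) -> datetime:
    """Floor `t` to the start of its bar; intraday timeframes only."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return t - ((t - midnight) % timedelta(minutes=minutes))


def open_price_ticks(ticks: List[Tick], minutes: int) -> List[Tick]:
    """Keep the first tick of every bar and give it the bar's total volume."""
    out: List[Tick] = []
    current: Optional[datetime] = None  # start of the bar being accumulated
    for tick in ticks:  # assumed sorted by time
        start = bar_start(tick.time, minutes)
        if start == current:
            # Later tick in the same bar: drop it, but keep its volume.
            first = out[-1]
            out[-1] = first._replace(volume=first.volume + tick.volume)
        else:
            current = start
            out.append(tick)
    return out
```

Because the dropped ticks' volumes are folded into the kept one, the total volume per bar stays the same as in the input.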

kenorb commented 6 years ago

Maybe MT4 is using data from the nearest lower timeframe, which, as quoted in the description, must be available.

Please check this Support point section, although I'm not sure whether it describes the logic of the Control points mode. If it doesn't, or it's not clear, try to make it more logical, as long as MT4 doesn't generate any data validation errors during testing.

In my understanding, Control points mode should keep only the ticks that make either higher highs (keep the next tick only if it's higher than the previous high) or lower lows (keep the next tick only if it's lower than the previous low). This way we filter out the useless ticks that don't produce any new lows/highs. The main goal of Control points mode is to filter out the useless ticks while keeping the major price changes. So if that makes sense, you can implement it this way, even if MT4's method differs. We can always compare the results later and find out which method is more reliable.
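A sketch of that higher-high/lower-low filter, under two assumptions not stated above: the running high and low are reset at the start of every 1-minute bar (the first tick of each bar is always kept), and, optionally, the last tick of each bar is kept as well so the bar's close price survives. `extreme_ticks` and `keep_close` are hypothetical names.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Tick = Tuple[datetime, float]  # (timestamp, price), sorted by time


def extreme_ticks(ticks: List[Tick], keep_close: bool = True) -> List[Tick]:
    """Keep the first tick of each minute plus every tick that makes a new
    high or a new low within that minute; optionally keep the closing tick."""
    kept: List[Tick] = []
    minute: Optional[datetime] = None
    high = low = 0.0
    pending_close: Optional[Tick] = None  # last tick of the minute, if not kept yet
    for tick in ticks:
        t, price = tick
        m = t.replace(second=0, microsecond=0)
        if m != minute:
            # New bar: flush the previous bar's close, reset the running range.
            if keep_close and pending_close is not None:
                kept.append(pending_close)
            minute, high, low = m, price, price
            kept.append(tick)
            pending_close = None
        elif price > high or price < low:
            # New high or new low within the bar.
            high, low = max(high, price), min(low, price)
            kept.append(tick)
            pending_close = None
        else:
            pending_close = tick
    if keep_close and pending_close is not None:
        kept.append(pending_close)
    return kept
```

With `keep_close=False` this is the filter exactly as described; keeping the close is an extra so the resulting bars still end at the right price.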

And Open mode can be implemented by keeping only 4 ticks per timeframe (the OHLC data), unless they're all the same. The volume should still match, to avoid any data validation errors.
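A sketch of expanding one OHLC bar into at most 4 ticks while keeping the bar volume intact. The order in which the high and low are visited is not specified above; this sketch assumes O, L, H, C for a bullish bar and O, H, L, C for a bearish one. `bar_to_ticks` is a hypothetical name.

```python
from typing import List, Tuple


def bar_to_ticks(open_: float, high: float, low: float, close: float,
                 volume: int) -> List[Tuple[float, int]]:
    """Expand one OHLC bar into at most 4 (price, volume) ticks.

    Assumed price path: a bullish bar (close >= open) runs O -> L -> H -> C,
    a bearish one O -> H -> L -> C. Consecutive equal prices collapse into a
    single tick, so a flat bar (O = H = L = C) yields just one tick.
    """
    path = [open_, low, high, close] if close >= open_ else [open_, high, low, close]
    prices: List[float] = []
    for price in path:
        if not prices or prices[-1] != price:  # drop consecutive duplicates
            prices.append(price)
    # Split the bar volume as evenly as possible so the ticks sum to it.
    base, extra = divmod(volume, len(prices))
    volumes = [base + (1 if i < extra else 0) for i in range(len(prices))]
    return list(zip(prices, volumes))
```

For the first M5 sample bar (1.34842, 1.34842, 1.34821, 1.34821, volume 60) this gives two ticks of 30 each. If a bar's volume is smaller than its tick count, some ticks get zero volume, which may need special handling.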

Let me know if that makes sense.

FX31337 / FX-BT-Scripts