algometrica / larytet-master

Automatically exported from code.google.com/p/larytet-master

Data verification #25

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
A number of tests to do to validate the data:
1.
The changes of interest come in pairs (price, quantity).

Here is the list of such pairs:
       1- lmt_by1 , lmy_by1_nv
       2- lmt_by2 , lmy_by2_nv
       3- lmt_by3 , lmy_by3_nv
       4- lmt_sl1 , lmy_sl1_nv
       5- lmt_sl2 , lmy_sl2_nv
       6- lmt_sl3 , lmy_sl3_nv
       7- lst_dl_pr , lst_dl_vl

Check that exactly one of these pairs is updated in each record (no change
occurred in the other pairs). Multiple updates in the same record/event may
indicate a problem.
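A minimal sketch of test 1 in Python, assuming each record has already been parsed into a dict keyed by the field names above (for example with csv.DictReader). The function names are hypothetical. Each record is compared with the previous one, and records in which more than one pair changed are flagged. A record where no pair changed is not flagged, since only multiple updates are treated as suspicious here.

```python
from typing import Mapping, Sequence

# (price, quantity) pairs; field names copied verbatim from the list above.
PAIRS = [
    ("lmt_by1", "lmy_by1_nv"),
    ("lmt_by2", "lmy_by2_nv"),
    ("lmt_by3", "lmy_by3_nv"),
    ("lmt_sl1", "lmy_sl1_nv"),
    ("lmt_sl2", "lmy_sl2_nv"),
    ("lmt_sl3", "lmy_sl3_nv"),
    ("lst_dl_pr", "lst_dl_vl"),
]


def changed_pairs(prev: Mapping[str, object], curr: Mapping[str, object]) -> list[int]:
    """Return the 1-based numbers of the pairs whose price or quantity changed."""
    return [
        number
        for number, (price, qty) in enumerate(PAIRS, start=1)
        if prev.get(price) != curr.get(price) or prev.get(qty) != curr.get(qty)
    ]


def check_single_event(records: Sequence[Mapping[str, object]]) -> list[tuple[int, list[int]]]:
    """Return (record index, changed pair numbers) wherever more than one pair changed."""
    problems = []
    for n in range(1, len(records)):
        changed = changed_pairs(records[n - 1], records[n])
        if len(changed) > 1:
            problems.append((n, changed))
    return problems
```

For example, a record that changes both lmt_sl1 and lst_dl_vl relative to its predecessor is reported as (index, [4, 7]).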

2.
Check that the last transaction data was retrieved correctly. This test can
also indicate a new transaction. Two steps:

       1- day_dil_no increments by 1. If a greater increment is detected,
we have a problem.
       2- when day_dil_no increments, check that delta(day_lv)
== lst_dl_vl.
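The two steps can be sketched in Python as follows, assuming dict records whose day_dil_no, day_lv and lst_dl_vl fields are already parsed as integers; check_transactions is a hypothetical name. When the counter moves by anything other than 1, the volume check is skipped, because several deals would be folded into one delta(day_lv).

```python
from typing import Mapping, Sequence


def check_transactions(records: Sequence[Mapping[str, int]]) -> list[tuple[int, str]]:
    """Apply steps 1 and 2 to consecutive records; return (index, reason) pairs."""
    problems = []
    for n in range(1, len(records)):
        prev, curr = records[n - 1], records[n]
        step = curr["day_dil_no"] - prev["day_dil_no"]
        if step == 0:
            continue  # no new transaction in this record
        if step != 1:
            # Step 1: any change other than +1 means deals were missed
            # (or the counter went backwards).
            problems.append((n, f"day_dil_no changed by {step}"))
        elif curr["day_lv"] - prev["day_lv"] != curr["lst_dl_vl"]:
            # Step 2: the day's volume must grow by exactly the last deal's volume.
            problems.append((n, "delta(day_lv) != lst_dl_vl"))
    return problems
```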

I think that this is enough for the moment. I'll think about statistical
tests (what is "very different").

The ultimate solution to the problem would be to get the data simultaneously
from two or more separate sources. This would put the algo's decision making
on more solid ground.

Original issue reported on code.google.com by larytet@gmail.com on 30 Nov 2009 at 10:36

GoogleCodeExporter commented 9 years ago
3.
Check the timestamp produced by FMR, upd_time. It should always be
non-decreasing. In case of a disconnect, it can emit arbitrary values that
are less than previously recorded ones.
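A minimal sketch of test 3, assuming upd_time arrives as a fixed-width string of digits such as '08485123' (the format seen in the test data) and can be compared as an integer; check_upd_time is a hypothetical name. Each timestamp is compared with the previous non-empty one, and empty values are skipped rather than judged. One limitation of comparing neighbours: a single spuriously high value is blamed on the record that follows it.

```python
from typing import Mapping, Sequence


def check_upd_time(records: Sequence[Mapping[str, object]]) -> list[int]:
    """Return indices of records whose upd_time is lower than the previous one."""
    problems = []
    prev_time = None
    for n, rec in enumerate(records):
        raw = rec.get("upd_time")
        if not raw:
            continue  # missing timestamp: not evaluated by this check
        t = int(str(raw))  # leading zeros are fine for int() on strings
        if prev_time is not None and t < prev_time:
            problems.append(n)
        prev_time = t
    return problems
```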

Original comment by larytet@gmail.com on 1 Dec 2009 at 8:17

GoogleCodeExporter commented 9 years ago
Test data 
http://larytet-master.googlecode.com/files/testdata.rar

- missing records (a sequence of 5-15 records dropped) - you will detect
them by multiple events (pairs of price, quantity) changed between two
records.
- missing data - erased limit book data from a record
- broken upd_time field - I set it to be '08485123' for randomly chosen
records
- broken transaction volume data - I subtracted 1 from the value of
lst_dl_vl for a number of records for call1050 (ex_prc==1050 and sug_bno==1)
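To reproduce data sets like this one, a hypothetical fault injector could look like the sketch below. It assumes dict records with integer ex_prc, sug_bno and lst_dl_vl fields; the function name and the number of corrupted records are arbitrary assumptions, not the values actually used for testdata.rar.

```python
import random
from typing import Any

# Limit-book fields; names copied from the pair list in the issue.
LIMIT_BOOK_FIELDS = [
    "lmt_by1", "lmy_by1_nv", "lmt_by2", "lmy_by2_nv", "lmt_by3", "lmy_by3_nv",
    "lmt_sl1", "lmy_sl1_nv", "lmt_sl2", "lmy_sl2_nv", "lmt_sl3", "lmy_sl3_nv",
]


def inject_faults(
    records: list[dict[str, Any]],
    rng: random.Random,
    n_erased: int = 3,
    n_bad_times: int = 3,
    n_bad_volumes: int = 3,
) -> list[dict[str, Any]]:
    """Return a corrupted copy of records; the input list is left untouched."""
    out = [dict(r) for r in records]

    # Missing records: drop one run of 5-15 consecutive records.
    run = rng.randint(5, 15)
    start = rng.randrange(max(1, len(out) - run + 1))
    del out[start:start + run]

    # Missing data: erase the limit-book fields of a few records.
    for i in rng.sample(range(len(out)), min(n_erased, len(out))):
        for field in LIMIT_BOOK_FIELDS:
            out[i][field] = ""

    # Broken upd_time: overwrite with the fixed value used in the test data.
    for i in rng.sample(range(len(out)), min(n_bad_times, len(out))):
        out[i]["upd_time"] = "08485123"

    # Broken transaction volume: lst_dl_vl - 1 for some call1050 records.
    calls = [r for r in out if r.get("ex_prc") == 1050 and r.get("sug_bno") == 1]
    for r in rng.sample(calls, min(n_bad_volumes, len(calls))):
        r["lst_dl_vl"] -= 1
    return out
```

Passing an explicitly seeded random.Random makes a corrupted data set reproducible.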

Original comment by larytet@gmail.com on 1 Dec 2009 at 11:51

GoogleCodeExporter commented 9 years ago
See the attached file with examples of data, both correct and incorrect, for
each problem in #1 and #2 in the list above.

For the missing data / incomplete record I'd propose to check that upd_time is
not null. The reason is that it is one of the last fields in the structure, so
if the record is cut in the middle, it won't get a value. Another way is to
check that lmt_by1 and/or lmt_sl1 aren't empty (technically they can be, if
there are no orders on the book).

Original comment by jerusale...@gmail.com on 5 Dec 2009 at 8:31


GoogleCodeExporter commented 9 years ago
Another set of examples taken from bad_test.csv. The attached Excel file has 4
additional examples, with full-length data:
1. Delta(DAY_DIL_NO) != LST_DIL_VL
2. Missing data (all the limit order data is missing)
3. UPD_TIME decreases
4. Multiple events per single record

All the relevant data is marked in yellow.

Original comment by jerusale...@gmail.com on 6 Dec 2009 at 7:24


GoogleCodeExporter commented 9 years ago
I do not immediately see what can be done about missing entries, besides
printing them after the fact. Do we need this input in the trading algorithm?

Original comment by larytet@gmail.com on 11 Dec 2009 at 8:25