DavZim / RITCH

An R interface to the ITCH Protocol
https://davzim.github.io/RITCH/

ITCH 4.1 and 3.0 #10

Closed macdeutsche closed 5 years ago

macdeutsche commented 5 years ago

Hi, David.

It turns out ITCH 5.0 made some changes that affect our research, since a variable is now missing. So I have to go back to ITCH 4.1 files. On NASDAQ's website there are a few remaining, and we don't know for sure that this is the right file. But I have to try.

So there was a post about parsing ITCH 4.1.

There you wrote :

What you would have to do is:

- update the message lengths in Specifications.h
- update the XXX::loadMessages() functions for XXX as Orders, Trades, Modifications (you may need to adjust the iterator (i.e., buf[0] etc) to point to the right message part).
- If you want to add more information from a message, you have to change the specific XXX::loadMessages, XXX::getDF

So I looked at the ITCH 4.1 spec. It has fewer message types than ITCH 5.0. There are roughly 3 sections in Specifications.h. Since 4.1 uses a subset of the 5.0 messages, I thought the first section would work without modification.

So what I mean is that the following would not require modification. /**

But there is this portion in current Specifications.h. I think this is where you need to update the message lengths in Specifications.h.

namespace SIZE {
  const unsigned long long S = 12;
  const unsigned long long R = 39;
  const unsigned long long H = 25;
  const unsigned long long Y = 20;
  const unsigned long long L = 26;
  const unsigned long long V = 35;
  const unsigned long long W = 12;
  const unsigned long long K = 28;
  const unsigned long long J = 35;
  const unsigned long long A = 36;
  const unsigned long long F = 40;
  const unsigned long long E = 31;
  const unsigned long long C = 36;
  const unsigned long long X = 23;
  const unsigned long long D = 19;
  const unsigned long long U = 35;
  const unsigned long long P = 44;
  const unsigned long long Q = 40;
  const unsigned long long B = 19;
  const unsigned long long I = 50;
  const unsigned long long N = 20;
}

How do you get this based on ITCH 4.1 spec? http://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/NQTV-ITCH-V4_1.pdf

And the last part would be just fine without modification, I think.

// the position (for example in the count) of each message
namespace POS {
  const int S = 0;
  const int R = 1;
  const int H = 2;
  const int Y = 3;
  const int L = 4;
  const int V = 5;
  const int W = 6;
  const int K = 7;
  const int J = 8;
  const int A = 9;
  const int F = 10;
  const int E = 11;
  const int C = 12;
  const int X = 13;
  const int D = 14;
  const int U = 15;
  const int P = 16;
  const int Q = 17;
  const int B = 18;
  const int I = 19;
  const int N = 20;
}
// all messages in a string, to make conversions easier
const std::vector<std::string> TYPESSTRING = {"S","R","H","Y","L","V","W","K","J", "A","F","E", "C","X","D","U","P","Q", "B","I","N"};
}

#endif //SPECIFICATIONS_H

Also, is there a way to just parse the gz or txt file without doing this? I would like to see what it looks like; that way maybe I can work out the message lengths.

DavZim commented 5 years ago

As to your last question: the ITCH file does not store the data as plain text but as binary. That is, it doesn't store the ASCII representation of the text but the bytes directly (the good ol' ones and zeros). Thus you won't be able to look at it with a text editor (the equivalent would be a hex editor, but that will look weird if you are not used to it...). Because of this and the file size, you are unfortunately stuck with this approach (or an alternative that parses the binary file).
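To make the text-versus-binary distinction concrete, here is a minimal C++ sketch. It is not RITCH code, and the helper names `toHex`, `asText`, and `asBinary` are made up for illustration. It shows the same number stored once as ASCII characters and once as a 4-byte big-endian integer, which is how ITCH stores its integer fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format raw bytes as space-separated two-digit hex values.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    char cell[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(cell, sizeof cell, "%02x", static_cast<unsigned>(bytes[i]));
        if (i > 0) out += ' ';
        out += cell;
    }
    return out;
}

// The number stored as ASCII text, one byte per decimal digit.
std::vector<std::uint8_t> asText(std::uint32_t value) {
    const std::string s = std::to_string(value);
    return std::vector<std::uint8_t>(s.begin(), s.end());
}

// The same number stored as a 4-byte big-endian binary integer.
std::vector<std::uint8_t> asBinary(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value >> 24),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value)
    };
}
```

For the value 1000, `toHex(asText(1000))` gives `31 30 30 30` (the characters '1', '0', '0', '0'), while `toHex(asBinary(1000))` gives `00 00 03 e8`. A text editor would show the second form as unprintable characters, which is why a hex editor or a parser is needed.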

How did I get these values? SIZE S = 12 comes directly from the v5.0 specifications PDF: at the bottom of page 3 you can see that the S message has size 12 (offset 11 of the last element plus size 1 of the last element). Using the 4.1 protocol, we see that the S message has a size of only 6.
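The rule "offset of the last element plus size of the last element" can be sketched as a small lookup. This is an illustration only: the `Field` struct and `messageSize` function are hypothetical and not part of RITCH, and the field lists should be checked against the respective specification PDFs before relying on them.

```cpp
#include <cstddef>
#include <vector>

// One row of a message table in the specification:
// byte offset from the start of the message, and length in bytes.
struct Field {
    const char* name;
    std::size_t offset;
    std::size_t length;
};

// Message size = offset of the last field + length of the last field.
std::size_t messageSize(const std::vector<Field>& layout) {
    if (layout.empty()) return 0;
    const Field& last = layout.back();
    return last.offset + last.length;
}

// System Event ('S') message as laid out in ITCH 5.0.
const std::vector<Field> S_V50 = {
    {"message_type", 0, 1},
    {"stock_locate", 1, 2},
    {"tracking_number", 3, 2},
    {"timestamp", 5, 6},
    {"event_code", 11, 1},
};

// System Event ('S') message as laid out in ITCH 4.1:
// no locate or tracking fields, and a 4-byte timestamp.
const std::vector<Field> S_V41 = {
    {"message_type", 0, 1},
    {"timestamp", 1, 4},
    {"event_code", 5, 1},
};
```

Here `messageSize(S_V50)` is 12 and `messageSize(S_V41)` is 6, matching the two values quoted above. Repeating this for every message table in the 4.1 PDF yields the numbers for `namespace SIZE`.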

So you would definitely have to update the sizes. Then you also have to adjust the field sizes in each trade/order/message function to the "updated" values. For example, in MessageTypes.cpp, Trades::loadMessages (P is described on page 15 of the specifications) first parses the general information common to all trades (1 byte message type, 2 bytes locate code, 2 bytes tracking number, and 6 bytes timestamp); then, specific to the P trade, the function parses 8 bytes as order reference, one byte as buy/sell, and so on. These values are taken directly from page 15 of the 5.0 specifications file.
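As a rough illustration of that parsing order, the sketch below reads the leading fields of a 5.0 'P' message from a byte buffer. It is not the actual `Trades::loadMessages` code: `readBigEndian`, `TradeP`, and `parseTradeP` are hypothetical names, and only the fields listed above are covered. ITCH integers are big-endian, so bytes are combined most significant first.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read an n-byte big-endian unsigned integer (n <= 8).
std::uint64_t readBigEndian(const unsigned char* buf, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value = (value << 8) | buf[i];
    }
    return value;
}

// Leading fields of an ITCH 5.0 'P' (trade) message.
struct TradeP {
    char          type;       // offset 0,  1 byte
    std::uint16_t locate;     // offset 1,  2 bytes
    std::uint16_t tracking;   // offset 3,  2 bytes
    std::uint64_t timestamp;  // offset 5,  6 bytes
    std::uint64_t orderRef;   // offset 11, 8 bytes
    char          side;       // offset 19, 1 byte (buy/sell)
};

// Parse the fields above; buf must point at the message type byte
// and hold at least 20 bytes.
TradeP parseTradeP(const unsigned char* buf) {
    TradeP t;
    t.type      = static_cast<char>(buf[0]);
    t.locate    = static_cast<std::uint16_t>(readBigEndian(buf + 1, 2));
    t.tracking  = static_cast<std::uint16_t>(readBigEndian(buf + 3, 2));
    t.timestamp = readBigEndian(buf + 5, 6);
    t.orderRef  = readBigEndian(buf + 11, 8);
    t.side      = static_cast<char>(buf[19]);
    return t;
}
```

Porting to another protocol version then amounts to changing the offsets and lengths in `parseTradeP` to match that version's table for the message.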

Does that make sense and clear things up?

macdeutsche commented 5 years ago

Thanks so much. I will have to work on this from now on.

I will get to the next part ( MessageTypes.cpp Trades::loadMessages ) after. Thanks for this for now, but I will most likely have to ask questions again.

Once this works, I will share this file. There will be a lot of benefit if we can use ITCH and as far as I can see now, we have to use files from 4.1 and 3.0. If we get through this, we will make sure to cite you.

macdeutsche commented 5 years ago

OK... I think I got it. What I did was update the Specifications.h file and MessageTypes.cpp. (Updating MessageTypes.cpp was way more difficult.)

It looks like there is an error in MessageTypes.cpp: case 'Q': shares.push_back(get4bytes(&buf[11])); The shares field is 8 bytes for this message only; in all the others it seems to be 4 bytes.
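A small sketch of why the field width matters here. The functions `read4` and `read8` are hypothetical stand-ins written for this example, not the RITCH helpers. Reading only the first 4 bytes of an 8-byte big-endian field returns its high-order half, which is zero for any share count below 2^32.

```cpp
#include <cstdint>

// Hypothetical stand-in: read a 4-byte big-endian unsigned integer.
std::uint32_t read4(const unsigned char* b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8)  |
            static_cast<std::uint32_t>(b[3]);
}

// Hypothetical stand-in: read an 8-byte big-endian unsigned integer
// as two 4-byte halves, high half first.
std::uint64_t read8(const unsigned char* b) {
    return (static_cast<std::uint64_t>(read4(b)) << 32) | read4(b + 4);
}
```

For an 8-byte field holding 1000 (bytes `00 00 00 00 00 00 03 e8`), `read8` returns 1000 while `read4` on the same address returns 0, so a 4-byte read at that position would silently report zero shares.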

Mainly I see three differences in 4.1:

- No V, W, K, J messages in 4.1 (deleted in Specifications.h)
- No locate and no tracking fields anywhere in the data (deleted in MessageTypes.cpp)
- The timestamp's length is 4, so I changed the call to get4bytes
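These differences shift where the message-specific fields start. The sketch below, with hypothetical names (`HeaderLayout`, `timestampOffset`, `payloadOffset`), computes that starting offset from the header field widths described in this thread. It assumes every message carries the same leading fields, which should be checked per message against the specification.

```cpp
#include <cstddef>

// Widths in bytes of the leading fields shared by the messages.
struct HeaderLayout {
    std::size_t typeBytes;
    std::size_t locateBytes;
    std::size_t trackingBytes;
    std::size_t timestampBytes;
};

// ITCH 5.0: type, locate, tracking, 6-byte timestamp.
constexpr HeaderLayout ITCH_50{1, 2, 2, 6};
// ITCH 4.1 as described above: no locate, no tracking, 4-byte timestamp.
constexpr HeaderLayout ITCH_41{1, 0, 0, 4};

// Offset of the timestamp within a message.
constexpr std::size_t timestampOffset(const HeaderLayout& h) {
    return h.typeBytes + h.locateBytes + h.trackingBytes;
}

// Offset of the first message-specific field, i.e. the index used in buf[...].
constexpr std::size_t payloadOffset(const HeaderLayout& h) {
    return timestampOffset(h) + h.timestampBytes;
}

static_assert(payloadOffset(ITCH_50) == 11, "5.0 payload starts at buf[11]");
static_assert(payloadOffset(ITCH_41) == 5, "4.1 payload starts at buf[5]");
```

The value 11 for 5.0 matches the `&buf[11]` seen in the 'Q' case above; under the 4.1 layout the corresponding index would be 5.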

I want to test this and see if it works. What do I do now in R? Let's say I want to name this RITCH4. This would be the first time I build a library, so if you could walk me through it in some detail, that would be great. Which files do I bring into R? Should I use RStudio? (Just a reminder that I have only been using R for 2 days.)

macdeutsche commented 5 years ago

Hi, David

I compiled it and it loaded fine. But when I tested it with the ITCH 4 files, it gave the following error.

install.packages("devtools")

devtools::install()
library(RITCH)
file <- "~/Desktop/ITCHold/S101209-v4.txt.gz"
msg_count <- count_messages(file, add_meta_data = T)

Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'getMessageCountDF' not found

traceback()
3: get(name, envir = asNamespace(pkg), inherits = FALSE)
2: RITCH:::getMessageCountDF
1: count_messages(file, add_meta_data = T)

I modified the following files:

- countMessages.cpp (to delete the V, W, K, J messages that 4.1 doesn't have)
- Specifications.h (based on your suggestions)
- MessageTypes.cpp (deleted locate and tracking, plus all the other changes per the ITCH 4 spec, as you suggested)
- get_meta_data.R (deleted the V, W, K, J messages)

I uploaded all the files that I used to compile to GitHub. Would you be able to check them out?

The ITCH 4 file that I am trying to read is this.

ftp://emi.nasdaq.com/Test/Hold/S101209-v4.txt.gz

Thanks so much

DavZim commented 5 years ago

Are you sure you have built the package? I can build it using your code, and it tries to load the messages without the error. However, there are unknown message types; make sure that you have at least the length of each message type correctly specified.
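One way to see why wrong lengths show up as unknown message types: a parser that steps through the buffer by the table length lands in the middle of a later message as soon as one length is off. The sketch below is a deliberate simplification, not RITCH code. `WalkResult` and `walk` are hypothetical names, and it assumes messages are stored back to back with no framing bytes between them, which may not match the real file layout.

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct WalkResult {
    std::size_t known = 0;    // messages whose type byte was in the table
    std::size_t unknown = 0;  // bytes that were not a known message type
};

// Step through data using a type -> length table (simplified: no framing).
WalkResult walk(const std::vector<unsigned char>& data,
                const std::map<char, std::size_t>& sizes) {
    WalkResult r;
    std::size_t pos = 0;
    while (pos < data.size()) {
        const auto it = sizes.find(static_cast<char>(data[pos]));
        if (it == sizes.end() || it->second == 0) {
            // Unknown type: the length is not known, so skip a single byte.
            ++r.unknown;
            ++pos;
        } else {
            ++r.known;
            pos += it->second;  // jump to the start of the next message
        }
    }
    return r;
}
```

Running `walk` over the same bytes with a correct table and with a table where one length is too large shows the effect: the correct table reports no unknown types, while the wrong one overshoots a message boundary and starts counting payload bytes as unknown types.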

macdeutsche commented 5 years ago

OK, then I may have typos, or I didn't consider all the messages. So on your end, it builds, loads as a library, and reads the files? And shows orders and trades?

I can build it and load the library, but when it reads the files it gives me the error.

Thanks for the help.

On Apr 17, 2019, at 5:26 PM, DavZim notifications@github.com wrote:

Are you sure you have built the package? I can build it using your code and it tries to load the messages without the error. However, there are unknown message types, make sure that you have at least the length of each message type correctly specified.

macdeutsche commented 5 years ago

It works now. It turns out the file from the NASDAQ FTP is 4.0... and 4.1 and 4.0 are quite different.....