DavZim / RITCH

An R interface to the ITCH Protocol
https://davzim.github.io/RITCH/

Parsing ITCH version 4.1 #2

Closed vovalev closed 6 years ago

vovalev commented 6 years ago

Hi David!

I am really impressed by your efforts in creating this library. It really helps me convert the raw ITCH files to readable ones. Nasdaq now provides files named SMMDDYY-v#.txt.gz, but for version ITCH_5.0 everything still works fine (except that I cannot read messages that came after 10:30 am local time).

My question (or issue), however, is about parsing ITCH files of version 4.1. Is it possible to extend the library, or would only small changes be needed?

Thanks in advance, Vladimir

DavZim commented 6 years ago

Hi Vladimir, glad that you like it/use it.

Do I understand you correctly: you try to parse an ITCH 5.0 file and it doesn't read messages after 10:30? What exactly do you mean by local time?

With regard to ITCH 4.1, I am not really sure how hard it will be. If the messages are built in the same way as in ITCH 5.0, it should be OK. Do you have the spec file for 4.1? Then I could look into it.

Best, David

vovalev commented 6 years ago

Hi, David!

Thanks for the quick response! Here are some details: I am trying to parse a file of the type SMMDDYY-v#.txt.gz, which I downloaded from the Nasdaq FTP server, and the last order message is timestamped at around 10:30 local time (i.e., EDT). However, I think there is something wrong with the file itself or with the way I downloaded it.

The main question was about parsing other versions of ITCH files, e.g. v4.1. For this version, the specification can be found at http://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/NQTV-ITCH-V4_1.pdf

It would be very nice if you could add this to the library, or at least show me how to do it so that I can do it myself.

Thanks! Vladimir

DavZim commented 6 years ago

It looks like NASDAQ has changed the overall structure of the interface. That would mean someone has to write the parser for each message type (see for example Specifications.h and MessageTypes.h).

There is also the problem of how to recognize which format is being used.

Unfortunately I don't have the time to implement that currently. You are more than welcome to open a pull request though.

Should you still encounter the error you mentioned earlier, please let me know and I will try to look into it.

vovalev commented 6 years ago

Do you mean that in order to parse these files it is enough to only adjust Specifications.h and MessageTypes.h according to the description of messages in that pdf?

DavZim commented 6 years ago

What you would have to do is:

  1. update the message lengths in Specifications.h
  2. update the XXX::loadMessages() functions, where XXX is Orders, Trades, or Modifications (you may need to adjust the iterator (i.e., buf[0] etc.) to point to the right part of the message).

If you want to add more information from a message, you have to change the specific XXX::loadMessages() and XXX::getDF() functions.

That should do the job.
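The two steps above (new message lengths, shifted field positions) can be sketched for a single message type. This is a minimal illustration and not RITCH's actual code: `AddOrderLayout`, `kAddOrderV50`, `kAddOrderV41`, `read_be` and `parse_add_order` are hypothetical names, and the byte offsets are one reading of the Add Order ('A') layouts in the ITCH 5.0 and 4.1 specifications, so they should be checked against the PDFs before use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read an unsigned big-endian integer of n bytes (n <= 8) starting at buf.
inline std::uint64_t read_be(const unsigned char* buf, std::size_t n) {
  assert(n <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) v = (v << 8) | buf[i];
  return v;
}

// Hypothetical per-version layout of the Add Order ('A') message.
// All offsets are counted from the message type byte (offset 0).
struct AddOrderLayout {
  std::size_t length;      // total message length in bytes
  std::size_t ts, ts_len;  // timestamp offset and width
  std::size_t ref;         // order reference number (8 bytes)
  std::size_t side;        // buy/sell indicator (1 byte)
  std::size_t shares;      // shares (4 bytes)
  std::size_t stock;       // stock symbol (8 bytes, space padded)
  std::size_t price;       // price (4 bytes, fixed-point integer)
};

// Assumed ITCH 5.0 layout:
// type(1) locate(2) tracking(2) timestamp(6) ref(8) side(1) shares(4) stock(8) price(4)
constexpr AddOrderLayout kAddOrderV50{36, 5, 6, 11, 19, 20, 24, 32};
// Assumed ITCH 4.1 layout:
// type(1) timestamp-ns(4) ref(8) side(1) shares(4) stock(8) price(4)
constexpr AddOrderLayout kAddOrderV41{30, 1, 4, 5, 13, 14, 18, 26};

struct AddOrder {
  std::uint64_t timestamp;
  std::uint64_t ref;
  char side;
  std::uint32_t shares;
  std::string stock;
  std::uint32_t price;  // raw integer as stored in the message
};

// Parse one Add Order message; msg must point at the type byte and
// hold at least layout.length bytes.
inline AddOrder parse_add_order(const unsigned char* msg, const AddOrderLayout& l) {
  assert(msg[0] == 'A');
  AddOrder o;
  o.timestamp = read_be(msg + l.ts, l.ts_len);
  o.ref = read_be(msg + l.ref, 8);
  o.side = static_cast<char>(msg[l.side]);
  o.shares = static_cast<std::uint32_t>(read_be(msg + l.shares, 4));
  o.stock.assign(reinterpret_cast<const char*>(msg + l.stock), 8);
  o.price = static_cast<std::uint32_t>(read_be(msg + l.price, 4));
  return o;
}
```

Keeping the offsets in a table like this means the parsing function itself does not change between versions; only the layout constant passed to it does.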

vovalev commented 5 years ago

Hi David!

I downloaded your files and managed to create a library in R which is able to parse version 4.1 as well! It is very clumsy but it works.

Btw: In get_modifications.R you assign NA to new_order_ref for message type U, which is incorrect, because when an order is modified it receives a new reference number.
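To illustrate the point about message type U: an Order Replace message carries both the original and the new order reference number, so the new one can be read rather than set to NA. The sketch below assumes the ITCH 5.0 layout (type 1, stock locate 2, tracking number 2, timestamp 6, original reference 8, new reference 8, shares 4, price 4, 35 bytes in total). `OrderReplace` and `parse_replace_v50` are hypothetical names, not functions from RITCH.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read an unsigned big-endian integer of n bytes (n <= 8) starting at buf.
inline std::uint64_t read_be(const unsigned char* buf, std::size_t n) {
  assert(n <= 8);
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) v = (v << 8) | buf[i];
  return v;
}

struct OrderReplace {
  std::uint64_t old_ref;  // reference number of the order being replaced
  std::uint64_t new_ref;  // reference number the order carries afterwards
  std::uint32_t shares;
  std::uint32_t price;    // raw integer as stored in the message
};

// Parse an Order Replace ('U') message, assuming the ITCH 5.0 offsets
// described above; msg must point at the type byte and hold 35 bytes.
inline OrderReplace parse_replace_v50(const unsigned char* msg) {
  assert(msg[0] == 'U');
  OrderReplace r;
  r.old_ref = read_be(msg + 11, 8);
  r.new_ref = read_be(msg + 19, 8);
  r.shares = static_cast<std::uint32_t>(read_be(msg + 27, 4));
  r.price = static_cast<std::uint32_t>(read_be(msg + 31, 4));
  return r;
}
```

Under the 4.1 layout assumed from the linked specification (4-byte timestamp, no locate or tracking fields), the same fields would start at offsets 5, 13, 21 and 25.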

DavZim commented 5 years ago

Thanks for the notification, fixed in 9bfaddf19e20299b8ce7b7d3235d67f38f1d5e2f

If you find a way to distinguish version 4.1 from version 5.0, you are more than welcome to open a pull request!
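One possible heuristic for telling the two versions apart is to look at the first message in the file. The sketch below rests on several assumptions that should be verified against real files and both specifications: that each message in the decompressed file is preceded by a 2-byte big-endian length, that a 4.1 file starts with either a 5-byte Timestamp-Seconds ('T') message or a 6-byte System Event ('S') message, and that a 5.0 file starts with a 12-byte System Event ('S') message. `ItchVersion` and `guess_itch_version` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ItchVersion { Unknown, V41, V50 };

// Guess the ITCH version from the first length-prefixed message.
// buf must point at the start of the decompressed file, n is the
// number of bytes available.
inline ItchVersion guess_itch_version(const unsigned char* buf, std::size_t n) {
  if (n < 3) return ItchVersion::Unknown;  // need length prefix + type byte
  const std::size_t len = (static_cast<std::size_t>(buf[0]) << 8) | buf[1];
  const char type = static_cast<char>(buf[2]);

  // Assumed 4.1 signatures: Timestamp-Seconds or a 6-byte System Event.
  if (type == 'T' && len == 5) return ItchVersion::V41;
  if (type == 'S' && len == 6) return ItchVersion::V41;
  // Assumed 5.0 signature: a 12-byte System Event.
  if (type == 'S' && len == 12) return ItchVersion::V50;
  return ItchVersion::Unknown;
}
```

Returning `Unknown` instead of guessing lets the caller fall back to a user-supplied version argument when the file does not start the way this sketch expects.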