parser: fix a bug in parsing raw-data tag/value pairs

msherman13 commented 6 years ago

the in_tag field is not reset to False in the case of a raw_data_tag, causing the next field in the buffer to be parsed incorrectly

msherman13 commented 6 years ago

hello, been using the simplefix library and came across a bug in parsing raw-data tags. hope you have a little time to take a look at the fix. thanks!

da4089 commented 6 years ago

Looks like this branch is failing the unit tests: https://travis-ci.org/da4089/simplefix/jobs/314409496

I'll have a look.

da4089 commented 6 years ago

@msherman13, I've had a look at the parsing of raw data values, and it looks to me like the current code is doing the right thing: if the parser is in_tag and hits an equals sign, it gets the tag number and checks the raw data tags list. If more data from the socket is required, it quits, but otherwise it immediately grabs the value and saves it.

It doesn't set in_tag = False, because it has advanced the parser to the start of the next field (and tag) in line 153 where the point + raw_len + 1 advances over the value and the SOH.

Have a look at the two relevant unit tests:test_raw_data() and test_raw_data_tags(). In both these cases, the message has a Checksum (10) tag following the raw data field, and they appear to parse correctly (and, to that point, both those tests fail with your patch).

If you could post the snippet of your code that is failing, and the relevant message data, I'd be happy to take a look at that ... perhaps something in your data is causing the parser to fail?

msherman13 commented 6 years ago

@da4089 I see what you're saying. I'll dig up the input buffer that was causing my application to error tonight and try to reproduce. Thanks for your comments :)

da4089 commented 6 years ago

@msherman13 -- any luck with the input buffer that was causing the problem?

msherman13 commented 6 years ago

Hi David,

Sorry for the delay. Yes I was able to reproduce this again and it seems that my initial description was not correct. The issue is actually caused specifically by a raw-data field which actually has an "=" character within the body of the value. See below my script which reproduces the exception.

-Miles

#!/usr/bin/env python

import simplefix

SOH = "\x01"

buff = "8=FIX.4.2"  +  SOH + "9=169" + SOH + "35=A" + SOH + "52=20171213-01:41:08.063" + SOH + "49=HelloWorld" + SOH + "56=1234" + SOH + "34=1" + SOH + "96=ABC=DE" + SOH + "98=0" + SOH + "108=30" + SOH + "554=HelloWorld" + SOH + "8013=Y" + SOH + "10=166" + SOH

parser = simplefix.FixParser()

parser.append_buffer(buff)

msg = parser.get_message()

da4089 commented 6 years ago

@msherman13 -- thanks for that sample, it's really useful.

The problem with parsing that is that RawData (96) fields require the previous field to be a RawDataLength (95): unless the length field is present, there's no way for the parser to determine how long the raw field's value will be (it's legal for raw data to include SOH bytes).

Are you able to fix the sending application to send the 95= field before the 96? Or is this an external counter-party?

msherman13 commented 6 years ago

unfortunately the data sourcer is an external party. forgive me if i'm not totally familiar with the fix spec, but why do we need the data length to parse this field? can't we just scan the value until we hit an SOH character?

da4089 commented 6 years ago

In the FIX protocol, each field has a tag and a value, separated by '=' and terminated by SOH. Except for "data" fields.

Data fields are used to contain values that might need to include an SOH value: things like UTF-8 strings, or some other binary data. They work by using a pair of fields: the first must encode the length of the data for the second. When parsing the second field, instead of scanning ahead for the SOH, you use the length specified in the previous field. There should still be an SOH at the end, but by using the length you can skip over any embedded SOH bytes in the middle.

So, in theory, field tag 96 is RawData, and it must be preceded by field 95, RawDataLen. But ... as you have seen, the theory and the practice are not always consistent in how FIX is actually implemented.

simplefix has a list of the defined data field tags, and their matching length field tags. Where it's getting confused is that it's seeing 96 without a preceding 95.

So ...

I've got a patch that will fix this: unless there's a raw_len > 0 (ie. we've seen the length tag already), the parser will now treat 95 (and any other raw data fields) like a normal field. It's not quite legit according to the letter of the spec, but I think it'll work out ok in practice.

da4089 commented 6 years ago

So ... there are two ways around this. The first is that you can tell the parser that you don't want 95/96 to be treated as a data type: parser.remove_raw(95, 96). That's why the remove_raw method exists, but it's a little inconvenient.

But the second way is that I've just pushed a patch that will actually treat what should be data values as strings if we haven't yet seen a length. I think that's the best sort-of "least surprise" approach, although I do have some minor reservations about the magical nature of this approach. I'll see how it works out.

I've also added your test case above to the test suite, btw.

Thank you for the report, and your work to come up with the test case, and hopefully your problem is now solved.

da4089 commented 6 years ago

@msherman13, I've just pushed v1.0.8 to PyPI including this patch.

msherman13 commented 6 years ago

@da4089

I share your reservations about this solution but I’m not sure there is a better way. The remove_raw function gets hairy when there’s no telling what the counter-party could send you.

Thank you for addressing this so promptly, I’m getting good use out of your module :)

I’ll upgrade this weekend and let you know if I have any further issues.

On Dec 15, 2017, at 6:00 AM, David Arnold notifications@github.com wrote:

@msherman13, I've just pushed v1.0.8 to PyPI including this patch.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

msherman13 commented 6 years ago

@da4089

Just confirming the fix worked and my sessions are stable on the new release. Thanks again :)

On Dec 15, 2017, at 7:56 AM, Miles Sherman miles.sherman@me.com wrote:

@da4089

I share your reservations about this solution but I’m not sure there is a better way. The remove_raw function gets hairy when there’s no telling what the counter-party could send you.

Thank you for addressing this so promptly, I’m getting good use out of your module :)

I’ll upgrade this weekend and let you know if I have any further issues.

On Dec 15, 2017, at 6:00 AM, David Arnold <notifications@github.com mailto:notifications@github.com> wrote:

@msherman13 https://github.com/msherman13, I've just pushed v1.0.8 to PyPI including this patch.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/da4089/simplefix/pull/11#issuecomment-351977834, or mute the thread https://github.com/notifications/unsubscribe-auth/ADb-tkxV4OK4gQ2i6APcgg3kkyXlPd0lks5tAlFZgaJpZM4Q8ly3.

da4089 commented 6 years ago

@msherman13 -- great to hear it's working! Thanks for your help in resolving the issue, and I'm really pleased to hear the module is useful for you.

Cheers!

da4089 / simplefix

parser: fix a bug in parsing raw-data tag/value pairs #11