Closed msherman13 closed 6 years ago
hello, been using the simplefix library and came across a bug in parsing raw-data tags. hope you have a little time to take a look at the fix. thanks!
Looks like this branch is failing the unit tests: https://travis-ci.org/da4089/simplefix/jobs/314409496
I'll have a look.
@msherman13, I've had a look at the parsing of raw data values, and it looks to me like the current code is doing the right thing: if the parser is in_tag
and hits an equals sign, it gets the tag number and checks the raw data tags list. If more data from the socket is required, it quits, but otherwise it immediately grabs the value and saves it.
It doesn't set in_tag = False
, because it has advanced the parser to the start of the next field (and tag) in line 153 where the point + raw_len + 1
advances over the value and the SOH.
Have a look at the two relevant unit tests:test_raw_data()
and test_raw_data_tags()
. In both these cases, the message has a Checksum (10) tag following the raw data field, and they appear to parse correctly (and, to that point, both those tests fail with your patch).
If you could post the snippet of your code that is failing, and the relevant message data, I'd be happy to take a look at that ... perhaps something in your data is causing the parser to fail?
@da4089 I see what you're saying. I'll dig up the input buffer that was causing my application to error tonight and try to reproduce. Thanks for your comments :)
@msherman13 -- any luck with the input buffer that was causing the problem?
Hi David,
Sorry for the delay. Yes I was able to reproduce this again and it seems that my initial description was not correct. The issue is actually caused specifically by a raw-data field which actually has an "=" character within the body of the value. See below my script which reproduces the exception.
-Miles
#!/usr/bin/env python
import simplefix
SOH = "\x01"
buff = "8=FIX.4.2" + SOH + "9=169" + SOH + "35=A" + SOH + "52=20171213-01:41:08.063" + SOH + "49=HelloWorld" + SOH + "56=1234" + SOH + "34=1" + SOH + "96=ABC=DE" + SOH + "98=0" + SOH + "108=30" + SOH + "554=HelloWorld" + SOH + "8013=Y" + SOH + "10=166" + SOH
parser = simplefix.FixParser()
parser.append_buffer(buff)
msg = parser.get_message()
@msherman13 -- thanks for that sample, it's really useful.
The problem with parsing that is that RawData (96) fields require the previous field to be a RawDataLength (95): unless the length field is present, there's no way for the parser to determine how long the raw field's value will be (it's legal for raw data to include SOH bytes).
Are you able to fix the sending application to send the 95=
unfortunately the data sourcer is an external party. forgive me if i'm not totally familiar with the fix spec, but why do we need the data length to parse this field? can't we just scan the value until we hit an SOH character?
In the FIX protocol, each field has a tag and a value, separated by '=' and terminated by SOH. Except for "data" fields.
Data fields are used to contain values that might need to include an SOH value: things like UTF-8 strings, or some other binary data. They work by using a pair of fields: the first must encode the length of the data for the second. When parsing the second field, instead of scanning ahead for the SOH, you use the length specified in the previous field. There should still be an SOH at the end, but by using the length you can skip over any embedded SOH bytes in the middle.
So, in theory, field tag 96 is RawData, and it must be preceded by field 95, RawDataLen. But ... as you have seen, the theory and the practice are not always consistent in how FIX is actually implemented.
simplefix has a list of the defined data field tags, and their matching length field tags. Where it's getting confused is that it's seeing 96 without a preceding 95.
So ...
I've got a patch that will fix this: unless there's a raw_len > 0
(ie. we've seen the length tag already), the parser will now treat 95 (and any other raw data fields) like a normal field. It's not quite legit according to the letter of the spec, but I think it'll work out ok in practice.
So ... there are two ways around this. The first is that you can tell the parser that you don't want 95/96 to be treated as a data type: parser.remove_raw(95, 96)
. That's why the remove_raw
method exists, but it's a little inconvenient.
But the second way is that I've just pushed a patch that will actually treat what should be data values as strings if we haven't yet seen a length. I think that's the best sort-of "least surprise" approach, although I do have some minor reservations about the magical nature of this approach. I'll see how it works out.
I've also added your test case above to the test suite, btw.
Thank you for the report, and your work to come up with the test case, and hopefully your problem is now solved.
@msherman13, I've just pushed v1.0.8 to PyPI including this patch.
@da4089
I share your reservations about this solution but I’m not sure there is a better way. The remove_raw function gets hairy when there’s no telling what the counter-party could send you.
Thank you for addressing this so promptly, I’m getting good use out of your module :)
I’ll upgrade this weekend and let you know if I have any further issues.
On Dec 15, 2017, at 6:00 AM, David Arnold notifications@github.com wrote:
@msherman13, I've just pushed v1.0.8 to PyPI including this patch.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
@da4089
Just confirming the fix worked and my sessions are stable on the new release. Thanks again :)
On Dec 15, 2017, at 7:56 AM, Miles Sherman miles.sherman@me.com wrote:
@da4089
I share your reservations about this solution but I’m not sure there is a better way. The remove_raw function gets hairy when there’s no telling what the counter-party could send you.
Thank you for addressing this so promptly, I’m getting good use out of your module :)
I’ll upgrade this weekend and let you know if I have any further issues.
On Dec 15, 2017, at 6:00 AM, David Arnold <notifications@github.com mailto:notifications@github.com> wrote:
@msherman13 https://github.com/msherman13, I've just pushed v1.0.8 to PyPI including this patch.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/da4089/simplefix/pull/11#issuecomment-351977834, or mute the thread https://github.com/notifications/unsubscribe-auth/ADb-tkxV4OK4gQ2i6APcgg3kkyXlPd0lks5tAlFZgaJpZM4Q8ly3.
@msherman13 -- great to hear it's working! Thanks for your help in resolving the issue, and I'm really pleased to hear the module is useful for you.
Cheers!
the in_tag field is not reset to False in the case of a raw_data_tag, causing the next field in the buffer to be parsed incorrectly