latest convert.c still cannot convert options templates

jepries commented 8 years ago

@thorgrin Hi Petr, I've merged your latest devel branch into my fork and started testing it. Your latest convert.c still throws "WARNING: UDP input: Message conversion error; skipping message..."...

I know my convert.c is ugly :), I've copied my version over yours, recompiled against the latest CESNET/devel into my fork along with my convert.c and they process out ok.

When I get a chance (weekend work), I'll try to debug your convert.c and see where it is failing. I like yours better, you fixed all my bad memory access hacks.

thorgrin commented 8 years ago

@jepries Damn, I'm sorry for not doing it right. I was sure that I get the same output from both, but apparently forgot to merge some important part. If you can spot the problem, let me know, I'll try to fix it.

jepries commented 8 years ago

@thorgrin no worries, i'll trace it out and find it :)

jepries commented 8 years ago

@ thorgrin Petr, argh, this was my bad. As I was documenting what I did, I noticed I NEVER CHECKED OUT DEVEL!!!!!! I was running master branch!!!!! ARGH. So sorry for bugging you. I tend to think out loud lol....

jepries commented 8 years ago

ugh. not so fast, the devel branch just seg faulted.

DEBUG: ipfix_message: [0] Processing data
DEBUG: preprocessor: [0] Options template ID 261 already exists; rewriting...  
DEBUG: template manager: [0] Received the same template as last time, not replacing
Segmentation fault (core dumped)

I'm going to replace convert.c and see if I can continue running. That run was going for about 10 minutes (enough time for me to watch it, comment, close, switch back to watch, then blam. :(

jepries commented 8 years ago

@thorgrin update. I've taken my fork jepries/devel branch which I just pulled cesnet/devel into and the code segfaulted after 10 minutes. I replace the convert.c with my convert.c and it is running now for over an hour. I'll let it run for as long as I have disk (overnight) and make sure it continues. There may be an error in the latest convert.c still (not the error I reported first, that way a wetware issue, mine :) ).

jepries commented 8 years ago

@thorgrin update: My convert.c ran all night, the only difference from devel head. I saw this segmentation fault in my early on as well, it turned out to be a byte count issue, the incoming dataset has only 1 options template, but the for loop isn't done counting octets and gets the next value in memory which can be large, and then the code continues processing invalid data (and sometimes reaching beyond memory)... at least that is what I saw. I'll add debug when I get a chance to ferret out if that is the case in the new convert.c, it may take me a little while

thorgrin commented 8 years ago

@jepries Thanks for looking into this. I've restructured the code quite a bit and changed the boundaries of some while cycles. I got the same results from you convert as from the current one, so I believed that it was OK. If you find out why it does not work, I'll be glad to look into it.

jepries commented 8 years ago

@thorgrin Petr, in addition to the coredump (can't test that now, those options templates are not being thrown at me anymore :( ), conversion of timestamps in data is getting borked up pretty good. These incoming data sets should not be getting timestamp conversions:

DEBUG: convert: PRIES: NFv9 Header...
DEBUG: convert: PRIES: byte[0,1,2,3]: 0009 0003
DEBUG: convert: PRIES: byte[4,5,6,7]: 81ba 5956
DEBUG: convert: PRIES: byte[8,9,10,11]: 56de 11ab
DEBUG: convert: PRIES: byte[12,13,14,15]: 0003 f616
DEBUG: convert: PRIES: byte[16,17,18,19]: 0000 0000
DEBUG: convert: PRIES: NETFLOW_V9 Conversion: version=9, count=3, sysUpTime=2176473430, unixSecs=1457394091, pkgSequence=259606, sourceId=0, timeHeader=1455217617570
DEBUG: convert: PRIES: NFv9 Template Set PRIOR to IPFIX Conversion, len=44
DEBUG: convert: PRIES: byte[0,1,2,3]: 0002 0030
DEBUG: convert: PRIES: byte[4,5,6,7]: 0103 000a
DEBUG: convert: PRIES: byte[8,9,10,11]: 0167 0008
DEBUG: convert: PRIES: byte[12,13,14,15]: 9869 0008
DEBUG: convert: PRIES: byte[16,17,18,19]: 986a 0008
DEBUG: convert: PRIES: byte[20,21,22,23]: 986b 0008
DEBUG: convert: PRIES: byte[24,25,26,27]: 986c 0008
DEBUG: convert: PRIES: byte[28,29,30,31]: 90eb 0004
DEBUG: convert: PRIES: byte[32,33,34,35]: 9858 0004
DEBUG: convert: PRIES: byte[36,37,38,39]: 000e 0004
DEBUG: convert: PRIES: byte[40,41,42,43]: 00eb 0004
DEBUG: convert: PRIES: byte[44,45,46,47]: 0053 0008
DEBUG: convert: PRIES: Template Conversion to IPFIX Complete.
DEBUG: convert: PRIES: byte[0,1,2,3]: 0002 0048
DEBUG: convert: PRIES: byte[4,5,6,7]: 0103 000a
DEBUG: convert: PRIES: byte[8,9,10,11]: 0167 0008
DEBUG: convert: PRIES: byte[12,13,14,15]: 9869 0008
DEBUG: convert: PRIES: byte[16,17,18,19]: ffff ffff
DEBUG: convert: PRIES: byte[20,21,22,23]: 986a 0008
DEBUG: convert: PRIES: byte[24,25,26,27]: ffff ffff
DEBUG: convert: PRIES: byte[28,29,30,31]: 986b 0008
DEBUG: convert: PRIES: byte[32,33,34,35]: ffff ffff
DEBUG: convert: PRIES: byte[36,37,38,39]: 986c 0008
DEBUG: convert: PRIES: byte[40,41,42,43]: ffff ffff
DEBUG: convert: PRIES: byte[44,45,46,47]: 90eb 0004
DEBUG: convert: PRIES: byte[48,49,50,51]: ffff ffff
DEBUG: convert: PRIES: byte[52,53,54,55]: 9858 0004
DEBUG: convert: PRIES: byte[56,57,58,59]: ffff ffff
DEBUG: convert: PRIES: byte[60,61,62,63]: 000e 0004
DEBUG: convert: PRIES: byte[64,65,66,67]: 00eb 0004
DEBUG: convert: PRIES: byte[68,69,70,71]: 0053 0008

The above is the template set conversion and it looks perfect to me. Here is the conversion of the next two incoming data sets:

DEBUG: convert: PRIES: Incoming NFv9 Data Set Packet before Conversion, len=128, id=3
DEBUG: convert: PRIES: byte[0,1,2,3]: 0103 0084
DEBUG: convert: PRIES: byte[4,5,6,7]: 0000 0153
DEBUG: convert: PRIES: byte[8,9,10,11]: 51e9 146b
DEBUG: convert: PRIES: byte[12,13,14,15]: 0000 0000
DEBUG: convert: PRIES: byte[16,17,18,19]: 0000 0013
DEBUG: convert: PRIES: byte[20,21,22,23]: 0000 0000
DEBUG: convert: PRIES: byte[24,25,26,27]: 0000 03e8
DEBUG: convert: PRIES: byte[28,29,30,31]: 0000 0000
DEBUG: convert: PRIES: byte[32,33,34,35]: 0000 0012
DEBUG: convert: PRIES: byte[36,37,38,39]: 0000 0000
DEBUG: convert: PRIES: byte[40,41,42,43]: 0000 03e8
DEBUG: convert: PRIES: byte[44,45,46,47]: 0202 0202
DEBUG: convert: PRIES: byte[48,49,50,51]: 0404 0404
DEBUG: convert: PRIES: byte[52,53,54,55]: 0000 0008
DEBUG: convert: PRIES: byte[56,57,58,59]: 0000 0000
DEBUG: convert: PRIES: byte[60,61,62,63]: 496e 6574
DEBUG: convert: PRIES: byte[64,65,66,67]: 0000 0000
DEBUG: convert: PRIES: byte[68,69,70,71]: 0000 0153
DEBUG: convert: PRIES: byte[72,73,74,75]: 51e9 2fb8
DEBUG: convert: PRIES: byte[76,77,78,79]: 0000 0000
DEBUG: convert: PRIES: byte[80,81,82,83]: 0000 0014
DEBUG: convert: PRIES: byte[84,85,86,87]: 0000 0000
DEBUG: convert: PRIES: byte[88,89,90,91]: 0000 03e8
DEBUG: convert: PRIES: byte[92,93,94,95]: 0000 0000
DEBUG: convert: PRIES: byte[96,97,98,99]: 0000 0012
DEBUG: convert: PRIES: byte[100,101,102,103]: 0000 0000
DEBUG: convert: PRIES: byte[104,105,106,107]: 0000 03e8
DEBUG: convert: PRIES: byte[108,109,110,111]: 0202 0202
DEBUG: convert: PRIES: byte[112,113,114,115]: 0303 0303
DEBUG: convert: PRIES: byte[116,117,118,119]: 0000 0008
DEBUG: convert: PRIES: byte[120,121,122,123]: 0000 0000
DEBUG: convert: PRIES: byte[124,125,126,127]: 4d50 4c53
DEBUG: convert: PRIES: byte[128,129,130,131]: 0000 0000
DEBUG: convert: PRIES: Timestamp conversion complete, Length shifted by 16 bytes
DEBUG: convert: PRIES: Padding addition (if necessary) completed
DEBUG: convert: PRIES: Conversion to IPFIX Complete.
DEBUG: convert: PRIES: byte[0,1,2,3]: 0103 0094
DEBUG: convert: PRIES: byte[4,5,6,7]: 0000 0153
DEBUG: convert: PRIES: byte[8,9,10,11]: 51e9 1400
DEBUG: convert: PRIES: byte[12,13,14,15]: 0001 5325
DEBUG: convert: PRIES: byte[16,17,18,19]: 0c93 b66b
DEBUG: convert: PRIES: byte[20,21,22,23]: 0000 0000
DEBUG: convert: PRIES: byte[24,25,26,27]: 0000 0013
DEBUG: convert: PRIES: byte[28,29,30,31]: 0000 0000
DEBUG: convert: PRIES: byte[32,33,34,35]: 0000 03e8
DEBUG: convert: PRIES: byte[36,37,38,39]: 0000 0000
DEBUG: convert: PRIES: byte[40,41,42,43]: 0000 0012
DEBUG: convert: PRIES: byte[44,45,46,47]: 0000 0000
DEBUG: convert: PRIES: byte[48,49,50,51]: 0000 03e8
DEBUG: convert: PRIES: byte[52,53,54,55]: 0202 0202
DEBUG: convert: PRIES: byte[56,57,58,59]: 0404 0404
DEBUG: convert: PRIES: byte[60,61,62,63]: 0000 0008
DEBUG: convert: PRIES: byte[64,65,66,67]: 0000 0000
DEBUG: convert: PRIES: byte[68,69,70,71]: 496e 6574
DEBUG: convert: PRIES: byte[72,73,74,75]: 0000 0000
DEBUG: convert: PRIES: byte[76,77,78,79]: 0000 0153
DEBUG: convert: PRIES: byte[80,81,82,83]: 51e9 2f00
DEBUG: convert: PRIES: byte[84,85,86,87]: 0001 5325
DEBUG: convert: PRIES: byte[88,89,90,91]: 0c93 d1b8
DEBUG: convert: PRIES: byte[92,93,94,95]: 0000 0000
DEBUG: convert: PRIES: byte[96,97,98,99]: 0000 0014
DEBUG: convert: PRIES: byte[100,101,102,103]: 0000 0000
DEBUG: convert: PRIES: byte[104,105,106,107]: 0000 03e8
DEBUG: convert: PRIES: byte[108,109,110,111]: 0000 0000
DEBUG: convert: PRIES: byte[112,113,114,115]: 0000 0012
DEBUG: convert: PRIES: byte[116,117,118,119]: 0000 0000
DEBUG: convert: PRIES: byte[120,121,122,123]: 0000 03e8
DEBUG: convert: PRIES: byte[124,125,126,127]: 0202 0202
DEBUG: convert: PRIES: byte[128,129,130,131]: 0303 0303
DEBUG: convert: PRIES: byte[132,133,134,135]: 0000 0008
DEBUG: convert: PRIES: byte[136,137,138,139]: 0000 0000
DEBUG: convert: PRIES: byte[140,141,142,143]: 4d50 4c53
DEBUG: convert: PRIES: byte[144,145,146,147]: 0000 0000
DEBUG: ipfix_message: [0] Processing data
INFO: preprocessor: [0] New template ID 259

ARGH, there should not be a shift. The shift occurs here:

259_ipfixconversion

So added more debug around the problem, in the insert_timestamp_data method, here is what is going on:

....
    uint32_t lenIndex = templates.cols * id;
    uint32_t posIndex = lenIndex + 1;

    /* Input check: avoid overflow due to malicious flowset IDs */
    if (lenIndex > templates.max * templates.cols) {
            MSG_DEBUG(msg_module, "PRIES: Avoiding overflow due to malicious flowset id: lenIndex=%u, templates.max=%u, templates.cols=%u", lenIndex, templates.max, templates.cols);
            return 0;
    }

    /* Get number of data records using the same template */
    if (templates.templ[lenIndex] <= 0) {
            MSG_DEBUG(msg_module, "PRIES: No records using this template, templates.templ[%u]=%u",lenIndex,templates.templ[lenIndex] );
            return 0;
    }
    num = len / templates.templ[lenIndex];
    MSG_DEBUG(msg_module,"PRIES: Number of datasets num=%u, len=%u, templates.templ[%u]=%u", num, len, templates.templ[lenIndex]);
    if (num == 0) {
            MSG_DEBUG(msg_module, "PRIES: skipping timestamp insert, len=%u, num=%u", len, len/templates.templ[lenIndex] );
            return 0;
    }

    /* Increase sequence number */
    ipfix_seq_no[NF9_SEQ_NO] += num;

    /* Iterate through all data records */
    shifted = 0;
    first_offset = templates.templ[posIndex];
    last_offset = first_offset + BYTES_4;

    MSG_DEBUG(msg_module, "PRIES: timestamp_data - id=%u, lenIndex=%u, posIndex=%u, first_offset=%u, last_offset=%u, num=%u", id, lenIndex, posIndex, first_offset, last_offset, num);

    for (i = num - 1; i >= 0; i--) {
            /* Resize each timestamp in each data record to 64 bit */
            pkt = (uint8_t *) dataSet + BYTES_4 + (i * templates.templ[lenIndex]);
            uint64_t first = ntohl(*((uint32_t *) (pkt + first_offset)));
            uint64_t last  = ntohl(*((uint32_t *) (pkt + last_offset)));
....

Here is the debug output from that MSG_DEBUG statement:

DEBUG: convert: PRIES: Number of datasets num=2, len=128, templates.templ[64]=131
DEBUG: convert: PRIES: timestamp_data - id=3, lenIndex=6, posIndex=7, first_offset=65535, last_offset=3, num=2
DEBUG: convert: PRIES: Timestamp conversion complete, Length shifted by 16 bytes

The insert_timestamp_template never finds a timestamp to convert so it leaves first_offset at -1, but since there are two datasets (num=2) to process with this template, somehow the the 16 bytes are being added right after the first element.

I don't know how to fix this yet, but I think if first_offset > max template size (have to search out this variable) then skip the for loop replacing timestamps that don't exist to replace. -or- convert first_offset to what it is in insert_timestamp_template (probably signed) and if it is <0, skip, I think that is a better idea. I'll dig it out tomorrow, I'm spent for today.

update: I wouldn't have been able to sleep tonight without fixing this. Here is what I did, just before the for loop in insert_timestamp_data

...
    MSG_DEBUG(msg_module, "PRIES: timestamp_data - id=%u, lenIndex=%u, posIndex=%u, first_offset=%u, last_offset=%u, num=%u", id, lenIndex, posIndex, first_offset, last_offset, num);
    //TODO: SKIP THIS FOR LOOP IF first_offset < 0 (it is -1!!!!!! which in unsigned is 65535
    if (((int16_t) first_offset) < 0)
    {
            MSG_DEBUG(msg_module, "PRIES: Skipping timestamp insert, no fields found to change, first_offset=%u", first_offset);
            return 0;
    }

    for (i = num - 1; i >= 0; i--) {
            /* Resize each timestamp in each data record to 64 bit */
            pkt = (uint8_t *) dataSet + BYTES_4 + (i * templates.templ[lenIndex]);
....

thorgrin commented 8 years ago

@jepries Sorry it took so long. I've just made a commit that should help. It's really a small change, but explains a lot ;-) Please give it a try. It does not throw any memory issues anymore, at least on that sample that you gave and also on my own flow exporter.

And thanks for that detailed analysis, it was a big help.

CESNET / ipfixcol

latest convert.c still cannot convert options templates #160