
logstash-input-azureblob - JSON parse error #115

Closed · inputfalken closed this 7 years ago

inputfalken commented 7 years ago

Hi, I am having issues when reading data from a blob.

The data has the following structure, but arrives on a single line:

[
    {
        "prop1": "value1",
        "prop2": "value2",
        "prop3": "value3"
    },
    {
        "prop1": "value1",
        "prop2": "value2",
        "prop3": "value3"
    }
]

I get the following error when using the json codec:

JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Array.

xiaomi7732 commented 7 years ago

Hi @inputfalken, what codec are you using for your input in the configuration?

inputfalken commented 7 years ago

json. I have tried json_lines as well, but json is the codec I have gotten furthest with, since I don't get any logs when debugging the other available codecs.

xiaomi7732 commented 7 years ago

@inputfalken Thanks for the quick turn-around.

It is not clear whether your data is a complete JSON document or line-by-line JSON. If it is line-by-line JSON, json_lines is the codec to use (reference: https://www.elastic.co/guide/en/logstash/current/codec-plugins.html). Otherwise, please follow the README here: https://github.com/Azure/azure-diagnostics-tools/blob/master/Logstash/logstash-input-azureblob/README.md to set proper values for file_head_bytes and file_tail_bytes.
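For illustration, a minimal sketch of the two codec choices (the account name, key, and container here are placeholders, and the byte counts are examples; count them against your actual file layout):

input {
  azureblob {
    storage_account_name => "name"    # placeholder
    storage_access_key => "<key>"     # placeholder
    container => "mobile"

    # If each line of the blob is a standalone JSON object:
    codec => "json_lines"

    # If the whole blob is one JSON document, use this instead:
    # codec => "json"
    # file_head_bytes => 6    # bytes before the first record
    # file_tail_bytes => 2    # bytes after the last record
  }
}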

If you still need help, please post:

  1. a sample of your data;
  2. a complete config file; (Please remove the credentials, though. :-))

Thanks, Saar

inputfalken commented 7 years ago

The data comes from a Continuous Export in Mobile Center (App Center).

Maybe you're familiar with how that data looks? Otherwise I can post a sample of the data as well as the config later.

xiaomi7732 commented 7 years ago

@inputfalken, please post a sample of the data so I can take a look. Thanks! Oh, and the config file as well.

inputfalken commented 7 years ago

Prettified data sample from a single line:

[
    {
        "AppBuild": "162",
        "AppId": "fe6d4db4-fa4b-45e7-93c9-f5cdb4776a35",
        "AppNamespace": "cooper",
        "AppVersion": "1.0",
        "CarrierCountry": "se",
        "CarrierName": "carrier",
        "CorrelationId": "b380e22e-86b9-4888-9c0e-4ce6f9c57fed",
        "CountryCode": "se",
        "EventId": "",
        "EventName": "",
        "IngressTimestamp": "2017-11-13T09:28:42.511Z",
        "InstallId": "549bd4d2-63bc-4b77-b034-2eee52ff0f10",
        "IsTestMessage": "False",
        "LiveUpdateDeploymentKey": "None",
        "LiveUpdatePackageHash": "None",
        "LiveUpdateReleaseLabel": "None",
        "Locale": "en_SE",
        "MessageId": "79c574f9-b56b-45ef-8ex2-0b0b602f581d",
        "MessageType": "StartSessionLog",
        "Model": "iPhone7",
        "OemName": "Apple",
        "OsApiLevel": "None",
        "OsBuild": "15B93",
        "OsName": "iOS",
        "OsVersion": "11.1",
        "Properties": "",
        "ScreenSize": "2208x1242",
        "SdkName": "mobilecenter.ios",
        "SdkVersion": "0.14.1",
        "SessionId": "fld5k036-13od-4d54-b65b-3afc7a9569b4",
        "TimeZoneOffset": "PT1H",
        "Timestamp": "2017-11-13T09:28:38.78Z",
        "WrapperRuntimeVersion": "11.2.0",
        "WrapperSdkName": "mobilecenter.xamarin",
        "WrapperSdkVersion": "0.17.1"
    }
]

Config:

input {
  azureblob {
    container => "mobile"
    storage_access_key => ""
    storage_account_name => "name"
    codec => json
    registry_create_policy => start_over
    type => "mobileapp"
  }
}

output {
  elasticsearch {
    index => "%{type}-%{+YYYY.MM.dd}"
    hosts => [ "" ]
  }
}

xiaomi7732 commented 7 years ago

@inputfalken, based on the info above, I believe this is the same pattern that NSG logs use. The json codec is the right codec, but you will have to figure out the head bytes and the tail bytes.

Take this example:

[
    {
        "prop1": "value1",
        "prop2": "value2",
        "prop3": "value3"
    },
    {
        "prop1": "value1",
        "prop2": "value2",
        "prop3": "value3"
    }
]

You have 6 bytes at the top of the log (the [, a newline, and 4 spaces) before the first JSON object, so set file_head_bytes to 6. Count the tail as well to set file_tail_bytes; the value could be different because, I guess, there is some extra indentation to prettify the JSON.
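As a concrete sketch for the example above (the byte counts assume Unix line endings and exactly the layout shown; recount them against a real blob):

input {
  azureblob {
    container => "mobile"
    storage_access_key => "<key>"    # placeholder
    storage_account_name => "name"
    codec => "json"
    file_head_bytes => 6    # "[" + newline + 4 spaces of indentation
    file_tail_bytes => 2    # newline + "]"
    registry_create_policy => start_over
    type => "mobileapp"
  }
}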

For more details, see the plugin README referenced above: https://github.com/Azure/azure-diagnostics-tools/blob/master/Logstash/logstash-input-azureblob/README.md

Let me know if you need more info.

inputfalken commented 7 years ago

Hey, thanks for the info. Is this needed even when the data originally comes on a single line?

xiaomi7732 commented 7 years ago

@inputfalken, sorry for the delay. Yes. Single line or not is not the key; the key is how the JSON file grows.

Let me throw in some details here. For example, this is what it looks like when one record becomes two with line-by-line appending (json_lines codec):

{"record": "1"} // new line as separator here.
{"record": "2"} // appending a new line

This is very straightforward: every line is a well-formed JSON object. (The whole file, however, is not valid JSON.)

With the json codec, the file starts as

[{"record": "1"}]

and becomes:

[{"record": "1"},
{"record": "2"}]

In this case, the whole file is one complete JSON document, but it takes more effort for the plugin to understand the delta; it needs info to process the head ([) and the tail (]).
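If your App Center export really is a compact single-line array like [{...},{...}], the head and tail may be as small as one byte each. A guess at the settings, to be verified against an actual blob:

input {
  azureblob {
    storage_account_name => "name"   # placeholder
    storage_access_key => "<key>"    # placeholder
    container => "mobile"
    codec => "json"
    file_head_bytes => 1    # just the leading "["
    file_tail_bytes => 1    # just the trailing "]"
  }
}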

I hope this answers your question.

Thanks!

inputfalken commented 7 years ago

Thanks for the info, I will try it out! 👍

xiaomi7732 commented 7 years ago

Sure. Give it a try and feel free to re-open this issue when you need more help. Thanks!