Azure / azure-diagnostics-tools

Plugins and tools for collecting, processing, managing, and visualizing diagnostics data and configuration

logstash-input-azureblob doesn't send file or path information #103

Open jscotthead opened 7 years ago

jscotthead commented 7 years ago

There is no way to retrieve the file or path information in logstash from the azureblob input plugin. All metadata that can be found in the path is lost and has to be reproduced in the file itself in order to be ingested. Is there a way to add this information and make it accessible within the logstash pipeline?

brahmnes commented 7 years ago

Can you give an example of the path information? Are you referring to the path as it sits in Azure blob, or the original path of the file on the file system of the originating VM?

Assuming it's the path to the Azure blob, why do you need it further down the pipeline?

xiaomi7732 commented 7 years ago

@jscotthead, could you please help us better understand the issue by answering the questions from @brahmnes?

jscotthead commented 7 years ago

Hey .. sorry. I just want to know the 'path' to the file in Azure so that I can know the file name as part of the metadata for the log messages in Elasticsearch.

Example of path information:

  1. Storage Account Name
  2. Container
  3. Full path to the file inside the container

I realize that the Storage Account and Container info may not be accessible, but the full path would help nonetheless. For example:

metadata: { sa-name: "my-storage-account", container: "temp-container", file-path: "folder1/folder2/filename.txt" }

xiaomi7732 commented 7 years ago

@jscotthead, thanks for the quick turnaround. This sounds like a feature request for the plugin. Before we dive in, could you please also elaborate on why you want to know where (which azure blob) the logs are coming from?

I have no idea how much time I will have to implement this, but I think that, technically, we can inject all 3 properties into the event object. It is also fair to expect the azure blob input plugin to provide azure blob metadata.

Feasibility: From the input end, the Storage Account Name and Container can be read from the configuration. For a specific event, the source file path is also available in the loop over all blobs.

From the output end, we can inject a custom property as shown below:

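        @codec.decode(data) do |event|
          decorate(event)
          # Set the property only if the event does not already define it.
          # some_value is a placeholder for the metadata value to inject.
          event.set("custom-property", some_value) unless event.include?("custom-property")
          queue << event
        end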
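To make the "inject only if absent" idea concrete for the three properties discussed here, the following is a minimal sketch rather than the plugin's actual code. `FakeEvent` is a hypothetical stand-in for a Logstash event (it only mimics `set`, `get`, and `include?`), `inject_blob_metadata` is a made-up helper name, and the field names are borrowed from the proposal in this thread.

```ruby
# Hypothetical stand-in for a Logstash event, for illustration only.
class FakeEvent
  def initialize(fields = {})
    @fields = fields.dup
  end

  def include?(key)
    @fields.key?(key)
  end

  def set(key, value)
    @fields[key] = value
  end

  def get(key)
    @fields[key]
  end
end

# Hypothetical helper: sets each metadata field only when the event
# does not already carry a field with that name.
def inject_blob_metadata(event, storage_account_name, container, blob_name)
  {
    "storage-account-name" => storage_account_name,
    "container-name"       => container,
    "blob-name"            => blob_name
  }.each do |field, value|
    event.set(field, value) unless event.include?(field)
  end
  event
end

# The event already has a "blob-name" field, so that one is left untouched.
event = FakeEvent.new("message" => "hello", "blob-name" => "already-set.txt")
inject_blob_metadata(event, "my-storage-account", "temp-container",
                     "folder1/folder2/filename.txt")
```

Leaving existing fields alone means that a log line which already defines one of these names is not silently overwritten by the input plugin.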

Configuration change: Add a hash parameter, 'inject-azure-storage-metadata', that includes 3 boolean properties: storage-account-name, container-name, and blob-name.

The configuration would then look like:

input
{
    azureblob
    {
        storage_account_name => "mystorageaccount"
        container => "mycontainer"
        ...
        inject-azure-storage-metadata => { :storage-account-name => true, :container-name => false, :blob-name => true }
    }
}

The default value for each property is false, so no properties are injected unless requested. That keeps the behavior consistent with the current implementation.
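One way the "default to false" behavior could work is to merge the user-supplied flags over a table of defaults and keep only the properties whose flag is true. This is a sketch under assumptions, not the merged implementation: `DEFAULT_FLAGS` and `metadata_to_inject` are hypothetical names, and string keys are used here in place of the symbol-style keys shown in the proposed configuration.

```ruby
# Hypothetical defaults: nothing is injected unless explicitly enabled.
DEFAULT_FLAGS = {
  "storage-account-name" => false,
  "container-name"       => false,
  "blob-name"            => false
}.freeze

# Hypothetical helper: returns only the name => value pairs whose flag
# is true after the user's flags are merged over the defaults.
def metadata_to_inject(user_flags, available)
  flags = DEFAULT_FLAGS.merge(user_flags || {})
  available.select { |name, _value| flags[name] == true }
end

available = {
  "storage-account-name" => "mystorageaccount",
  "container-name"       => "mycontainer",
  "blob-name"            => "folder1/folder2/filename.txt"
}

# Mirrors the example configuration: account and blob name on, container off.
selected = metadata_to_inject(
  { "storage-account-name" => true, "container-name" => false, "blob-name" => true },
  available
)

# With no flags supplied, nothing is selected, matching the current behavior.
nothing = metadata_to_inject(nil, available)
```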

Thoughts?

AshKapow commented 7 years ago

I'd also like to request this

My issue is that I don't think all of the files in the blob are being picked up, so these added fields would be an easy way to diagnose the issue.

brahmnes commented 6 years ago

Hi @AshKapow

Sounds like this is not a difficult change. Can you submit a PR for it? This can be an additional config switch.

MrJerB commented 6 years ago

PR #174 tackles this and got merged