fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

CSV inputs with headers #915

Open kakoni opened 8 years ago

kakoni commented 8 years ago

I guess the in_tail/csv parser currently doesn't support CSV files with headers? Is this planned?

repeatedly commented 8 years ago

Is this planned?

No plan for now. Currently, Fluentd's parsers have no mechanism for re-initializing their configuration. So to support this kind of metadata handling, we would need to redesign the parser APIs.

kakoni commented 8 years ago

Ok. Thanks

repeatedly commented 8 years ago

@kakoni Is it hard to set the keys parameter in the configuration?

kakoni commented 8 years ago

@repeatedly No, but then I need to filter out header rows. I wrote my own CSV parser where I do something like:

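      def parse(text)
        row = CSV.parse_line(text, col_sep: @delimiter)
        if @keys.empty?
          # First line seen: treat it as the header row.
          @keys = row
        elsif (@keys - row).empty?
          # Row contains every header name: assume a repeated header and skip it.
          return
        else
          yield values_map(row)
        end
      end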

Obviously this assumes that you use read_from_head and that the input files are "immutable", that is, they are written only once with no appends.
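For readers who want to try the idea outside Fluentd, here is a minimal stand-alone sketch using only Ruby's standard csv library. The class name HeaderAwareCsv is hypothetical, and the sketch compares rows to the header with exact equality rather than the array difference used above. It also skips blank lines, which the snippet above does not handle. Treat it as an illustration under those assumptions, not as the plugin itself.

```ruby
require 'csv'

# Hypothetical stand-alone class for illustration; not part of Fluentd.
class HeaderAwareCsv
  def initialize(delimiter: ',')
    @delimiter = delimiter
    @keys = []
  end

  # Returns a Hash for a data row, or nil for header and blank lines.
  def parse(text)
    row = CSV.parse_line(text, col_sep: @delimiter)
    return nil if row.nil? || row.empty? # blank input
    if @keys.empty?
      @keys = row # first line seen becomes the header
      nil
    elsif row == @keys
      nil # header repeated, skip it
    else
      @keys.zip(row).to_h
    end
  end
end

parser = HeaderAwareCsv.new
lines = ["id,name\n", "1,alice\n", "id,name\n", "2,bob\n"]
records = lines.map { |line| parser.parse(line) }.compact
records.each { |record| puts record.inspect }
```

As in the original, this only behaves correctly when the header is the first line the parser ever sees.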

repeatedly commented 8 years ago

I see. We will consider it, but we need more time to redesign the Parser API because in_tail shares one parser instance across its target files. So using your own parser is better for now.
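To make the shared-instance concern concrete: a parser that keeps one header in an instance variable would apply the first file's header to every file it is handed. The rough sketch below tracks headers per file instead. The class PerFileHeaders and the path argument are assumptions for illustration only; the parse method discussed in this thread receives just the text, so this is not something the current API supports.

```ruby
require 'csv'

# Hypothetical illustration; in_tail does not pass a path to the parser.
class PerFileHeaders
  def initialize
    @keys_by_path = {} # path => header row for that file
  end

  # Returns a Hash for a data row, or nil for header and blank lines.
  def parse(path, text)
    row = CSV.parse_line(text)
    return nil if row.nil? || row.empty?
    keys = @keys_by_path[path]
    if keys.nil?
      @keys_by_path[path] = row # first line of this file is its header
      nil
    elsif row == keys
      nil # header repeated within the same file
    else
      keys.zip(row).to_h
    end
  end
end

parser = PerFileHeaders.new
events = [
  ["a.csv", "id,name"],
  ["b.csv", "host,status"],
  ["a.csv", "1,alice"],
  ["b.csv", "web1,200"],
]
records = events.map { |path, line| parser.parse(path, line) }.compact
records.each { |record| puts record.inspect }
```

With a single shared header, the b.csv row would have been mapped onto the id and name keys from a.csv.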

tagomoris commented 8 years ago

In the v0.14 parser API design, a <parse> section can take arguments for many purposes. For example, an argument could be used as a filename pattern:

@type tail
path /my/dir/*.csv
<parse> # default pattern
  @type csv
</parse>
<parse myfile.with.header.*.csv>
  @type csv
  csv_with_header true
</parse>

Sharing one parser across all files comes from the design of the in_tail plugin, not from the parsers themselves.

Ninir commented 8 years ago

@kakoni could you share the whole file (parser) and how to implement it please? @tagomoris is csv_with_header already implemented? can't find anything... :(

tagomoris commented 8 years ago

I only showed what the API is capable of; it's not implemented yet.

kakoni commented 8 years ago

@Ninir Here's an example: https://gist.github.com/kakoni/b0ef238e630e65e860c83bfe55ffb53a

But obviously this would only work if you always read_from_head (which is exactly the case in my situation)

Ninir commented 8 years ago

@tagomoris got it! @kakoni Thank you good sir, perfect :)

John-Lin commented 6 years ago

Hi, I'm using Fluentd version 0.14.23 and I want to parse CSV with a header. I see this issue has been open for a long time. The csv2 plugin no longer seems to work after the Fluentd upgrade, so I modified the csv2 plugin source to:

require 'fluent/plugin/parser'

require 'csv'

module Fluent
  module Plugin
    class CSV2Parser < Parser
      Plugin.register_parser('csv2', self)

      config_param :keys, :array, value_type: :string
      config_param :delimiter, :string, default: ','

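      def parse(text, &block)
        values = CSV.parse_line(text, col_sep: @delimiter)
        if @keys.empty?
          # No keys configured yet: use the first line as the header.
          @keys = values
        elsif (@keys - values).empty?
          # Line contains every header name: treat it as a repeated header and skip it.
          return
        else
          r = Hash[@keys.zip(values)]
          time, record = convert_values(parse_time(r), r)
          yield time, record
        end
      end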
    end
  end
end

It works! An example configuration is below:

<source>
  @type tail
  path /my/path/to/csv
  tag hello
  format csv2
  keys 
</source>

<match **>
  @type stdout
</match>

Note: the keys parameter must stay in the configuration, and its value should be left empty.

amitdhawan commented 5 years ago

Is this supported now?