fluent / fluent-plugin-webhdfs

Hadoop WebHDFS output plugin for Fluentd
http://docs.fluentd.org/articles/out_webhdfs

httpFS - Do not create file if it does not exist #46

Open hurtauda opened 7 years ago

hurtauda commented 7 years ago

Hello,

We are running a MapR cluster, and WebHDFS is not supported by MapR, so we are trying to populate Hadoop using HttpFS.

Our webhdfs plugin config:

  @type webhdfs
  host mapr-mapr-master-0
  port 14000
  path "/uhalogs/docker/docker-%M.log"
  time_slice_format %M
  flush_interval 5s
  username mapr
  httpfs true

When using the fluentd plugin, logs are appended correctly to an existing file. But if the file does not exist (we use a timestamp-based filename), we get a WebHDFS::ServerError instead of the WebHDFS::FileNotFoundError that, I assume, would trigger creation of the file.
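The difference is also visible outside fluentd. A minimal reproduction with the webhdfs gem (the same client library the plugin uses; host, port, user and path are just the values from our setup above):

require 'webhdfs'

# Build the client the same way the plugin does in httpfs mode
client = WebHDFS::Client.new('mapr-mapr-master-0', 14000, 'mapr')
client.httpfs_mode = true

begin
  client.append('/uhalogs/docker/testfile.log', "a log line\n")
rescue WebHDFS::FileNotFoundError
  # A plain WebHDFS server answers 404 for a missing file, landing here
  puts 'missing file reported as 404, so the plugin would create it'
rescue WebHDFS::ServerError => e
  # HttpFS on MapR answers 500 instead, so we end up here
  puts "missing file reported as 500: #{e.message}"
end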

Error 500 returned by MapR:

{
  "RemoteException": {
    "message": "Append failed for file: /uhalogs/docker/testfile.log, error: No such file or directory (2)",
    "exception": "IOException",
    "javaClassName": "java.io.IOException"
  }
}

Logs from the fluent-plugin-webhdfs plugin:

2017-01-12 13:59:09 +0000 [warn]: failed to communicate hdfs cluster, path: /uhalogs/docker/docker-58.log
2017-01-12 13:59:09 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-01-12 14:00:13 +0000 error_class="WebHDFS::ServerError" error="{\"RemoteException\":{\"message\":\"Append failed for file: \\/uhalogs\\/docker\\/docker-58.log, error: No such file or directory (2)\",\"exception\":\"IOException\",\"javaClassName\":\"java.io.IOException\"}}" plugin_id="object:3fe5f920c960"
2017-01-12 13:59:09 +0000 [warn]: suppressed same stacktrace

Related code: https://github.com/fluent/fluent-plugin-webhdfs/blob/master/lib/fluent/plugin/out_webhdfs.rb#L262
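That code only falls back to creating the file when the gem raises WebHDFS::FileNotFoundError, roughly like this (a paraphrased sketch of send_data, not a verbatim copy of the source):

def send_data(path, data)
  if @append
    begin
      @client.append(path, data)
    rescue WebHDFS::FileNotFoundError
      # Only a 404 reaches this create fallback
      @client.create(path, data)
    end
  else
    @client.create(path, data, 'overwrite' => 'true')
  end
end

So the WebHDFS::ServerError raised for MapR's 500 response escapes the rescue and surfaces as the retried flush shown above.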

What I am not sure about, since I can't find proper specifications for HttpFS on the web, is whether this is a bad implementation of HttpFS on the MapR side or the expected HttpFS behaviour.

Thank you, Alban

enarciso commented 7 years ago

I'm also experiencing this problem, but on the Cloudera platform. We cannot use WebHDFS because, unlike HttpFS, it does not have HA capabilities.

repeatedly commented 7 years ago

Sorry for missing this issue. I'm not familiar with HttpFS, but if WebHDFS and HttpFS are incompatible in some operations, we should handle it.

> Is it a bad implementation of httpFS on MapR side?

From enarciso's comment, it seems the HttpFS behaviour is the same across several distributions. I'm not sure whether this is a bug in HttpFS or not. I think the append operation should create a new file when the file doesn't exist.
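If HttpFS really does answer 500 here, one conceivable workaround on the plugin side would be to rescue that case too and fall back to create when the message indicates a missing file. Just a sketch, not tested or merged code:

begin
  @client.append(path, data)
rescue WebHDFS::FileNotFoundError
  @client.create(path, data)
rescue WebHDFS::ServerError => e
  # HttpFS (at least on MapR/Cloudera) seems to report a missing file
  # as HTTP 500; only fall back to create when the message says so
  raise unless e.message.include?('No such file or directory')
  @client.create(path, data)
end

But matching on the message string is fragile, so fixing HttpFS itself would be better.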

enarciso commented 7 years ago

My unfortunate workaround at the moment is to constantly monitor the HttpFS logs, watch for strings like the one above, and run a touchz to create the file. Thank you for looking into this, @repeatedly.
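For what it's worth, that touchz step can also be scripted against HttpFS with the webhdfs gem; a sketch, with illustrative host and path:

require 'webhdfs'

client = WebHDFS::Client.new('httpfs-host', 14000, 'mapr')
client.httpfs_mode = true

# Equivalent of `hadoop fs -touchz`: create an empty file,
# refusing to overwrite one that already exists
client.create('/uhalogs/docker/docker-58.log', '', 'overwrite' => 'false')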

tagomoris commented 7 years ago

WebHDFS::ServerError means that the client (fluentd) received HTTP response code 500 from the HttpFS server. A WebHDFS server returns 404 in such cases. IMO it's a bug in the HttpFS implementation, because its behaviour is incompatible with WebHDFS.
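For reference, the webhdfs gem picks the exception class purely from the HTTP status code, roughly like this (a paraphrase of its response handling, not the exact source):

require 'webhdfs'

# res is a Net::HTTPResponse from the WebHDFS/HttpFS endpoint
def raise_for_status(res)
  case res.code
  when '200', '201'
    res.body
  when '404'
    raise WebHDFS::FileNotFoundError, res.body  # what WebHDFS returns for a missing file
  when '500'
    raise WebHDFS::ServerError, res.body        # what HttpFS returns here instead
  else
    raise WebHDFS::RequestFailedError, "response code:#{res.code}"
  end
end

That is why the plugin sees ServerError rather than FileNotFoundError and never reaches its create fallback.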

And yet the HttpFS docs say it "is interoperable with the webhdfs REST HTTP API": https://hadoop.apache.org/docs/r2.8.0/hadoop-hdfs-httpfs/index.html

enarciso commented 7 years ago

Thank you @tagomoris, I've opened a case with Cloudera.