cfstout closed 9 years ago
Which versions of AWS EMR are you running on? We run all of our jobs on AWS EMR. Are you reading directly from S3 rather than HDFS?
Yes, I'm reading directly from S3. The default file system is HDFS, so reading from HDFS would avoid the bug. I think I could also configure the job to use S3 as the default file system, but the error message
"You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path."
seems to indicate that the preferred method is to pass in the URI.
Cool. Please submit a PR and we'll get it rolled in. Thanks.
I've been working on porting some of this code to AWS Elastic MapReduce (EMR) and have found a bug in the way we obtain the file system for our paths. Instead of calling
FileSystem.get(job.getConfiguration())
we should pass the optional URI parameter, as in
FileSystem.get(inputPath.toUri(), job.getConfiguration())
so that the code is more robust to other file systems (local, S3, HDFS, etc.).
If you agree that this is worthwhile, I'm happy to submit a PR with the change.
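To make the failure mode concrete, here is a rough sketch in plain Java. It is a toy model, not Hadoop code: ToyFileSystem, checkPath, and both get overloads are hypothetical stand-ins I made up to mimic the difference between FileSystem.get(conf) and FileSystem.get(uri, conf), and the namenode and bucket names are placeholders. It only illustrates the idea that a file system taken from the default configuration can reject a path with another scheme, while one looked up from the path's own URI accepts it.

```java
import java.net.URI;

// Toy model only: none of these types are Hadoop's. They imitate, with the
// standard library, why the lookup should be driven by the path's URI.
public class FileSystemLookupSketch {

    // Hypothetical stand-in for a file system bound to one scheme/authority.
    static final class ToyFileSystem {
        final URI root;

        ToyFileSystem(URI root) {
            this.root = root;
        }

        // Reject any path whose scheme differs from this file system's scheme.
        void checkPath(URI path) {
            String scheme = path.getScheme();
            if (scheme != null && !scheme.equalsIgnoreCase(root.getScheme())) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + root);
            }
        }
    }

    // Analogue of FileSystem.get(conf): always returns the configured default.
    static ToyFileSystem get(URI defaultFs) {
        return new ToyFileSystem(defaultFs);
    }

    // Analogue of FileSystem.get(uri, conf): honors the path's own scheme and
    // falls back to the default only when the path has no scheme.
    static ToyFileSystem get(URI path, URI defaultFs) {
        if (path.getScheme() == null) {
            return new ToyFileSystem(defaultFs);
        }
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        return new ToyFileSystem(URI.create(path.getScheme() + "://" + authority + "/"));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020/");       // placeholder default
        URI input = URI.create("s3://my-bucket/input/part-00000"); // placeholder input

        try {
            get(defaultFs).checkPath(input); // default (HDFS) file system, S3 path
        } catch (IllegalArgumentException e) {
            System.out.println("default lookup rejected the path: " + e.getMessage());
        }

        get(input, defaultFs).checkPath(input); // URI-based lookup accepts the path
        System.out.println("URI-based lookup accepted the path");
    }
}
```

In the real job, the equivalent change is just the one-line swap described above; the sketch is only meant to show why the URI-based lookup also keeps working for paths without a scheme, since those still resolve to the default file system.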