safqwf closed this issue 5 years ago
@roman-io Alluxio has a similar concept of using empty placeholder objects to represent directories. Instead of `_$folder$` we use `/` by default. The suffix is controlled by the `alluxio.underfs.s3a.directory.suffix` property. Can you try setting the property to `_$folder$`? Then Alluxio will understand that `month=01_$folder$` is a folder, not a file.
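To make the effect of the suffix concrete, here is a minimal sketch of suffix-based classification. It is an illustration only, not Alluxio's actual implementation; the `classify_key` function and its parameter names are hypothetical.

```python
def classify_key(key: str, directory_suffix: str = "/") -> str:
    """Classify an object key as a directory placeholder or a regular file.

    Hypothetical helper: a key counts as a directory placeholder only when
    it ends with the configured suffix.
    """
    return "directory" if key.endswith(directory_suffix) else "file"


key = "year=2019/month=01_$folder$"

# With the default suffix "/", the placeholder looks like an ordinary file,
# which is why the query engine tries to read it as Parquet.
print(classify_key(key))

# With the suffix set to "_$folder$", the same key is seen as a directory.
print(classify_key(key, directory_suffix="_$folder$"))
```

Under this model, the only thing that changes between the failing and the working setup is the suffix that keys are compared against.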
To update the property, edit `alluxio-site.properties` on all servers:
alluxio.underfs.s3a.directory.suffix=_$folder$
Then restart the cluster.
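If the file has to be changed on many servers, the edit can be scripted. The sketch below is one possible way to do it, assuming the file uses plain `key=value` lines; `set_property` is a hypothetical helper, and the demo writes to a temporary file rather than a real Alluxio configuration.

```python
import tempfile
from pathlib import Path


def set_property(path: Path, key: str, value: str) -> None:
    """Set key=value in a properties file, replacing any existing entry.

    Assumes simple `key=value` lines; comments and other keys are kept.
    """
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any previous definition of the key so the file holds one entry.
    kept = [ln for ln in lines if ln.split("=", 1)[0].strip() != key]
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")


if __name__ == "__main__":
    # Demonstrate on a throwaway file. On a real node, point this at that
    # server's alluxio-site.properties (its location depends on the install).
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp) / "alluxio-site.properties"
        site.write_text("alluxio.underfs.s3a.directory.suffix=/\n")
        set_property(site, "alluxio.underfs.s3a.directory.suffix", "_$folder$")
        print(site.read_text(), end="")
```

Running the helper twice leaves a single entry for the key, so it is safe to re-apply. The cluster still needs to be restarted afterwards, as described above.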
This feature request can already be achieved with existing Alluxio properties. We will close this issue in a few days if you don't have any further requests.
It worked. Thanks!
Is your feature request related to a problem? Please describe.
Assume we have a directory of Parquet files in S3 called `mydir`. Alluxio is set up in an EMR cluster with this directory as the UFS, with read-only permission. The data in this directory is generated by S3 Hadoop-related components that create $folder$ files in the directory. These $folder$ files should not be deleted. Presto and Hive in the EMR cluster query a table with LOCATION 'alluxio://master_hostname:port/mydir'.
When trying to query the data with Presto or Hive, I get this error:
Query 20190321_132537_00026_4enx4 failed: Error opening Hive split alluxio://master_hostname:port/year=2019/month=01_$folder$ (offset=0, length=0): alluxio://master_hostname:port/year=2019/month=01_$folder$ is not a valid Parquet File
Neither Hive nor Alluxio has an option to ignore specific files based on a regex, and these files shouldn't be deleted.
Describe the solution you'd like
Add a configuration option to ignore files based on a regex, or add a configuration option to ignore `_$folder$` files.
Describe alternatives you've considered
Can't find any solution.