Closed (amithkanand closed this issue 9 years ago).
We may want to make the scratch directory user-specific to avoid a data breach: data is stored there while a query runs, and other users can view it if the directory is not user-specific. Query data can also stay in the scratch directory if the client fails abnormally. Currently the `bcpc-hadoop::hive_metastore` recipe creates a shared scratch directory, owned by `hive` and world-writable with the sticky bit set:

`drwxrwxrwt - hive supergroup 0 2014-11-13 10:40`

Can we set the scratch directory to `/tmp/scratch/hive-${user.name}`? It would also be good to make `/tmp/scratch` an attribute, since it will be used in two dependent places.
Since the scratch directory is created automatically when a user executes a query, `bcpc-hadoop::hive_metastore` does not need to create it. Also, using `/tmp/scratch` as the base folder can cause issues, because it will be owned by the user who runs the very first Hive query. The correct solution seems to be to define the scratch directory as `/tmp/hive-${user.name}`: `/tmp` is writable by every user, and each user's scratch directory will be owned by that user, preventing accidental access to other users' temporary output. Also, by design, all temporary output written to the scratch directory is removed after query execution completes.
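A minimal Python sketch of how one `${user.name}` template yields a separate directory per user. It assumes that, when `user.name` is not set in the configuration itself, Hadoop's `Configuration` falls back to the JVM system property of that name, i.e. the account running the process that reads the configuration. The `expand` function and the user names are hypothetical, and this is a simplified illustration, not Hadoop's actual substitution code.

```python
import re

# Matches ${name} placeholders such as ${user.name}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(template, props):
    """Replace each ${name} with props[name]; leave unknown placeholders untouched.

    Simplified, hypothetical stand-in for Hadoop-style variable substitution.
    """
    return _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), template)


SCRATCH_TEMPLATE = "/tmp/hive-${user.name}"

# Hypothetical process owners: a user running the Hive CLI in their own JVM,
# and the account that owns a long-running server process.
print(expand(SCRATCH_TEMPLATE, {"user.name": "alice"}))  # prints /tmp/hive-alice
print(expand(SCRATCH_TEMPLATE, {"user.name": "hive"}))   # prints /tmp/hive-hive
```

Under this assumption the value is tied to whoever owns the reading process, so every query served by one shared process would land in that owner's directory rather than in a per-user one.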
Fixed in PR #36
In the current implementation, `hive-site.xml` is missing the `hive.exec.scratchdir` parameter. When a query is executed over ODBC/JDBC through a connection to a `hiveserver2` process, temporary results are stored in the directory specified by `hive.exec.scratchdir`. Without this parameter in the configuration file, all temporary output is redirected to `/tmp/hive-{process owner}` on HDFS, and the user executing the query gets a `permission denied` error. Specifying `hive.exec.scratchdir` and pointing it to `/tmp` on HDFS fixes the issue.
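The fix amounts to a single `<property>` entry in `hive-site.xml`. As a minimal Python sketch, the code below embeds such an entry and checks whether a given `hive-site.xml` text defines `hive.exec.scratchdir`. It assumes the standard Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` layout; `get_property` and both sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def get_property(site_xml, name):
    """Return the <value> of property `name` in a Hadoop-style *-site.xml string, or None if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


# Hypothetical hive-site.xml with the scratch directory set to /tmp on HDFS.
FIXED_SITE_XML = """<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp</value>
  </property>
</configuration>"""

# Hypothetical hive-site.xml without the parameter.
MISSING_SITE_XML = "<configuration></configuration>"

print(get_property(FIXED_SITE_XML, "hive.exec.scratchdir"))    # prints /tmp
print(get_property(MISSING_SITE_XML, "hive.exec.scratchdir"))  # prints None
```

A check like this could run against the rendered `hive-site.xml` to catch the missing parameter early; it does not verify that the HDFS path exists or has suitable permissions.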