Open dgghosalaws opened 3 years ago
+1
+1
I get another error: Unable to instantiate a metastore client factory com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory due to: java.lang.ClassNotFoundException: Class com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory not found
@Oleks777 I had the same issue on emr-5.36.0 (did not test other versions) when trying to use pig with HCatalog, so that I can load tables from Glue to Pig:
pig -useHCatalog
In my case the solution was to manually specify the missing jar:
pig -useHCatalog -Dpig.additional.jars=/usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client-1.18.0.jar
On other EMR versions, aws-glue-datacatalog-hive2-client-1.18.0.jar may have a different version number, so look in /usr/share/aws/hmclient/lib/ and check.
Then to load data from glue table:
data = LOAD 'somedatabase.sometablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
then check:
describe data;
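To avoid hard-coding the jar version, the lookup can be scripted. This is only a minimal sketch: it assumes the jar name follows the aws-glue-datacatalog-hive*-client-*.jar pattern, and find_glue_client_jar is a hypothetical helper name, not something shipped with EMR.

```shell
# Hypothetical helper: print the first Glue Data Catalog Hive client jar
# found in a directory (defaults to the directory mentioned above).
# Assumption: the jar is named aws-glue-datacatalog-hive<N>-client-<version>.jar.
find_glue_client_jar() {
  lib_dir="${1:-/usr/share/aws/hmclient/lib}"
  for jar in "$lib_dir"/aws-glue-datacatalog-hive*-client-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no Glue client jar found in $lib_dir" >&2
  return 1
}

# Usage on an EMR node (commented out because it needs pig installed):
# pig -useHCatalog -Dpig.additional.jars="$(find_glue_client_jar)"
```

If the function finds nothing, it returns a non-zero status, so the directory can still be checked by hand.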
thanks @moneroexamples ! yes, it did the trick. Instead of adding the jar like you describe, you can also use REGISTER command in the script.
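For reference, the REGISTER variant could look roughly like this. It is a sketch, not a tested recipe: the script name load_glue_table.pig and its location are made up for illustration, and the jar path is the emr-5.36.0 one quoted above.

```shell
# Hypothetical script location; any writable path works.
script_path="${TMPDIR:-/tmp}/load_glue_table.pig"

# Write a small Pig script that registers the Glue client jar itself,
# instead of passing -Dpig.additional.jars on the command line.
cat > "$script_path" <<'EOF'
REGISTER /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client-1.18.0.jar;
data = LOAD 'somedatabase.sometablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE data;
EOF

# Run it on an EMR node (commented out because it needs pig installed):
# pig -useHCatalog "$script_path"
```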
Looks like this solution works only for 5.x EMR releases (hive2); it doesn't work for 6.x. Does anyone have any advice?
@Oleks777 I just checked on emr-6.6 and the following works:
pig -useHCatalog -Dpig.additional.jars=/usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive3-client-3.5.0.jar
As a side note, on EMR 6.6 hcat also does not work by itself with Glue:
hcat -e "show databases;"
giving error:
Caused by: MetaException(message:Unable to instantiate a metastore client factory com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory due to: java.lang.ClassNotFoundException: Class com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory not found)
You can solve this by setting HIVE_AUX_JARS_PATH before you call hcat:
export HIVE_AUX_JARS_PATH=/usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive3-client-3.5.0.jar
hcat -e "show databases;"
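If you would rather not export the variable for the whole shell session, it can be scoped to a single command. A minimal sketch, where run_with_glue_jar is a hypothetical wrapper name and the only assumption is that the jar path you pass exists on the node:

```shell
# Hypothetical wrapper: run a command with HIVE_AUX_JARS_PATH set only for
# that one invocation, and fail early if the jar is missing.
run_with_glue_jar() {
  glue_jar="$1"
  shift
  if [ ! -f "$glue_jar" ]; then
    echo "Glue client jar not found: $glue_jar" >&2
    return 1
  fi
  # The assignment prefix applies to this command only.
  HIVE_AUX_JARS_PATH="$glue_jar" "$@"
}

# Usage on an EMR 6.6 node (commented out because it needs hcat installed):
# run_with_glue_jar /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive3-client-3.5.0.jar hcat -e "show databases;"
```

The early check turns a wrong jar path into a one-line message instead of the ClassNotFoundException above.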
@moneroexamples many thanks! I spent a lot of time compiling the client for hive2, and it is good to know there is a compiled version available from AWS. Is this path, /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive3-client-3.5.0.jar, available on the datanodes by default, or does EMR need to be configured somehow in a bootstrap step?
I request all to either support the premise of the issue title or confirm whether HCatStorer partition writes work with the Glue Data Catalog as the Hive metastore. I completely get the iterations done above to make basic commands work with Pig on EMR. Thanks
@Oleks777 Sadly I don't know how to configure EMR so that the extra paths/jars are loaded for Pig and hcat at the bootstrap step.
Any update on this issue? We also encountered the same "getTokenStrForm is not supported" error when using HCatStorer(...) in EMR.
I'm getting the same error when storing data to ORC or Parquet tables with the latest EMR version, 6.12.0. It seems that support for writing to Glue tables is broken.
After a little bit of digging we can see the problem originates here:
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.cancelDelegationTokens(FileOutputCommitterContainer.java:1012)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:274)
If we look at the file:
We can see that cancelDelegationTokens is the last thing that happens. We can also see how it's used:
All we really need to do is return null instead of throwing an "operation not supported" exception, and then the delegation token cancel method should work fine.
Use case: Running the example here -> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hcatalog-pig.html
Outcome: The Pig script fails when Glue is the Hive metastore. The script reports fail status. The files are written in S3, though.
Error logs