Can you upload a sample script for Cloud Dataproc using Sqoop?
@Bidyadharbarik84 Please take a look at the example workflow below. I assume you have copied sqoop.sh from the PR to your GCS bucket.
First, create a Cloud SQL instance named test-sql and a Bigtable instance named test-bigtable.
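A minimal sketch of creating both instances with gcloud; the tier, region, zone, and cluster-id values below are placeholders I am assuming, not taken from this thread:
gcloud sql instances create test-sql --region=<your-region> --tier=db-n1-standard-1
gcloud bigtable instances create test-bigtable --display-name=test-bigtable \
--cluster=test-bigtable-c1 --cluster-zone=<your-zone> --cluster-num-nodes=3
Next, create the cluster: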
gcloud dataproc clusters create sqoop-single --single-node \
--initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/hive-hcatalog/hive-hcatalog.sh,gs://dataproc-initialization-actions/bigtable/bigtable.sh,gs://<your-bucket-with-sqoop-init-action>/sqoop.sh \
--scopes cloud-platform --properties hive:hive.metastore.warehouse.dir=gs://<your-bucket-for-hive>/hive-warehouse \
--metadata "hive-metastore-instance=<you-project>:<you-region>:test-sql" \
--metadata "bigtable-project=<you-project>" --metadata "bigtable-instance=test-bigtable"
gcloud compute ssh sqoop-single-m
git clone https://github.com/datacharmer/test_db
cd test_db
mysql -u root < employees.sql
Then run sqoop import:
/usr/lib/sqoop/bin/sqoop import --connect jdbc:mysql://localhost/employees --username root \
--table employees --columns "emp_no,first_name" --hbase-table employees \
--column-family my-column-family --hbase-row-key emp_no -m 1 --hbase-create-table
This command imports selected columns from the Cloud SQL employees database into the Bigtable instance using the HBase binaries. You should see something like this in the output:
19/02/19 16:01:50 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 20.5951 seconds (0 bytes/sec)
19/02/19 16:01:50 INFO mapreduce.ImportJobBase: Retrieved 300024 records.
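To double-check that the rows landed in Bigtable, you can count or scan the table from the HBase shell on the master node (a minimal verification sketch, assuming the employees table created above):
echo "count 'employees'" | hbase shell
echo "scan 'employees', {LIMIT => 5}" | hbase shell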
Hope this helps.
This is a gcloud command that submits a Sqoop job to a Dataproc cluster:
gcloud dataproc jobs submit hadoop \
--cluster=incremental-imports-martijn-sqoop \
--class=org.apache.sqoop.Sqoop \
--jars=gs://mybucket/jars/sqoop-1.4.7-hadoop260.jar,gs://mybucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
-- import -Dmapreduce.job.user.classpath.first=true \
--connect=jdbc:mysql://mydbhost:3306/database \
--username=hadoop \
--password-file=gs://mybucket/creds/passwordFile.txt \
--target-dir=gs://mybucket/dump/mysql_output \
--table=mytable \
--as-avrodatafile
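Once the job finishes, you can sanity-check the Avro output by pulling one part file locally and dumping it as JSON with avro-tools (a hedged sketch; part-m-00000.avro is a typical Sqoop output file name, not one confirmed in this thread):
gsutil cp gs://mybucket/dump/mysql_output/part-m-00000.avro .
java -jar avro-tools-1.8.2.jar tojson part-m-00000.avro | head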
Thanks for the information, let me try.
Unable to import data from MySQL to Hive using Sqoop through a Dataproc cluster.
sqoop import --connect "jdbc:mysql://localhost:3306/test" \
--username dev2 --password dev2 --table test_09 \
--target-dir gs://testbuket --fields-terminated-by "," \
--hive-import --create-hive-table --hive-table default.test_09 -m 1
Error:
Warning: /usr/lib/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 10:29:08 INFO sqoop.Sqoop: Running Sqoop version: 1.5.0-SNAPSHOT
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/sqoop/build/ivy/lib/sqoop/test/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/sqoop/build/ivy/lib/sqoop/test/logback-classic-1.0.9.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/sqoop/build/ivy/lib/sqoop/test/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/25 10:29:08 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/03/25 10:29:08 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/03/25 10:29:08 INFO tool.CodeGenTool: Beginning code generation
19/03/25 10:29:08 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:874)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:785)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:288)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:259)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:245)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:333)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1879)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1672)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:515)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:633)
at org.apache.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:233)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:242)
at org.apache.sqoop.Sqoop.main(Sqoop.java:251)
@Bidyadharbarik84 Thanks for submitting this. The root cause of the error is a missing MySQL jar on the classpath. The easiest way to fix it is to additionally pass gs://dataproc-initialization-actions/hive-hcatalog/hive-hcatalog.sh at the very end of the initialization-actions list when creating a cluster. Example below:
gcloud dataproc clusters create sqoop \
--initialization-actions gs://<sqoop-init-action>,gs://dataproc-initialization-actions/hive-hcatalog/hive-hcatalog.sh
You can also try symlinking the MySQL jar into the Sqoop lib dir:
ln -s /usr/share/java/mysql.jar /usr/lib/sqoop/lib/mysql.jar
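To confirm the driver is now visible to Sqoop (a quick check, assuming the default Dataproc install paths):
ls -l /usr/lib/sqoop/lib/ | grep -i mysql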
Hello szewi,
I have created the symlink to the MySQL jar but am getting the error below:
main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
Please help me. Thanks in advance.
Hi, I am trying to import data from Cloud SQL (MySQL) to the Hive warehouse using a Dataproc Hadoop cluster. Could you please provide a sample script for that? Thanks.
After running the sqoop import command, I am getting the errors below. Could you please help?
19/04/02 07:05:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_data` AS t LIMIT 1
19/04/02 07:05:06 INFO hive.HiveImport: Loading uploaded data into Hive
19/04/02 07:05:06 INFO conf.HiveConf: Found configuration file null
2019-04-02 07:05:07,623 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
You must be facing the SQOOP-3274 issue. You can work around it by disabling log4j JMX using the log4j2.disable.jmx option:
sqoop import -D log4j2.disable.jmx=true --connect ...
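If you submit the job through gcloud instead of invoking sqoop on the cluster directly, the same generic option goes right after import (a sketch reusing the placeholder names from the earlier examples):
gcloud dataproc jobs submit hadoop \
--cluster=<your-cluster> \
--class=org.apache.sqoop.Sqoop \
--jars=gs://mybucket/jars/sqoop-1.4.7-hadoop260.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
-- import -D log4j2.disable.jmx=true --connect ...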
Could you please add a sample Cloud SQL script using Sqoop and Hive?
How can Sqoop read from an Oracle data source? Any idea?
@Bidyadharbarik84 See the sample below. Cloud SQL is just a managed MySQL/PostgreSQL database, so it will accept the same JDBC connection string.
gcloud dataproc jobs submit hadoop \
--cluster=incremental-imports-martijn-sqoop \
--class=org.apache.sqoop.Sqoop \
--jars=gs://mybucket/jars/sqoop-1.4.7-hadoop260.jar,gs://mybucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
-- import -Dmapreduce.job.user.classpath.first=true \
--connect=jdbc:mysql://mydbhost:3306/database \
--username=hadoop \
--password-file=gs://mybucket/creds/passwordFile.txt \
--target-dir=gs://mybucket/dump/mysql_output \
--table=mytable \
--hive-database=my_hive_database \
--hive-import \
--hive-overwrite \
--hive-table=my_hive_table \
--hive-home=/usr/lib/hive/lib/
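After the import completes, a quick way to verify the data landed (a minimal check; the database and table names are the placeholders from the command above):
hive -e "SELECT * FROM my_hive_database.my_hive_table LIMIT 10;"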
@pmiDataGen Just add an Oracle JDBC driver to the --jars list (this jar can live in a GCS bucket) and use a JDBC connection string with an Oracle scheme.
Also important: make sure to use an image v1.1-based Dataproc cluster, since Sqoop 1.4.7 is incompatible with the newer Hadoop found in later Dataproc images.
I created cluster with:
gcloud dataproc clusters create rodsbigdata --zone us-central1-a --scopes default,sql-admin \
--image-version 1.1 \
--initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
--properties hive:hive.metastore.warehouse.dir=gs://rodbigdatabucket/hive-warehouse \
--metadata "enable-cloud-sql-hive-metastore=false" \
--metadata "additional-cloud-sql-instances=big-data-tools-salford:us-central1:rodmysqlgoogle=tcp:3307"
Using this command to import data from a Cloud SQL database:
gcloud dataproc jobs submit hadoop \
--bucket=gs://rodbigdatabucket \
--cluster=rodsbigdata \
--class=org.apache.sqoop.Sqoop \
--jars=gs://rodbigdatabucket/jars/sqoop-1.4.7-hadoop260.jar,gs://rodbigdatabucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
-- \
import -Dmapreduce.job.user.classpath.first=true \
--connect=jdbc:mysql://localhost:3307/road_safety_data \
--username rodmartin --target-dir=gs://rodbigdatabucket/mysql_output \
--table road_safety_data \
--hive-import \
--hive-overwrite \
--target-dir gs://rodbigdatabucket/hivedata
Output:
Job [7afdd9ceeae549fbb5cfbf7c3706b3e6] submitted.
Waiting for job output...
19/04/20 18:35:40 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
19/04/20 18:35:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/04/20 18:35:41 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/04/20 18:35:41 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/04/20 18:35:41 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Caused by: java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:272)
at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:96)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
ERROR: (gcloud.dataproc.jobs.submit.hadoop) Job [7afdd9ceeae549fbb5cfbf7c3706b3e6] failed with error:
Job failed with message [Exception in thread "main" java.lang.reflect.InvocationTargetException]. Additional details can be found in 'gs://dataproc-ac9deec7-51ac-448f-ad8a-61a3c899c059-us/google-cloud-dataproc-metainfo/5c0ac39d-4906-4336-a8fe-5d2a1a1013b1/jobs/7afdd9ceeae549fbb5cfbf7c3706b3e6/driveroutput'.
Any help appreciated; I have tried many variations of the command. I started with a standard Dataproc image version, then tried version 1.1 based on a comment above.
@rod-m You may want to try setting the mapreduce.job.classloader property to true:
gcloud dataproc jobs submit hadoop \
--bucket=gs://rodbigdatabucket \
--cluster=rodsbigdata \
--class=org.apache.sqoop.Sqoop \
--jars=gs://rodbigdatabucket/jars/sqoop-1.4.7-hadoop260.jar,gs://rodbigdatabucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
--properties=mapreduce.job.user.classpath.first=true,mapreduce.job.classloader=true \
-- \
import \
--connect=jdbc:mysql://localhost:3307/road_safety_data \
--username rodmartin --target-dir=gs://rodbigdatabucket/mysql_output \
--table road_safety_data \
--hive-import \
--hive-overwrite \
--target-dir gs://rodbigdatabucket/hivedata
Thanks for the suggestion, it doesn't fix it for me though. (I am running the command from the Jupyter notebook installed on the cluster.) I think the problem is that sqoop-1.4.7-hadoop260.jar is not compatible with Hadoop 2.7.4! Where can I find sqoop-1.4.7-hadoop274.jar?
gcloud dataproc jobs submit hadoop \
--cluster=rodsbigdata \
--class=org.apache.sqoop.Sqoop \
--jars=gs://rodbigdatabucket/jars/sqoop-1.4.7-hadoop260.jar,gs://rodbigdatabucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
-- \
import -Dmapreduce.job.user.classpath.first=true \
--properties 'mapred:mapreduce.job.user.classpath.first=true','mapred:mapreduce.job.classloader=true' \
--connect=jdbc:mysql://cloudsql/big-data-tools-salford:us-central1:rodmysqlgoogle/road_safety_data \
--username rodmartin \
--target-dir=gs://rodbigdatabucket/mysql_output \
--table road_safety_data \
--hive-import
Now I just get loads of unrecognized commands:
19/04/21 15:59:40 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
19/04/21 15:59:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: --properties
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: mapred:mapreduce.job.user.classpath.first=true,mapred:mapreduce.job.classloader=true
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect=jdbc:mysql://cloudsql/big-data-tools-salford:us-central1:rodmysqlgoogle/road_safety_data
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: rodmartin
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir=gs://rodbigdatabucket/mysql_output
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: road_safety_data
19/04/21 15:59:40 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-import
Has anyone tried any of this in a recent Google Cloud Dataproc cluster?
You put the --properties parameter in the wrong place; it is a gcloud command parameter, not a Sqoop parameter, which is why it should come before the -- delimiter that separates gcloud parameters from the job's (in this case Sqoop's) parameters.
Also, during job submission you don't need to specify the mapred: prefix for properties; it's used only in cluster creation commands.
Could you execute this command without any modifications from the Linux command line (not the Jupyter notebook) and post the result here:
gcloud dataproc jobs submit hadoop \
--cluster=rodsbigdata \
--class=org.apache.sqoop.Sqoop \
--jars=gs://rodbigdatabucket/jars/sqoop-1.4.7-hadoop260.jar,gs://rodbigdatabucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
--properties mapreduce.job.user.classpath.first=true,mapreduce.job.classloader=true \
-- \
import \
--connect=jdbc:mysql://cloudsql/big-data-tools-salford:us-central1:rodmysqlgoogle/road_safety_data \
--username rodmartin \
--target-dir=gs://rodbigdatabucket/mysql_output \
--table road_safety_data \
--hive-import
From the Google Cloud console:
gcloud dataproc jobs submit hadoop \
--cluster=rodsbigdata \
--class=org.apache.sqoop.Sqoop \
--jars=gs://rodbigdatabucket/jars/sqoop-1.4.7-hadoop260.jar,gs://rodbigdatabucket/jars/avro-tools-1.8.2.jar,file:///usr/share/java/mysql-connector-java-5.1.42.jar \
--properties mapreduce.job.user.classpath.first=true,mapreduce.job.classloader=true \
-- \
import \
--connect=jdbc:mysql://cloudsql/big-data-tools-salford:us-central1:rodmysqlgoogle/road_safety_data \
--username rodmartin \
--target-dir=gs://rodbigdatabucket/mysql_output \
--table road_safety_data \
--hive-import
Output:
Job [a3f6ff982f7f46bea2e1eca98e002e08] submitted.
Waiting for job output...
19/04/21 17:35:32 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
19/04/21 17:35:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/04/21 17:35:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/04/21 17:35:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/04/21 17:35:32 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Caused by: java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:272)
at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:96)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:616)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
The org.apache.avro.LogicalType class is available only in Avro 1.8:
https://search.maven.org/search?q=fc:org.apache.avro.LogicalType%20AND%20a:avro
Try using Sqoop 1.4.6 and Avro 1.7.7 (the same version used by Hadoop) instead of Sqoop 1.4.7, which uses Avro 1.8.
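A hedged sketch of staging those versions in the bucket; the Maven Central paths assume the standard repository layout and the hadoop260 classifier that Sqoop publishes:
curl -LO https://repo1.maven.org/maven2/org/apache/sqoop/sqoop/1.4.6/sqoop-1.4.6-hadoop260.jar
curl -LO https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.7.7/avro-tools-1.7.7.jar
gsutil cp sqoop-1.4.6-hadoop260.jar avro-tools-1.7.7.jar gs://rodbigdatabucket/jars/
Then point --jars at the 1.4.6 and 1.7.7 jars instead of the 1.4.7 and 1.8.2 ones.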
Hi, can someone help me with how to load data from a GCS bucket into Cloud SQL using Dataproc with Sqoop?
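Loading from GCS into Cloud SQL is the export direction in Sqoop. A minimal sketch, assuming a cloud-sql-proxy (as in the earlier examples) is listening on localhost and using hypothetical database, table, and path names:
/usr/lib/sqoop/bin/sqoop export \
--connect jdbc:mysql://localhost:3306/<your-database> \
--username <your-user> -P \
--table <target-table> \
--export-dir gs://<your-bucket>/<path-to-csv-data> \
--input-fields-terminated-by ','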
I am running the following Sqoop command in a GCP Hadoop cluster:
/usr/lib/sqoop/bin/sqoop import \
--connect jdbc:mysql://34.67.251.162:3306/retail_db \
--username root \
--password root \
--table order_items \
--hive-import \
--hive-database datkranthi \
--create-hive-table \
--as-textfile \
--fields-terminated-by ';' \
-m 3
I am getting the following error:
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
19/08/17 14:00:23 WARN conf.HiveConf: HiveConf of name hive.hwi.war.file does not exist
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
19/08/17 14:00:23 INFO SessionState:
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
19/08/17 14:00:23 INFO session.SessionState: Created HDFS directory: /tmp/hive/kranthikumar9625/b6260189-3711-4897-bc78-6b89064e40cc
19/08/17 14:00:24 INFO session.SessionState: Created local directory: /tmp/kranthikumar9625/b6260189-3711-4897-bc78-6b89064e40cc
19/08/17 14:00:24 INFO session.SessionState: Created HDFS directory: /tmp/hive/kranthikumar9625/b6260189-3711-4897-bc78-6b89064e40cc/_tmp_space.db
19/08/17 14:00:24 ERROR tool.ImportTool: Import failed: java.io.IOException: Exception thrown in Hive
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:358)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
... 9 more
Caused by: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:614)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
Please answer this
Sqoop init action was added in #465
We should add Sqoop to Dataproc.
Sqoop is a tool that allows importing/exporting data between different types of databases; it also allows importing data into Hadoop to run MR jobs. There is a working cloud-sql-proxy init action, so we should configure Sqoop to pull data from Cloud SQL, and we should also enable Sqoop jobs for Bigtable.