dvoros / docker-sqoop

Apache Sqoop docker image

Trying to sqoop to S3 does not work #3

Open hsahay1970 opened 3 years ago

hsahay1970 commented 3 years ago

Hi, I am trying to sqoop to S3. I faced many issues, but I solved them one by one. I am now stuck at the final stage and it is still not working, and I don't know why. Please read on -

For this to work, I had to do the following -

  1. Create a file called .hadooprc and copy it to the same folder as your Dockerfile. The content of .hadooprc is simply hadoop_add_classpath "${HADOOP_HOME}/share/hadoop/tools/lib/*". I had to do this because hadoop-aws-3.1.0.jar, which sits in that folder, is not by default part of the classpath reported by the command "hadoop classpath". This jar is needed because it is where all the S3, S3A and S3N classes are implemented (see the classpath check sketched after this list).

  2. Then I edited your Dockerfile to include the following command: COPY .hadooprc /root/.hadooprc

  3. Then I built an image from this Dockerfile and ran it using the commands below -

docker build -t mysqoop:latest .
docker run -it --mount type=bind,source=C:\temp\myfiles,target=/jdbc mysqoop:latest
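As a quick sanity check after starting the container, something like the following (just a sketch; the grep patterns are my assumptions based on the jar mentioned in step 1) confirms that .hadooprc was picked up and that the hadoop-aws jar really is on the client classpath:

```bash
# Run inside the running mysqoop container.
# --glob expands wildcard classpath entries so individual jar names are visible.
hadoop classpath --glob | tr ':' '\n' | grep hadoop-aws

# The S3A connector also needs the bundled AWS SDK jar from the same tools/lib folder.
ls "${HADOOP_HOME}/share/hadoop/tools/lib/" | grep -E 'hadoop-aws|aws-java-sdk'
```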

Inside the running container, I edited core-site.xml to add the following properties needed by the S3A filesystem (once I have it working, I will copy a core-site.xml with these values into the image the same way I copied .hadooprc earlier in the Dockerfile):

fs.s3a.access.key = ************
fs.s3a.secret.key = ***************
fs.s3a.server-side-encryption-algorithm = SSE-KMS
**(For the next property, the official documentation is wrong: it says .key instead of -key. But I tried both.)**
fs.s3a.server-side-encryption-key = ***************************
fs.defaultFS = hdfs://docker-desktop:9000
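With these properties in place, a minimal check along these lines (a sketch, reusing the bucket name from the sqoop command further down) shows whether the S3A client alone can reach and write to the bucket, before MapReduce and YARN get involved:

```bash
# Run inside the container. If the core-site.xml S3A settings are wrong,
# these fail immediately with a much clearer error than the MapReduce job gives.
hadoop fs -ls s3a://my_bucket/

# Write a zero-byte test object to exercise the SSE-KMS settings on the write path.
hadoop fs -touchz s3a://my_bucket/s3a-write-test
```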
Now when I run the sqoop command, it works if I am writing the file to the HDFS filesystem. But if I try to write to S3 as in the command below, it simply says the import failed after running for about 44 seconds -

sqoop import --enclosed-by '"' --escaped-by \ --table WrkID_ISRC --target-dir s3a://my_bucket/wrkid_isrc --connect 'jdbc:sqlserver://myserver:myport;database=mydatabase' --username myusername --password mypassword --num-mappers 1 --compress --compression-codec org.apache.hadoop.io.compress.GzipCodec

Here is the error message I am getting. I don't know how to find out more about the error, because the yarn logs command with the application id does not work, and neither do any of the URLs shown in the error message below -

2021-04-30 02:02:05,670 INFO mapreduce.Job: Job job_1619742965196_0015 running in uber mode : false
2021-04-30 02:02:05,671 INFO mapreduce.Job: map 0% reduce 0%
2021-04-30 02:02:05,682 INFO mapreduce.Job: Job job_1619742965196_0015 failed with state FAILED due to: Application application_1619742965196_0015 failed 2 times due to AM Container for appattempt_1619742965196_0015_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2021-04-30 02:02:02.365]Exception from container-launch.
Container id: container_1619742965196_0015_02_000001
Exit code: 1

[2021-04-30 02:02:02.367]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

[2021-04-30 02:02:02.368]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
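On the yarn logs problem: since log aggregation does not seem to be working here, the only other place I can think to check is the NodeManager's local container logs. A rough sketch, with the caveat that the default log directory is my assumption and may differ in this image:

```bash
# Application id taken from the job output above.
APP_ID=application_1619742965196_0015

# Default NodeManager container log location (an assumption; the effective value
# comes from yarn.nodemanager.log-dirs in yarn-site.xml).
ls "${HADOOP_HOME}/logs/userlogs/${APP_ID}/"
cat "${HADOOP_HOME}"/logs/userlogs/${APP_ID}/container_*/stderr

# Confirm where the logs actually go, if the property is set explicitly:
grep -A1 'yarn.nodemanager.log-dirs' "${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}/yarn-site.xml"
```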

luissantanacontato commented 3 years ago

Did you fix the problem? Because I am trying the same thing. Thank you.