Closed by milvain 6 years ago
It should be possible for webSpoon to work with Drill and S3.
Please refer to here for Drill and there for S3.
Basically, you have to install the JDBC driver for Drill at $CATALINA_HOME/lib and the credentials for S3 at /root/.aws of the container.
There are many ways to secure webSpoon by user authentication. One easy way is described here.
In any of these cases (JDBC, S3, and user-auth), you have to make changes to the deployed container. That is fine during a development phase, but I'd suggest you create your own custom container image and then deploy it to the cloud. To create a custom image, you have to write your own Dockerfile. Your Dockerfile would look like this:
FROM hiromuhota/webspoon:latest-full
COPY drill_jdbc.jar $CATALINA_HOME/lib
COPY .aws /root/.aws
# some code for user-auth
Hope this helps.
Hi Hiromu,
Many thanks for your answer. I think it's too complicated for me to create my own Dockerfile. I would need a more detailed tutorial to successfully build my image... For authentication on the webSpoon application, a single set of credentials shared by just 3 people is enough; I don't want to deploy LDAP or anything similar, which is too complicated for me.
Concretely, I need to build a custom image from a Dockerfile, push it to DockerHub, and then modify your Dockerrun.aws.json to use my image name, right?
Could you please help me create a Dockerfile that integrates webSpoon with the JDBC driver for Drill and runs Drill in the Docker container? My input is a .tbl file read from AWS S3 into Spoon.
# Copy webSpoon from Hiromuhota repository
FROM hiromuhota/webspoon:latest-full
# Get drill
RUN wget http://apache.osuosl.org/drill/drill-1.14.0/apache-drill-1.14.0.tar.gz
# Create Drill folder
RUN mkdir -p $HOME/drill
# Extract Drill
RUN tar -xvzf apache-drill-1.14.0.tar.gz -C $HOME/drill
# Install the JDBC driver for Drill
COPY $HOME/drill/jars/jdbc-driver/drill-jdbc-all-1.14.0.jar $CATALINA_HOME/lib
# Install the credential for AWS S3
# COPY .aws /root/.aws
COPY failed: stat /var/lib/docker/tmp/docker-builder257350670/drill/jars/jdbc-driver/drill-jdbc-all-1.14.0.jar: no such file or directory
Many thanks.
I can help you create your Dockerfile, but you have to understand how it works.
Concretely, I need to build a custom image from a Dockerfile, push it to DockerHub, and then modify your Dockerrun.aws.json to use my image name, right?
Basically yes, but not exactly correct. What you need to do:
COPY failed: stat /var/lib/docker/tmp/docker-builder257350670/drill/jars/jdbc-driver/drill-jdbc-all-1.14.0.jar: no such file or directory
This is expected to fail, because COPY copies a file/directory from the Docker host (the build context) into the image, but the host does not have that file. Remember that RUN executes commands inside the image being built, so the files it downloads and extracts exist only there.
Instead, you should use RUN cp $HOME/drill/jars/jdbc-driver/drill-jdbc-all-1.14.0.jar $CATALINA_HOME/lib
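Putting the Drill steps together, a minimal sketch of the corrected Dockerfile could look like the block below. It reuses the download URL and paths from the Dockerfile posted above. The `--strip-components=1` flag is an assumption on my part: it presumes the archive unpacks into a single top-level directory (such as apache-drill-1.14.0), which would otherwise end up between $HOME/drill and jars. Check the actual layout of the extracted archive before relying on it.

```dockerfile
# Start from the webSpoon image used throughout this thread
FROM hiromuhota/webspoon:latest-full

# Download and extract Drill inside the image, then copy the JDBC driver
# into Tomcat's lib directory with cp (not COPY, which reads from the host).
# ASSUMPTION: the tarball has one top-level directory, so it is stripped
# to keep the $HOME/drill/jars/... path used in this thread.
RUN wget http://apache.osuosl.org/drill/drill-1.14.0/apache-drill-1.14.0.tar.gz \
 && mkdir -p $HOME/drill \
 && tar -xzf apache-drill-1.14.0.tar.gz -C $HOME/drill --strip-components=1 \
 && cp $HOME/drill/jars/jdbc-driver/drill-jdbc-all-1.14.0.jar $CATALINA_HOME/lib/ \
 && rm apache-drill-1.14.0.tar.gz
```

Chaining the commands in one RUN keeps the downloaded archive out of the final image layers. If only the JDBC driver is needed, the rest of the extracted Drill directory could be removed in the same step.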
For your webSpoon user-auth, download web.xml and security.xml to the directory on your host where the Dockerfile resides. Edit them according to here, and finally add these lines to your Dockerfile:
COPY web.xml $CATALINA_HOME/webapps/spoon/WEB-INF/web.xml
COPY security.xml $CATALINA_HOME/webapps/spoon/WEB-INF/spring/security.xml
More specifically, edit security.xml at the following line to change the user/password. You can add multiple lines for different users if you want.
<user name="user" password="password" authorities="ROLE_USER" />
Thanks, my image works with user-auth!! I don't understand 2 points:
Many thanks
You can 1. install Drill in the same container as webSpoon, or 2. deploy a Drill instance separately and let webSpoon connect to this Drill instance. I'd recommend the latter option as a rule of thumb.
The original .tbl file is in the S3 bucket. How can I build new .parquet and .csv files with the transformation directly on S3? Is it possible?
It sounds like a question for PDI/Spoon, not for webSpoon. Does your current ktr (S3, Drill, etc.) work on Spoon? If yes, it should work on webSpoon too.
Hi, and many thanks for your work! I would like to know whether it is possible to install webSpoon on AWS so that it interacts with a Drill database and S3. I have already installed webSpoon with Beanstalk and it works, but I don't know how to connect Drill and S3 to this container. Any ideas? My second question is: how can I secure access to the webSpoon website with credentials? Many thanks