jar349 opened this issue 7 years ago
I agree with you that the documentation is not clear on how to use the Hadoop and HBase Docker images together. Using environment variables is an interesting approach that fits well with the Docker way of doing things.
You should consider that you may lose data locality with this method. As far as I know, Hadoop is not yet Docker-aware, so if the datanode and regionserver run in separate containers, they will have different IP addresses and HBase will assume the two services are not local to the same machine. Therefore, data access may not be optimal.
However, many people use S3 in production, and Hadoop can't figure out data locality with S3 either.
Can you elaborate more on your use case?
Use case:
Building a library of compose files that I can, ahem... compose together, a la: https://docs.docker.com/compose/extends/#/multiple-compose-files
I've already got a ZooKeeper quorum, and I've got a distributed Hadoop cluster (using your hadoop image to provide a name node, data node, and secondary name node).
Now I want a set of files that I can compose on top of zookeeper/hadoop: hbase, spark, kylin, etc.
So, this would be for local development and testing. But my goal is to mimic a realistic setup, meaning: more than one ZK instance, a Hadoop secondary name node, more than one HBase region server, HBase actually using Hadoop instead of the local file system, etc.
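For illustration, a rough sketch of the layering I have in mind, using Compose's multiple-file support (the file names are placeholders, not files that exist in this repo):

# zookeeper.yml defines the ZK quorum, hadoop.yml the HDFS services,
# and hbase.yml would add the HBase master and region servers on top.
# Later files extend/override earlier ones, per the compose docs linked above.
docker-compose -f zookeeper.yml -f hadoop.yml -f hbase.yml up -d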
I'd also appreciate this. This is the best hbase docker repo I can find (that works with Thrift), and having this described easily in the README would make this repository immensely powerful. Starting with no knowledge of HBase or HDFS, I'd be able to spin up a near-production-ready HDFS-backed HBase DB in 10 minutes. You have to admit, that's pretty cool.
Don't forget all the students out there coming out of school, getting their feet wet with big data tools, and floundering because of their complexity. This would go a good ways toward helping them.
Hi Dav and John Ruiz,
Sure! If you could please submit a merge request, I'll have it approved and deployed.
-D
Thanks! I'll see what I can do.
Do you happen to know how to do it already? My progress on Hadoop in Docker has been slow. sequenceiq's is super old, big-data-europe's was giving me errors, and harisekhon's seems to work perfectly, so I was using that. However, trying to connect HBase to it hasn't been straightforward.
I had to change the configuration file (hdfs-site.xml) from the default (which was writing to /tmp) to:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data</value>
  </property>
</configuration>
in order for it to write to a new directory (that's easier for me to mount). Then I run it using something like:
docker run -d --name hdfs -p 8042:8042 -p 8088:8088 -p 19888:19888 -p 50070:50070 -p 50075:50075 -v $HOME/hdfs-data:/data -v $HOME/hdfs-site.xml:/hadoop/etc/hadoop/hdfs-site.xml harisekhon/hadoop
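A couple of optional sanity checks at this point (assuming the container is named hdfs as above and the hdfs CLI is on the image's PATH):

docker exec hdfs hdfs dfsadmin -report      # the datanode should show up as live
docker exec hdfs hdfs dfs -mkdir -p /hbase  # pre-create a directory HBase could use
docker exec hdfs hdfs dfs -ls /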
After that, I feel pretty confident that HDFS is set up properly. However, to connect HBase to it, the best I've got so far is changing the HDFS URL to:
hdfs://ip-of-docker-container:8020/
Does that look right?
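(In case it's useful, one way to grab the container's IP and confirm the NameNode answers at that address; the container name hdfs and port 8020 just follow the command above:)

HDFS_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hdfs)
docker exec hdfs hdfs dfs -ls hdfs://$HDFS_IP:8020/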
Actually, that worked. Have any corrections before I add it to the readme?
Pull request #10 added.
I'm using your docker images to create a hadoop cluster (defined in a docker-compose file). Now, I would like to add your hbase image, but it is configured to use local storage.
I could create my own image based on yours with a custom configuration file, or I could mount the config volume and place my own config file there for hbase to read. However, I think there's a simpler path: taking local or hdfs as an argument and doing the "right thing" on the user's behalf. I am imagining something like
command: hbase master local start
or
command: hbase master hdfs start
where the values you'd need to configure site.xml to use hadoop would come from environment variables (-e HDFS_MASTER=<hostname>). What do you think?
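For illustration only, a minimal sketch of what the hdfs mode might do in the image's entrypoint; the config path, port, and argument handling are assumptions, not anything the image currently does:

# Hypothetical entrypoint fragment: when "hdfs" mode is requested, generate an
# hbase-site.xml that points hbase.rootdir at the namenode named by HDFS_MASTER.
# The conf path /opt/hbase/conf is an assumption about the image layout.
MODE="$1"   # "local" or "hdfs"; how it's extracted from the command is left open
if [ "$MODE" = "hdfs" ]; then
  cat > /opt/hbase/conf/hbase-site.xml <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://${HDFS_MASTER}:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
EOF
fi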