everpeace / vagrant-mesos

Spin up your Mesos Cluster with Vagrant! (VirtualBox and AWS)
https://github.com/everpeace/vagrant-mesos
MIT License
432 stars 138 forks

HDFS support #7

Open everpeace opened 10 years ago

everpeace commented 10 years ago
24601 commented 10 years ago

The way I've added HDFS support into this:

1) Used my fork of hadoop_cookbook (https://github.com/24601/hadoop_cookbook; I have a pull request open against the original to merge in the very small change I made for this cookbook to support Ubuntu 13.04)

2) Cluster configured for 2 masters + 3 slaves as follows:

- Masters are NameNodes in an HA config with auto-failover (might as well while we're at it, I figure...)
- JournalNodes & DataNodes on masters + slaves (maybe no need for a DataNode on masters, and probably no need for JournalNodes on masters either? Figured it couldn't hurt for now and is easy to drop later)
- HDFS uses the ZK quorum established by vagrant-mesos

3) HA uses sshfence, using the existing key management provided by vagrant-mesos
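For context, the HA setup described in 2) and 3) roughly corresponds to an hdfs-site.xml like the following. This is a sketch only: the nameservice id (`mesoscluster`), hostnames, ports, and key path are assumptions, not values from the actual branch, though the property names are standard HDFS HA properties:

```xml
<!-- Sketch of hdfs-site.xml for HA NameNodes with ZKFC auto-failover
     and sshfence. Hostnames/paths below are illustrative assumptions. -->
<property>
  <name>dfs.nameservices</name>
  <value>mesoscluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mesoscluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- Shared edits via the JournalNode quorum -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://slave1:8485;slave2:8485;slave3:8485/mesoscluster</value>
</property>
<property>
  <!-- Auto-failover driven by the ZK quorum vagrant-mesos already runs -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
<property>
  <!-- Fencing via ssh, reusing the keys vagrant-mesos distributes -->
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```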

Once I clean things up (like removing my AWS creds from cluster.yaml), I can fork and create a PR if you want...

Doing this required some appreciable changes to the multinode Vagrantfile to ensure the HDFS configuration was inserted into the chef.json object (I am NOT a Ruby programmer, so there is probably a better/more robust way to do it than mine, but my way works...), plus adding the hadoop cookbook to the Berksfile.
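In spirit, the chef.json change might look something like the sketch below (along with a `cookbook 'hadoop'` line in the Berksfile). The attribute layout, IPs, and version numbers are assumptions for illustration, not the actual diff; the hadoop cookbook's real attribute names should be checked against its README:

```ruby
# Hypothetical sketch of merging HDFS attributes into the chef.json hash
# that the multinode Vagrantfile hands to the Chef provisioner.
# IPs, versions, and attribute names below are illustrative assumptions.

# ZooKeeper ensemble already provisioned by vagrant-mesos (assumed IPs).
zk_quorum = ["192.168.31.11", "192.168.31.12", "192.168.31.13"]

base_chef_json = {
  :mesos => { :version => "0.19.0" }  # existing vagrant-mesos attributes
}

hadoop_json = {
  :hadoop => {
    :core_site => {
      # Point clients at the logical HA nameservice, not a single host.
      "fs.defaultFS" => "hdfs://mesoscluster"
    },
    :hdfs_site => {
      "dfs.nameservices"    => "mesoscluster",
      # Reuse the ZK quorum vagrant-mesos already established.
      "ha.zookeeper.quorum" => zk_quorum.map { |ip| "#{ip}:2181" }.join(",")
    }
  }
}

chef_json = base_chef_json.merge(hadoop_json)
puts chef_json[:hadoop][:hdfs_site]["ha.zookeeper.quorum"]
```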

everpeace commented 10 years ago

Thank you @24601 !!

1)

Has your PR already been merged??

2)

Yes, I agree with you. I think it would be good if JournalNode and DataNode ran only on the slaves.

I really appreciate your contributions, and I'm happy to review your changes to chef.json. I can't wait for your PR!!

24601 commented 10 years ago

@everpeace, thanks for the quick reply! Happy to help and hope my contribution is helpful, I'll be cleaning up the code and will submit a PR soon. Here are a few answers before that:

1) Yes, it looks like https://github.com/continuuity/hadoop_cookbook has pulled in my changes (along with some of their own enhancements) to support Ubuntu 13.04 and even 14.04, but I think (in my testing) Mesos doesn't run so well on 14.04 yet (things broke; not sure if it was easy stuff to fix, but I didn't even bother since I found no need to move to 14.04 yet*).

2) I'll modify so JN and DN run on slaves only.
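The "slaves only" change amounts to picking the HDFS recipes per node type when building each VM's run list. A minimal sketch, assuming the hadoop cookbook's recipe naming (the exact recipe names should be verified against the cookbook):

```ruby
# Illustrative sketch (not the actual Vagrantfile code) of assigning
# HDFS roles so JournalNode and DataNode run only on slaves, while
# masters keep only the NameNode/ZKFC side. Recipe names assumed from
# the hadoop cookbook's conventions.
def hdfs_run_list(node_type)
  case node_type
  when :master
    ["recipe[hadoop::hadoop_hdfs_namenode]",
     "recipe[hadoop::hadoop_hdfs_zkfc]"]
  when :slave
    ["recipe[hadoop::hadoop_hdfs_journalnode]",
     "recipe[hadoop::hadoop_hdfs_datanode]"]
  else
    []
  end
end
```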

Still working on original project that this stuff was done for, will clean up and submit PR once that's done!

*Uh, I kinda take that back: 13.04 is already EOL'd. We could step back to 12.04 LTS, which has good support, but I'd rather figure out the leap forward to 14.04 while I'm at it... This is a bit of a separate issue, but I'll likely work on it and might just throw all my changes into one PR. I know the hadoop cookbook works well with 14.04, albeit officially unsupported.

24601 commented 10 years ago

@everpeace, I'm making the changes/doing the clean-up discussed above to include HDFS support and move things to Ubuntu 14.04 LTS. It's not ready for a PR yet, but if you want, the changes are being made and occasionally synced to my fork here:

https://github.com/24601/vagrant-mesos

Feel free to make suggestions/comments. Like I said, I'm not a Ruby programmer or even too proficient with Vagrant, but I know enough to bumble-F my way through this to get it working as part of a larger project, and I'm happy to share my work even if it's not the greatest.

everpeace commented 10 years ago

I greatly appreciate your contribution again, @24601! I'm not so proficient in HDFS, actually, so I'm really happy that you're helping!

I've looked at your Vagrantfile and several comments. I'll review in detail after your clean-up.

About Ubuntu, I'm not such a heavy user of it. I think vagrant-mesos should support the mesos-docker executor, and I understand we'd have to upgrade the kernel if we used 12.04, right?? So if Mesos works properly on 14.04, I'm fine with moving to 14.04, I think.

theclaymethod commented 10 years ago

One of the problems I've had with HDFS is that it wants fixed IP addresses. For some reason, a lot of the Hadoop ecosystem doesn't play well with hostnames or changing IPs. I'm not sure if this has been fixed.

But HDFS would be very helpful. I'm not a Chef expert so I've installed HDFS manually, along with Spark.
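For what it's worth, HDFS does ship a few knobs that relax its IP/hostname strictness; whether they fully solve the problem in a Vagrant multi-NIC setup is untested here, but they may be worth trying. The property names below are real HDFS settings; the choice of values is a suggestion, not something from this repo:

```xml
<!-- hdfs-site.xml sketch: properties that relax HDFS's reliance on
     fixed IPs. Values here are suggestions to experiment with. -->
<property>
  <!-- Don't reject DataNodes whose IP doesn't resolve to their hostname -->
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
<property>
  <!-- Have clients connect to DataNodes by hostname rather than IP -->
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
<property>
  <!-- Have DataNodes talk to each other by hostname as well -->
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
```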