achintya-kumar / BD2017

Otto-von-Guericke Universität Magdeburg - Big Data SoSe 2017

Installation #2

achintya-kumar opened 7 years ago

achintya-kumar commented 7 years ago

Hi @HorizonNet and @tantalus1984! I have two questions about the first task. Kindly help me understand the problem better.

  1. It is common practice to disable SELinux, but it is not in the list of tasks. Shouldn't this be one of the tasks?
  2. Linux (ext filesystems, by default) reserves 5% of the disk space for the root user, which means the remainder can be used by non-root users. Is that what you mean by reserving space for non-root volumes?

Thank you in advance.

HorizonNet commented 7 years ago

@achintya-kumar Good questions.

  1. Best answer: it depends. Some Hadoop components do not work well with SELinux. You can write policies for them, but they are hard to implement. Where does the "it depends" come in? It depends on whether your cluster is exposed to the public internet, and on your security policies. For a proof-of-concept (POC) cluster, which is what you're going to build, it is in my opinion totally fine to have SELinux disabled. It is not in the list of System Configuration Checks because this list is not complete from an installation point of view (it is not intended to be a step-by-step guide). It only helps us (and you) spot problems you might run into once the cluster has been running for a while.
  2. Not exactly. We want to see that you made sure to have enough disk space before starting the installation.
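Both points can be checked on each host before starting the installation. A minimal Python sketch, assuming a RHEL/CentOS-style host where SELinux is configured in `/etc/selinux/config` (the runtime mode can also be read with `getenforce`); the 50 GiB threshold is a hypothetical placeholder, not a course requirement:

```python
import shutil
from pathlib import Path

# Hypothetical threshold: the assignment does not specify a number.
MIN_FREE_GB = 50


def selinux_mode(config_text: str) -> str:
    """Return the SELINUX= value (enforcing/permissive/disabled) from config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and blank lines
        key, _, value = line.partition("=")
        if key.strip() == "SELINUX":  # SELINUXTYPE= is a different key
            return value.strip().lower()
    return "unknown"


def free_gb(path: str = "/") -> float:
    """Space available to non-root users on the filesystem holding `path`, in GiB.

    On POSIX, shutil.disk_usage() takes `free` from statvfs's f_bavail, which
    already excludes the blocks the filesystem reserves for root.
    """
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    cfg = Path("/etc/selinux/config")
    mode = selinux_mode(cfg.read_text()) if cfg.exists() else "not installed"
    print(f"SELinux mode: {mode}")
    space = free_gb("/")
    status = "OK" if space >= MIN_FREE_GB else "too little"
    print(f"Free space on /: {space:.1f} GiB ({status})")
```

Run it on every node, and repeat the disk check for each mount point you plan to use for HDFS data.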

Hope this answers your questions.

achintya-kumar commented 7 years ago

Thank you so much for your reply. It is crystal clear now. I have one additional question.

I am using the Azure Free Tier for this assignment, which allows 4 cores per region. I currently have 2 machines with 2 cores and 16GB of memory each. My initial plan was to go with the recommended cluster size, i.e. 5 nodes. However, because of the regional limit, I am limited to 2 machines for now.

I am allowed to create new VM instances in other regions. My question is: is it possible in any way to have nodes in different regions and still build a cluster out of them without adding much complexity?
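The per-region core limit translates into a minimum number of regions with simple arithmetic. A sketch, assuming every node has the same core count and the limit applies per region as described above; the function name is illustrative:

```python
import math


def regions_needed(nodes: int, cores_per_node: int, cores_per_region: int) -> int:
    """Minimum number of regions needed when each region caps the total cores."""
    nodes_per_region = cores_per_region // cores_per_node
    if nodes_per_region == 0:
        raise ValueError("a single node exceeds the per-region core limit")
    return math.ceil(nodes / nodes_per_region)


print(regions_needed(5, 2, 4))  # 2 nodes fit per region -> 3 regions
print(regions_needed(5, 4, 4))  # 1 node fits per region -> 5 regions
```

With 2-core VMs, a 5-node cluster would therefore span at least 3 regions; with 4-core VMs, every node would sit in its own region.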

HorizonNet commented 7 years ago

This should be possible, but one problem you can run into is the traffic between the regions. Traffic inside a datacenter is normally free, but in- and outbound traffic is not. I'm not sure whether this is a problem in the free tier. Can you, by chance, increase the core limit via a service request to the Azure team? I did this to raise the core limit on my MSDN subscriptions, but I'm not sure whether that is possible in the free tier.

achintya-kumar commented 7 years ago

Thank you!

When I requested an increase, they asked me to upgrade to the 'Pay As You Go' tier. I suppose I should do that once I have everything working on my free-tier nodes.

HorizonNet commented 7 years ago

You should stay in the free tier. Try using different regions, but first review the in- and outbound traffic limitations. It could be that this isn't a problem at all.

achintya-kumar commented 7 years ago

Hi! Here is a report of two things I have learnt so far.

  1. CM requires SELinux to be disabled at installation time.
  2. I created some solid hosts (4 cores, 28GB, 200GB SSD), but in different regions (West Europe, North Europe and West US). As a result, the traffic between them has to be routed over the internet. While it works, I believe an intra-datacenter cluster would outperform the current setup several-fold.

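To put a number on the cross-region penalty, one option is to time TCP connection setups between nodes, each of which costs roughly one network round trip. A minimal sketch, assuming SSH (port 22) is reachable between the VMs; the hosts are whatever you pass on the command line, and the script name below is hypothetical:

```python
import socket
import statistics
import sys
import time


def tcp_connect_ms(host: str, port: int, samples: int = 5, timeout: float = 3.0) -> float:
    """Median time, in milliseconds, to open (and close) a TCP connection to host:port."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # the handshake is all we want to time
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)


if __name__ == "__main__":
    for host in sys.argv[1:]:
        try:
            print(f"{host}: {tcp_connect_ms(host, 22):.1f} ms")
        except OSError as err:
            print(f"{host}: unreachable ({err})")
```

Running `python tcp_rtt.py <node-a> <node-b>` from one node and comparing a same-region peer with a cross-region peer shows how much latency the internet routing adds.
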
HorizonNet commented 7 years ago

Below is a short review.

Tasks

Open points:

General feedback:

You're definitely on the right track, but details are important. Have a look at the open points above. The amount of documentation you have written so far is pretty good and helps us understand where you went in the wrong direction.

achintya-kumar commented 7 years ago

Thank you for taking the time to give me this detailed feedback. I shall fix what's wrong and get back to you.

Best Regards

achintya-kumar commented 7 years ago

[Screenshot from 2017-06-07 23-29-30]

Hi! I did this installation yesterday. When I reached the CDH installation phase, even though my CM is version 5.8.3, it did not let me choose the same CDH version and showed the note in the image above. This is why my installation has CM 5.8.3 and CDH 5.8.4: it doesn't let you choose.

Thanks! :)

HorizonNet commented 7 years ago

That shouldn't be possible. You cannot manage a CDH version newer than your CM version. I don't know whether that also applies to patch versions, but it definitely applies to minor and major versions.

Did you install CM via an installer or YUM?

achintya-kumar commented 7 years ago

This was done using YUM.

HorizonNet commented 7 years ago

I have gone through the documentation. The minor version of CM must be equal to the minor version of CDH. Nevertheless, you're working with the default parcel setting of CM 5.8.3, which offers the latest CDH 5.8.x version. Changing the default lets you install CDH 5.8.3. This link may be helpful.
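The compatibility rule discussed above can be expressed as a small version comparison. A sketch with illustrative function names, encoding the "CDH must not be newer than CM" rule at the major.minor level and deliberately ignoring patch versions, since the thread leaves those open; if you follow the stricter "minor versions must be equal" reading, change `<=` to `==`:

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Turn '5.8.3' into (5, 8, 3); missing parts default to 0."""
    parts = [int(p) for p in version.split(".")] + [0, 0, 0]
    return parts[0], parts[1], parts[2]


def cm_can_manage(cm_version: str, cdh_version: str) -> bool:
    """True if CDH is not newer than CM when comparing (major, minor) only.

    Patch levels are ignored on purpose: whether CM 5.8.3 may manage
    CDH 5.8.4 is the open question in this thread.
    """
    return parse_version(cdh_version)[:2] <= parse_version(cm_version)[:2]


print(cm_can_manage("5.8.3", "5.8.4"))  # True: only the patch level differs
print(cm_can_manage("5.8.3", "5.9.0"))  # False: CDH minor version is newer
```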