Kitware / gobig

Provisioning big data applications with Resonant
Apache License 2.0

Add AWS creds if env vars are set #30

Closed kotfic closed 8 years ago

kotfic commented 8 years ago

Add AWS credentials to the Hadoop HDFS core-site.xml file if they are defined in the environment that runs Ansible.

This is necessary for accessing objects stored on S3 with bin/hadoop.
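
For anyone reviewing, here is a rough sketch of the kind of conditional this could boil down to, assuming the standard s3n property names (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey) and a blockinfile-style edit; the task and variable names are illustrative, not necessarily what this PR actually does:

# Illustrative only: read the keys from the environment of the machine
# running Ansible, and only touch core-site.xml when both are present.
- name: Read AWS credentials from the controller environment
  set_fact:
    aws_access_key_id: "{{ lookup('env', 'AWS_ACCESS_KEY_ID') }}"
    aws_secret_access_key: "{{ lookup('env', 'AWS_SECRET_ACCESS_KEY') }}"

- name: Add s3n credential properties to core-site.xml
  blockinfile:
    path: /opt/hadoop/2.7.1/etc/hadoop/core-site.xml
    # Use an XML comment as the marker so the file stays valid XML
    marker: "<!-- {mark} ANSIBLE MANAGED AWS CREDENTIALS -->"
    insertbefore: "</configuration>"
    block: |
      <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>{{ aws_access_key_id }}</value>
      </property>
      <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>{{ aws_secret_access_key }}</value>
      </property>
  when: (aws_access_key_id | length > 0) and (aws_secret_access_key | length > 0)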

kotfic commented 8 years ago

To test, copy the following into dev/vagrant.local.yml:

domain: "cluster.dev"
ansible:
  verbose: "v"
  plays:
    - playbook: "playbooks/hadoop-hdfs/site.yml"

nodes:
  head:
    memory: 8192
    cpus: 2
    roles:
      - namenodes
      - datanodes

  data-01:
    memory: 8192
    cpus: 2
    roles:
      - datanodes

Adjust memory and cpus as needed, then test with exported values for the AWS keys, e.g.:

export AWS_ACCESS_KEY_ID=FOO
export AWS_SECRET_ACCESS_KEY=BAR

/opt/hadoop/2.7.1/etc/hadoop/core-site.xml should contain properties that define the access key and secret key.
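
A quick way to check the result from the host, assuming the VM is named head as in the config above and the s3n property names from the sketch earlier (the bucket path is just a placeholder):

# Confirm the credential properties landed in core-site.xml on the head node
vagrant ssh head -c "grep -A 1 'fs.s3n.aws' /opt/hadoop/2.7.1/etc/hadoop/core-site.xml"

# Confirm bin/hadoop can reach a bucket you have access to
vagrant ssh head -c "/opt/hadoop/2.7.1/bin/hadoop fs -ls s3n://YOUR_BUCKET/some/prefix"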

opadron commented 8 years ago

+1 I like this idea!

Just FYI, you can access S3 using s3n://KEY_ID:SECRET_KEY@BUCKET_PATH. But I agree that being able to configure it would be nice.
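
For anyone following along, using that inline-credential form with bin/hadoop looks roughly like this (the keys and bucket are placeholders; secret keys containing "/" generally need to be URL-escaped for this form to work):

bin/hadoop fs -ls s3n://YOUR_KEY_ID:YOUR_SECRET_KEY@your-bucket/some/prefix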

On the other hand, I never liked the idea of storing AWS credentials in plain text on your own system, much less on an EC2 instance. Once they're there, you've given up the fight for the physical security of your credentials. I don't think this behavior should be the default.

Might I suggest that we hold off on this idea until we can get an aws-credentials role put together that can actually parse the AWS config file (#24)? Then the hdfs roles can pull that in as a dependency and implement the change you propose here. And for security, I'd recommend defaulting to a no-op if a user of the hdfs roles provides no AWS profile.
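
To make that concrete, a hypothetical aws-credentials role along those lines could read the requested profile from ~/.aws/credentials with the ini lookup and do nothing when no profile is supplied; the variable names here are just illustrative, not the actual design for #24:

# Illustrative sketch: no-op unless the user explicitly sets aws_profile.
- name: Read credentials for the requested profile from ~/.aws/credentials
  set_fact:
    aws_access_key_id: "{{ lookup('ini', 'aws_access_key_id section=' + aws_profile + ' file=' + lookup('env', 'HOME') + '/.aws/credentials') }}"
    aws_secret_access_key: "{{ lookup('ini', 'aws_secret_access_key section=' + aws_profile + ' file=' + lookup('env', 'HOME') + '/.aws/credentials') }}"
  when: aws_profile is defined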

kotfic commented 8 years ago

This sounds reasonable to me; in the meantime I will use the s3n://KEY_ID:SECRET_KEY@BUCKET_PATH format.