datacratic / StarCluster

StarCluster is a utility for creating and managing computing clusters hosted on Amazon's Elastic Compute Cloud (EC2).
http://star.mit.edu/cluster
GNU Lesser General Public License v3.0

plugin 'boto' causes later plugins to run as the wrong user #53

Open cariaso opened 8 years ago

cariaso commented 8 years ago

PLUGINS = whoami,boto,whoami,efs,whoami

#51 and #52 relate to EFS, but I was having problems with it, which I believe I've isolated to the boto plugin.

This set of log messages seems to confirm it for me.

[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:42,929 >>> Configuring passwordless ssh for sgeadmin
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,334 >>> Running plugin whoami
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,335 >>> Running whoami plugin
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,345 >>> whoami?: root
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,425 >>> Running plugin boto
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,649 >>> Installing AWS credentials for user: sgeadmin
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,650 >>> Installing current credentials to: /home/sgeadmin/.boto
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,728 >>> Running plugin whoami
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,728 >>> Running whoami plugin
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,739 >>> whoami?: sgeadmin
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,813 >>> Running plugin efs
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:43,904 >>> Configuring EFS for sg-f00b038b
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:44,063 >>> Authorizing EFS security group
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:44,483 >>> Authorizing EFS security group
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:44,832 >>> Mounting efs on all nodes
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:44,833 >>> Mounting efs on <Node: master (i-0c26c566b57535ead)>
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:45,018 !!! ERROR - Error occured while running plugin 'efs':
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:45,018 !!! ERROR - remote command 'source /etc/profile && mount -t nfs4
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:45,018 !!! ERROR - -ominorversion=1
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:45,018 !!! ERROR - us-east-1a.fs-1ca36455.efs.us-east-1.amazonaws.com:/
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:45,018 !!! ERROR - /tmp/efs' failed with status 1:
[ec2-54-198-147-85.compute-1.amazonaws.com] out: 2016-07-23 15:39:45,018 !!! ERROR - mount: only root can do that

Perhaps someone else can replicate and confirm?

I've looked at the source of starcluster/plugins/boto.py and it's not obvious to me what's causing the trouble.

FinchPowers commented 8 years ago

If I had to guess, I would say this is where it happens: https://github.com/datacratic/StarCluster/blob/000c041a9f71ed8099f461e6af9b145f1f654310/starcluster/plugins/boto.py#L43

cariaso commented 8 years ago

Not sure how I missed that, but yes, it seems more than likely.

Two possible solutions come to mind:

  1. switch_user() back to root before exiting the function
  2. never call switch_user(), and instead chmod/chown the file to suitable permissions

Do you have a preference for either?

FinchPowers commented 8 years ago

I would wrap the first mssh.switch_user call in a with block that calls mssh.switch_user again with root whenever it leaves the scope. That way it would go back to root whether or not an exception is raised.
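To make the with-block idea concrete, here is a minimal sketch. It is not the plugin's actual code: `switched_user`, `FakeSSH`, and the `restore_to` parameter are hypothetical names made up for illustration, and `FakeSSH` only stands in for the master's SSH client so the example runs on its own. It also assumes root is the right user to restore, which matches the first whoami in the log above.

```python
from contextlib import contextmanager


class FakeSSH:
    """Hypothetical stand-in for master.ssh; it only records the active user."""

    def __init__(self):
        self.user = "root"

    def switch_user(self, user):
        self.user = user


@contextmanager
def switched_user(ssh, user, restore_to="root"):
    """Switch to `user` for the body of the with block, then switch back."""
    ssh.switch_user(user)
    try:
        yield ssh
    finally:
        # Runs on normal exit and also when the body raises.
        ssh.switch_user(restore_to)


mssh = FakeSSH()
with switched_user(mssh, "sgeadmin"):
    inside = mssh.user  # the credentials file would be written here
after = mssh.user       # later plugins would see this user
```

In the plugin's run method, the body that installs the credentials would presumably move inside the with block, so plugins that run afterwards (efs in this report) start as root again.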

vasisht commented 8 years ago

@cariaso Do you run into this problem if you switch the order of plugins? i.e. run efs before configuring boto for sgeadmin.

cariaso commented 8 years ago

No. That is the basis of my current workaround.
