frazer-lab / cluster

Repo for cluster issues.

Migrate nodes from the old cluster to the new cluster #33

Closed nariai closed 8 years ago

nariai commented 8 years ago

Paul,

As you can see, our current bottleneck is the number of nodes in the new cluster, so we want to move some of the nodes from the old cluster to the new one as soon as possible (perhaps eight of the 16 nodes as a first step).

Can you tell us what needs to be done before migrating the nodes?

Naoki

tatarsky commented 8 years ago

Yep. That still OK?

hirokomatsui commented 8 years ago

Sure

tatarsky commented 8 years ago

You can change the BIOS to also not boot PXE.

tatarsky commented 8 years ago

OK. I show a 6.6 system!

tatarsky commented 8 years ago

I think I should have the "/" filesystem a little bigger.

tatarsky commented 8 years ago

As it includes /var. I propose 200GB. It's currently this:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lv_root
                       50G  793M   46G   2% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             477M   33M  419M   8% /boot
/dev/mapper/vg0-lv_scratch
                      837G   73M  794G   1% /scratch

hirokomatsui commented 8 years ago

That's right. Some software needs temporary files there.

hirokomatsui commented 8 years ago

10GB for / ?

tatarsky commented 8 years ago

Yeah. We don't have a ton of room, but let's take it up a bit.

tatarsky commented 8 years ago

OK. Made that change.

tatarsky commented 8 years ago

all new builds will get 200GB "/"

tatarsky commented 8 years ago

I would say start this one again and/or as many as you want. I feel it looks good.

hirokomatsui commented 8 years ago

OK, re-starting on cn2, cn1 then the others.

tatarsky commented 8 years ago

Nice!

hirokomatsui commented 8 years ago

Will I see the console?

tatarsky commented 8 years ago

The menu will appear, but the rest of the install goes to VNC. I can disable that if desired after cn1/cn2 to make sure they come up.

hirokomatsui commented 8 years ago

OK, doesn't have to be disabled.

tatarsky commented 8 years ago

Up to you....once we prove they rebuild clean I don't need to see the output. I also get a summary of where it is in the syslog.

tatarsky commented 8 years ago

I use the VNC when I am debugging a new node platform, as the install will stop and ask questions if I've not provided the proper KS values.

tatarsky commented 8 years ago

Formatting partitions

hirokomatsui commented 8 years ago

I see, am starting cn1 too.

tatarsky commented 8 years ago

Cool!

tatarsky commented 8 years ago

In theory I suspect you can do about 5-10 at a time. Probably more.
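
As an illustration of the "5-10 at a time" estimate, here is a minimal Python sketch that splits a node list into groups to kickstart together. The batch size of 5 is just the lower end of that estimate, and the node names cn1 through cn12 are the ones mentioned in this thread; nothing here reflects tooling actually used on the cluster.

```python
def batches(nodes, size):
    """Yield consecutive groups of at most `size` nodes."""
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


# Node names as they appear in this thread (cn1 .. cn12).
nodes = [f"cn{i}" for i in range(1, 13)]

# Batch size 5 is an assumption taken from the low end of "5-10 at a time".
for group in batches(nodes, 5):
    print("kickstart together:", ", ".join(group))
```

With 12 nodes and a batch size of 5 this gives two full groups and a final group of two.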

tatarsky commented 8 years ago

Then later tonight I will do the Lustre, SGE and puppet.

I have one errand at 4:00PM my time (2:00PM your time)

hirokomatsui commented 8 years ago

OK, sounds great!

tatarsky commented 8 years ago

Hopefully this is a little easier ;)

hirokomatsui commented 8 years ago

Much easier for me.

tatarsky commented 8 years ago

And I will detail the config of the Kickstart items for folks once we get this done. It comes in very handy.

tatarsky commented 8 years ago

cn2 is loading software.

tatarsky commented 8 years ago

cn1 still doing disk partition formats but almost done.

tatarsky commented 8 years ago

cn2 rebooting.

hirokomatsui commented 8 years ago

working on cn3

tatarsky commented 8 years ago

yep.

tatarsky commented 8 years ago

I am switching the PXE default to the local disk so you don't have to play "find the console" as quickly ;)
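
For readers unfamiliar with what "switching the PXE default to the local disk" means, the following Python sketch renders a simplified PXELINUX configuration whose default entry is either a local-disk boot or a kickstart install. It is only an assumption-laden illustration: the label names, the kernel and initrd file names, the kickstart URL, and the use of the `ks=` and `vnc` boot options (as understood by the EL6-era installer) are all hypothetical and are not taken from the actual server configuration, which uses a menu.

```python
def pxe_config(install, ks_url="http://ks.example.invalid/compute.cfg"):
    """Render a simplified PXELINUX config as a string.

    install=True  -> default entry starts a kickstart install
    install=False -> default entry boots the local disk
    All names, paths and the URL below are hypothetical placeholders.
    """
    default = "install" if install else "local"
    lines = [
        f"DEFAULT {default}",
        "PROMPT 0",
        "",
        "LABEL local",
        "  LOCALBOOT 0",  # hand control back to the local disk
        "",
        "LABEL install",
        "  KERNEL vmlinuz",
        # ks= points the installer at a kickstart file; vnc sends the
        # installer display to VNC instead of the console.
        f"  APPEND initrd=initrd.img ks={ks_url} vnc",
    ]
    return "\n".join(lines) + "\n"


# After a node has been rebuilt, its default goes back to the local disk.
print(pxe_config(install=False))
```

The point of the design is that a node left to PXE boot after its rebuild falls through to its own disk instead of reinstalling.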

tatarsky commented 8 years ago

New partitions give us a bit more breathing room.

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lv_root
                      197G  801M  186G   1% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             477M   33M  419M   8% /boot
/dev/mapper/vg0-lv_scratch
                      689G   69M  654G   1% /scratch
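
To make the before/after comparison explicit, here is a small Python sketch that parses the two `df -h` listings quoted in this thread and prints the size change per mount point. The parser is a hypothetical helper written for this illustration; it only handles the layout shown here, where long device names wrap onto their own line.

```python
import re

# Multipliers to express df -h sizes in GiB.
UNITS = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def size_to_gib(text):
    """Convert a df -h size such as '837G' or '477M' to GiB."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    if not unit:  # plain number (e.g. '0') is treated as bytes
        return float(value) / 1024**3
    return float(value) * UNITS[unit]


def parse_df(output):
    """Return {mount_point: size_in_GiB}, rejoining wrapped device names."""
    sizes = {}
    pending = None
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Filesystem":
            continue
        if len(fields) == 1:  # device name wrapped onto its own line
            pending = fields[0]
            continue
        if pending is not None:
            fields = [pending] + fields
            pending = None
        # fields: device, size, used, avail, use%, mount point
        sizes[fields[5]] = size_to_gib(fields[1])
    return sizes


BEFORE = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lv_root
                       50G  793M   46G   2% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             477M   33M  419M   8% /boot
/dev/mapper/vg0-lv_scratch
                      837G   73M  794G   1% /scratch
"""

AFTER = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lv_root
                      197G  801M  186G   1% /
tmpfs                  32G     0   32G   0% /dev/shm
/dev/sda1             477M   33M  419M   8% /boot
/dev/mapper/vg0-lv_scratch
                      689G   69M  654G   1% /scratch
"""

before = parse_df(BEFORE)
after = parse_df(AFTER)
for mount in sorted(before):
    delta = after[mount] - before[mount]
    print(f"{mount}: {before[mount]:g}G -> {after[mount]:g}G ({delta:+g}G)")
```

Going by the reported sizes, "/" grew by 147G (50G to 197G) and /scratch shrank by 148G (837G to 689G), so the extra root space came out of scratch.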

tatarsky commented 8 years ago

I would say "let's proceed"

tatarsky commented 8 years ago

cn3 is formatting filesystems; cn1 is rebooting.

tatarsky commented 8 years ago

I am actually going to do my errand but I will have my laptop as it involves another project. I show cn1 looking good as well. cn3 is loading software.

tatarsky commented 8 years ago

I have all the other nodes configured to accept kickstart. I will attempt to keep up with the menu change but you may find some of them waiting if you don't flip them back to BIOS boot off the local drive.

hirokomatsui commented 8 years ago

OK. cn5's on VNC. cn4 might have an HDD issue; am checking.

hirokomatsui commented 8 years ago

cn4's on VNC

hirokomatsui commented 8 years ago

cn6's on VNC

hirokomatsui commented 8 years ago

cn8's on VNC. cn7 cannot boot up. We will have to discard it due to a hardware issue. I'm leaving for a lunch break, and will do the rest after coming back.

tatarsky commented 8 years ago

Sounds good. Removed boot triggers for all but cn7.

hirokomatsui commented 8 years ago

cn9's on VNC

hirokomatsui commented 8 years ago

cn10's on VNC

hirokomatsui commented 8 years ago

cn11's on VNC

hirokomatsui commented 8 years ago

cn12's on VNC

tatarsky commented 8 years ago

Noted!

tatarsky commented 8 years ago

Moved the menu item aside for nodes below cn12.