DataONEorg / dataone

DataONE information and general-purpose issue tracking
Apache License 2.0

migrate VMs from ORC #13

Open mbjones opened 2 years ago

mbjones commented 2 years ago

VMs that need to be migrated from ORC to UCSB:

Good to migrate if possible, but not as critical:

@nickatnceas let's discuss placement of these. If possible, I'd like to move some of these to Anacapa, and others to NHDC (as indicated above). @taojing2002 and @datadavev can coordinate the moves on the DataONE side.

nickatnceas commented 2 years ago

@mbjones as of right now we don't have any VM hosts at Anacapa with the required specs, but we can move an older host or two from the NHDC.

The two VM hosts we are running at Anacapa (Pluto and Io) are 4-core R330/R340 1U servers with limited upgrade potential.

One note: since Anacapa's traffic will be routed through campus, most planned campus network outages (like the few we had recently related to campus WiFi) will take down both the NHDC and Anacapa at the same time.

taojing2002 commented 2 years ago

We have some configurations (particularly on the CNs) that use the domain names. It would be great if we can keep those domain names.

mbjones commented 2 years ago

@taojing2002 Yes, the domain names can stay the same.

We plan to migrate the 3 production VMs to Poseidon at NHDC, and then move that host to Anacapa once the transfer is complete. @nickatnceas will coordinate with @taojing2002 and @datadavev on shutting down services when needed.

The other two non-production VMs will go to NHDC.

nickatnceas commented 2 years ago

I will need to reconfigure the hardware RAID configuration on Poseidon, which will involve reinstalling the OS. This might take a day or two before we can start transferring data.

nickatnceas commented 2 years ago

I'm tracking server progress at https://github.nceas.ucsb.edu/NCEAS/Computing/issues/106

I should be able to start initial (online) rsyncs tomorrow, and then offline migrations Thursday or Friday.

nickatnceas commented 2 years ago

Initial rsyncs are running for cn-orc-1 and mn-orc-1. They're running fairly slowly, at 20-30 MB/sec (each), but should finish tomorrow before noon PT if they keep that speed.

nickatnceas commented 2 years ago

cn-orc-1 and search-orc-1 are ready for final migrations. mn-orc-1 is still running its initial rsync.

nickatnceas commented 2 years ago

@taojing2002 @datadavev the initial rsyncs are done for cn-orc-1, mn-orc-1, and search-orc-1. Do you want to plan on doing the final migration tomorrow (Friday 10/15)?

I need to do the migrations one VM at a time. It will look something like:

  1. Jing/Dave: Stop as many services as possible on the ORC VM (e.g., PostgreSQL, Apache, Tomcat)
  2. Nick: Run a final rsync to UCSB
  3. Nick: Update networking/grub/fstab/etc on the UCSB VM
  4. Nick: Boot the UCSB VM
  5. Nick: Change DNS
  6. Jing/Dave: Check and fix DataONE services
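Step 2 (the final rsync) is the step most likely to go wrong silently, so here is a minimal sketch of how such a command might be assembled. It assumes a pull from the UCSB side over SSH as root, with the target VM's filesystem mounted at a local directory; the hostname, destination path, flags, and exclude list are illustrative assumptions, not the exact command used in this migration. The function only builds the command and does not run it.

```python
import shlex

# Pseudo-filesystems that should not be copied from a running Linux guest.
# Assumption: the ORC VMs are Linux guests copied at the filesystem level.
DEFAULT_EXCLUDES = ["/proc/*", "/sys/*", "/dev/*", "/run/*", "/tmp/*"]


def final_rsync_command(source_host, dest_dir, extra_excludes=()):
    """Build (but do not run) a final rsync command for one VM.

    source_host and dest_dir are hypothetical placeholders, not the
    hostnames or paths actually used in this migration.
    """
    cmd = [
        "rsync",
        "-aHAX",          # archive mode plus hard links, ACLs and extended attributes
        "--numeric-ids",  # keep numeric UIDs/GIDs instead of remapping by name
        "--delete",       # delete files removed on the source since the last pass
    ]
    for pattern in list(DEFAULT_EXCLUDES) + list(extra_excludes):
        cmd.append(f"--exclude={pattern}")
    # Trailing slashes: copy the *contents* of / into the destination directory.
    cmd += [f"root@{source_host}:/", dest_dir.rstrip("/") + "/"]
    return cmd


if __name__ == "__main__":
    # Hypothetical FQDN and mount point, for illustration only.
    print(shlex.join(final_rsync_command("cn-orc-1.example.org", "/mnt/cn-orc-1")))
```

Passing `extra_excludes=["/var/postgres-bak"]` would skip copying that backup directory. Because services are stopped in step 1, this last pass only has to move what changed since the initial online rsync.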

I have started on cn-stage-orc-1 and cn-sandbox-orc-1 but they are not ready yet.

taojing2002 commented 2 years ago

I just talked with Nick. We will first complete the sync for cn-sandbox-orc-1 and cn-stage-orc-1 for testing, then do the final sync for the production servers.

nickatnceas commented 2 years ago

The initial rsyncs for cn-stage-orc-1 and cn-sandbox-orc-1 are now running. They are transferring at speeds between 15-30 MB/sec and have 1.1 TB and 1.2 TB to transfer, respectively, which will take about 22 hours at the slower 15 MB/sec speed.

We can cut the transfer time in half by deleting /var/postgres-bak on both VMs, but since it's running over the weekend, I don't think it's going to matter. Once the initial transfer is done subsequent rsyncs won't be affected much by those backup files.
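The 22-hour figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant transfer rate with no protocol overhead, so real rsync times will differ somewhat.

```python
def transfer_hours(size_tb, rate_mb_per_s):
    """Hours to copy size_tb terabytes at rate_mb_per_s megabytes per second.

    Assumes decimal units and a constant rate (no overhead, no stalls).
    """
    seconds = size_tb * 10**12 / (rate_mb_per_s * 10**6)
    return seconds / 3600


for size_tb in (1.1, 1.2):
    for rate in (15, 30):
        print(f"{size_tb} TB at {rate} MB/s: about {transfer_hours(size_tb, rate):.1f} h")
```

At 15 MB/s, 1.2 TB works out to roughly 22.2 hours and 1.1 TB to roughly 20.4 hours. Since time scales linearly with data size, removing a backup directory that accounts for about half of each VM's data would roughly halve these numbers.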

I'm planning to be out on Monday Oct 18, but will be back Tuesday the 19th to do the final migrations.

taojing2002 commented 2 years ago

Thanks, Nick! Please let me know before the final migration so I can shut down all services there.


nickatnceas commented 2 years ago

cn-sandbox-orc-1 is done. DNS has been changed from 160.36.13.152 to 128.111.85.161.

nickatnceas commented 2 years ago

cn-stage-orc-1 is done. DNS has been changed from 160.36.13.151 to 128.111.85.167.

nickatnceas commented 2 years ago

Two more done:

search-orc-1 DNS changed from 160.36.13.162 to 128.111.85.187
mn-orc-1 DNS changed from 160.36.13.148 to 128.111.85.183
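After each DNS change, it helps to confirm which address a name actually resolves to. Below is a minimal sketch using the old/new addresses posted in this thread (cn-orc-1 is left out because its addresses were not posted). The thread does not give the full domain names, so the caller supplies a hypothetical `fqdn_for` function; the resolver is injectable so the check can be exercised without network access.

```python
import socket

# Address changes reported in this thread: VM -> (old ORC IP, new UCSB IP).
DNS_CHANGES = {
    "cn-sandbox-orc-1": ("160.36.13.152", "128.111.85.161"),
    "cn-stage-orc-1": ("160.36.13.151", "128.111.85.167"),
    "search-orc-1": ("160.36.13.162", "128.111.85.187"),
    "mn-orc-1": ("160.36.13.148", "128.111.85.183"),
}


def check_cutover(fqdn_for, resolve=socket.gethostbyname):
    """Return {vm: status}, where status is 'new', 'old', or the unexpected IP.

    fqdn_for maps a short VM name to a fully qualified domain name; the real
    FQDNs are not given in the thread, so this mapping is an assumption.
    """
    results = {}
    for vm, (old_ip, new_ip) in DNS_CHANGES.items():
        ip = resolve(fqdn_for(vm))
        if ip == new_ip:
            results[vm] = "new"
        elif ip == old_ip:
            results[vm] = "old"  # possibly a cached answer that has not expired yet
        else:
            results[vm] = ip
    return results
```

A result of `"old"` right after a change does not necessarily mean the record is wrong, since resolvers may keep serving the previous answer until its TTL expires.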

nickatnceas commented 2 years ago

cn-orc-1 was migrated last night. All VMs from ORC are now running in the UCSB NHDC.

Poseidon, hosting the three production VMs, is planned to move to Anacapa. If it moves before the UCSB VPN is installed, ports will need to be opened in the NAT. We have a single public IP address in our NAT config, so we may need to use non-standard ports (e.g., port 22 for SSH is already in use, so the ORC VMs may need to use ports 2222, 2223, 2224, etc.).
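With only one public IP, each VM's SSH port needs its own external port on the NAT. This sketch hands out ports sequentially starting at 2222 and skips any already taken; the VM order and the starting port are illustrative assumptions, and the actual forwarding rules would still have to be configured on the NAT device itself.

```python
def assign_ssh_ports(vms, start=2222, in_use=(22,)):
    """Map each VM to a distinct external port on the single public NAT IP.

    Ports are assigned sequentially from `start`, skipping any in `in_use`.
    The starting port and VM order are illustrative assumptions.
    """
    taken = set(in_use)
    mapping = {}
    port = start
    for vm in vms:
        while port in taken:
            port += 1
        mapping[vm] = port
        taken.add(port)
        port += 1
    return mapping


if __name__ == "__main__":
    for vm, port in assign_ssh_ports(["cn-orc-1", "mn-orc-1", "search-orc-1"]).items():
        # External port `port` on the public IP would forward to port 22 on `vm`.
        print(f"public:{port} -> {vm}:22")
```

Other services (HTTPS, for example) would face the same one-IP constraint, which is why getting public addresses through the VPN is the cleaner option.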

If Poseidon is moved after the VPN is installed, the VMs will get public IPs in the 128.111.196.0/23 subnet, and all traffic will be routed through the UCSB campus.
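For reference, Python's standard `ipaddress` module shows what the 128.111.196.0/23 subnet covers. It also confirms that the 128.111.85.x addresses the VMs received at the NHDC fall outside it.

```python
import ipaddress

subnet = ipaddress.ip_network("128.111.196.0/23")

print(subnet.num_addresses)        # 512
print(subnet[0], subnet[-1])       # 128.111.196.0 128.111.197.255
print(len(list(subnet.hosts())))   # 510 (network and broadcast addresses excluded)

# One of the current NHDC addresses from earlier in this thread:
print(ipaddress.ip_address("128.111.85.161") in subnet)  # False
```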

nickatnceas commented 2 years ago

I moved the three production ORC VMs from Poseidon to Aurora-HW on November 17th.

Aurora-HW is still housed in the NHDC and is configured to run these VMs off local storage with file-based VM disk images. It has more memory and disk resources and faster CPUs than Poseidon, and should have no trouble hosting the VMs.

I moved Poseidon to Anacapa, where it has been reconfigured and can now run the VMs whenever we want.

Anacapa is still NAT'd, and any public facing ports will need to be port-forwarded through a single public IP address. We expect the NAT to be removed in the spring of 2022, at which point all traffic from Anacapa will be routed through the UCSB campus.