ICRAR / ngas

The NGAS storage system
GNU Lesser General Public License v3.0

Investigate how/if bbcp allows to select network paths for data transfers #21

Open rtobar opened 4 years ago

rtobar commented 4 years ago

During the work done in #19, @gsleap found that no matter which combination of parameters was given to bbcp, it apparently always used hostnames (and fully qualified domain names) as the sole basis for establishing the connection between the source and target nodes. This is not enough in scenarios where two machines are connected by multiple, independent network paths, and the path taken depends on the interface being addressed.

Consider the following scenario, which is similar to the deployment used in the tests described in #19:

          Host A                                        Host B
      +-------------+            1 Gb              +-------------+
      |        eth0 |<---------------------------->| eth0        |
      | IP: 1.1.1.1 |                              | IP: 1.1.1.2 |
      |             |                              |             |
      |             |            10 Gb             |             |
      |        eth1 |<---------------------------->| eth1        |
      | IP: 2.2.2.2 |                              | IP: 2.2.2.3 |
      |             |                              |             |
      |             |                              |             |
      | NGAS client |                              | NGAS server |
      |    bbcp SRC |                              | bbcp SINK   |
      +-------------+                              +-------------+

The NGAS server running on B is listening on eth1, the 10 Gb interface. When the BBCPARC command comes in from A, we generate and execute this command on B:

bbcp .... 2.2.2.2:/path/to/source/file 2.2.2.3:/path/to/ngas/staging/file

By using the specific IPs in the source and target specifications we expect bbcp to explicitly use the 10 Gb interface for the data transfer.

The command starts the SRC and SINK copies of bbcp on A and B respectively. However, bbcp seems to rely exclusively on the hosts' names to establish the connection between SRC and SINK, and because the names of A and B resolve to the 1.1.1.X addresses, the 1 Gb link is used for the bbcp data transfers. This behavior seems to be the same regardless of the direction of the connection establishment (i.e., the -z option) and of whether name resolution (i.e., the -n option) is used, but this should be tested thoroughly.
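To illustrate the difference between the two approaches, here is a minimal Python sketch (not bbcp code; `connect_from` and `hostname_address` are hypothetical helpers). It assumes the usual setup in which each subnet is attached to a single interface, so that binding the local end of a socket to a given IP and connecting to a peer on the same subnet keeps the traffic on that link. Resolving the machine's own hostname, on the other hand, returns whatever the resolver is configured with, which may belong to a different interface.

```python
import socket


def connect_from(local_ip, remote_ip, remote_port, timeout=5.0):
    """Open a TCP connection whose source address is pinned to local_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Port 0 lets the OS choose an ephemeral port; only the address is fixed.
        sock.bind((local_ip, 0))
        sock.connect((remote_ip, remote_port))
    except OSError:
        sock.close()
        raise
    return sock


def hostname_address():
    """Return the address the local hostname resolves to.

    This is not necessarily the address of the interface we want to use.
    """
    return socket.gethostbyname(socket.gethostname())
```

In the scenario above, something like `connect_from("2.2.2.3", "2.2.2.2", port)` on B would be expected to use the 10 Gb link, whereas connecting to the address obtained by resolving A's hostname would end up on the 1 Gb link.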

This problem was initially investigated in #19, but it was then split into a new issue to separate it from the original problem reported in #19, which has been fixed.

rtobar commented 4 years ago

After reading some more code, I think I understand better what is going on and how to go about it. The problem was indeed that bbcp tries very hard to use each node's hostname as the main bit of information for establishing the data channels. I just tried some changes to bbcp locally and I could get it to use the interface I wanted (I tried on my laptop, contacting NGAS through the network interface, which started bbcp and had it exchange data through the same interface). I will now expose those changes through a new option (so the old behavior is left unchanged), and will try again. Stay tuned...

rtobar commented 4 years ago

I've implemented a new -j command-line option in bbcp that should make it prefer the hostnames/IPs given in the file specifications on the command line over the hostnames of the nodes involved in the data exchange. Again, I tried this locally on my laptop by forcing bbcp to use my ethernet interface for the data exchange instead of the loopback interface, and it seems to work.
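As a rough illustration of the idea behind -j (a Python sketch of the selection logic only, not the actual bbcp implementation, which is written in C++), the following hypothetical helpers split a `[user@]host:path` file specification and choose which host to use for the data channel. The function names and the `prefer_spec_host` parameter are assumptions made for this example, and IPv6 literals are not handled.

```python
def parse_spec(spec):
    """Split a '[user@]host:path' file specification into (user, host, path)."""
    location, sep, path = spec.partition(":")
    if not sep:
        # No host part at all: treat the whole string as a local path.
        return None, None, spec
    user, at, host = location.rpartition("@")
    return (user if at else None), host, path


def data_channel_host(spec, node_hostname, prefer_spec_host):
    """Pick the host used to establish the data channel.

    With prefer_spec_host set (the behavior -j is meant to enable), the host
    written in the file specification wins over the node's own hostname.
    """
    _, spec_host, _ = parse_spec(spec)
    if prefer_spec_host and spec_host:
        return spec_host
    return node_hostname
```

For instance, `data_channel_host("mwa@192.168.120.204:/data/file.raw", "hostA", True)` yields the IP from the specification, while passing `False` falls back to the node's hostname, which is the old behavior.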

@gsleap could you give this a try when you have some time? Make sure you have the latest master version of bbcp from https://github.com/ICRAR/bbcp on both machines. Then run the following on mwacache10 (it has -j, but doesn't have -z):

bbcp -j -f -V -n -S "ssh -x -a -oBatchMode=yes -oGSSAPIAuthentication=no -oFallBackToRsh=no %4 %I -l %U %H bbcp" -e -E c32c -s 12 -P 2 mwa@192.168.120.204:/data/20191210/rawdump_1260043216.raw 192.168.120.110:/home/mwa/NGAS/volume2/staging/NGAMS_TMP_FILE___3airu2emrawdump_1260043216.raw.fits

Hopefully this will take us in the right direction. I experienced some slowness while the connection was being established between the SRC and SNK copies of bbcp. I didn't stop to find out what was causing it, but I'm hoping it has more to do with my setup and environment (and the fact that both copies run on the same computer in my case) than with the actual changes I made to the software.

gsleap commented 4 years ago

Hi Rod,

Awesome, thanks for that. That did the trick: the bbcp transfer was successful and, without any real effort, achieved a sustained speed of ~6 Gbps!

The only issue (well not issue, more of a nit pick) is that the final stats output shows:

Target 127.0.1.1 using a final recv window of 3137568
Source 127.0.1.1 using a final send window of 6256640

(using 127.0.1.1, which is Ubuntu's own internal DNS IP, rather than the IPs involved in the transfer)

No big deal though.

Thanks again!

Greg


rtobar commented 4 years ago

That's great news! I'll now put some effort into getting these changes upstreamed to the original bbcp maintainer, and into making the corresponding changes in NGAS to ensure we use the -j option when available.
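For the NGAS side, a change along these lines could be sketched as follows. This is only an assumption about how it might look, not the actual NGAS code: `build_bbcp_cmd` and its `use_spec_hosts` parameter are hypothetical, how to detect whether the installed bbcp supports -j is left open, and the fixed options are simply copied from the command shown earlier in this thread (without the -S ssh template, for brevity).

```python
# Options copied verbatim from the command used earlier in this thread.
BASE_OPTIONS = ["-f", "-V", "-n", "-e", "-E", "c32c", "-s", "12", "-P", "2"]


def build_bbcp_cmd(src, dst, use_spec_hosts=False):
    """Build the bbcp argument list, adding -j only when requested.

    use_spec_hosts should be True only if the installed bbcp is known to
    support -j (i.e., it is built from the ICRAR fork mentioned above).
    """
    cmd = ["bbcp"]
    if use_spec_hosts:
        cmd.append("-j")
    cmd.extend(BASE_OPTIONS)
    cmd.extend([src, dst])
    return cmd
```

Keeping -j behind an explicit switch means deployments with an unpatched bbcp keep running the same command line as before.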