chime-experiment / alpenhorn

alpenhorn manages files produced by the acquisition system. This alpenhorn is mostly legacy these days. Most alpenhorn development is going on in alpenhorn2: https://github.com/radiocosmology/alpenhorn/

bbcp bad state is bad #14

Closed ketiltrout closed 3 years ago

ketiltrout commented 3 years ago

Sometimes bbcp transfers really slowly (a few Mbps or less) when pulling to cedar. Not sure why yet.

Handy link: https://www.slac.stanford.edu/~abh/bbcp/

ketiltrout commented 3 years ago

Good transfer:

Apr 26 17:10:45 INFO >> Transferring file "20210325T191254Z_chime_rawadc/000191.h5".
Apr 26 17:10:51 INFO >> Pull complete (md5sum correct). Transferred 261.3 MB in 6 seconds [41.3 MB/s]

  cmd: bbcp -V -f -z --port 4200 -W 4M -s 16 -o -E md5= alpenhorn@206.12.116.2:/mnt/gong/archive/20210325T191254Z_chime_rawadc/000191.h5 /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/
  ret: 0
  stderr:
    bbcp: Window size reduced to 425984 bytes.
    bbcp: Warning: cedar5.cedar.computecanada.ca is running an older version of bbcp
    bbcp: cedar5.cedar.computecanada.ca running version 14.04.14.00.1
    bbcp: hac2 running version 17.12.00.00.0
    Target cedar5.cedar.computecanada.ca using initial recv window of 235104
    Source hac2 using initial send window of 4194304
    bbcp: Creating /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000191.h5
    Source cpu=0.245 (sys=0.230 usr=0.015).
    Checksum: md5 512e0b35ba760c175cbebd06f4bfb39a cedar5.cedar.computecanada.ca:/project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000191.h5
    File /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000191.h5 created; 274041056 bytes at 62.3 MB/s
    244 buffers used with 305 reorders; peaking at 30.
    Target cpu=2.295 (sys=1.476 usr=0.819).
    Target cedar5.cedar.computecanada.ca using a final recv window of 4748832
    Source hac2 using a final send window of 4194304
    1 file copied at effectively 42.7 MB/s

ketiltrout commented 3 years ago

Good transfer:

Apr 26 17:10:51 INFO >> Transferring file "20210325T191254Z_chime_rawadc/000192.h5".
Apr 26 17:11:02 INFO >> Pull complete (md5sum correct). Transferred 261.3 MB in 10 seconds [25.3 MB/s]
  cmd: bbcp -V -f -z --port 4200 -W 4M -s 16 -o -E md5= alpenhorn@206.12.116.2:/mnt/gong/archive/20210325T191254Z_chime_rawadc/000192.h5 /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/
  ret: 0
  stderr: 
    bbcp: Window size reduced to 425984 bytes.
    bbcp: Warning: cedar5.cedar.computecanada.ca is running an older version of bbcp
    bbcp: cedar5.cedar.computecanada.ca running version 14.04.14.00.1
    bbcp: hac2 running version 17.12.00.00.0
    Target cedar5.cedar.computecanada.ca using initial recv window of 235104
    Source hac2 using initial send window of 4194304
    bbcp: Creating /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000192.h5
    Source cpu=0.222 (sys=0.208 usr=0.014).
    Checksum: md5 fd107689e48aa8467c0c98843f08c837 cedar5.cedar.computecanada.ca:/project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000192.h5
    File /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000192.h5 created; 274041056 bytes at 65.6 MB/s
    244 buffers used with 229 reorders; peaking at 47.
    Target cpu=2.421 (sys=1.504 usr=0.917).
    Target cedar5.cedar.computecanada.ca using a final recv window of 2986008
    Source hac2 using a final send window of 4194304
    1 file copied at effectively 44.3 MB/s

ketiltrout commented 3 years ago

Bad transfer:

Apr 26 17:43:12 INFO >> Transferring file "20210325T191254Z_chime_rawadc/000193.h5".
Apr 26 18:11:21 INFO >> Pull complete (md5sum correct). Transferred 263.4 MB in 1688 seconds [0.2 MB/s]
  cmd: bbcp -V -f -z --port 4200 -W 4M -s 16 -o -E md5= alpenhorn@206.12.116.2:/mnt/gong/archive/20210325T191254Z_chime_rawadc/000193.h5 /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/
  ret: 0
  stderr:
    bbcp: Window size reduced to 425984 bytes.
    bbcp: Warning: cedar5.cedar.computecanada.ca is running an older version of bbcp
    bbcp: cedar5.cedar.computecanada.ca running version 14.04.14.00.1
    bbcp: hac2 running version 17.12.00.00.0
    Target cedar5.cedar.computecanada.ca using initial recv window of 235104
    Source hac2 using initial send window of 4194304
    bbcp: Creating /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000193.h5
    Source cpu=0.330 (sys=0.317 usr=0.013).
    Checksum: md5 ba9dd6951db2490875522f26efc94d96 cedar5.cedar.computecanada.ca:/project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000193.h5
    File /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000193.h5 created; 276164832 bytes at 159.9 KB/s
    244 buffers used with 316 reorders; peaking at 68.
    Target cpu=2.093 (sys=1.286 usr=0.807).
    Target cedar5.cedar.computecanada.ca using a final recv window of 11823872
    Source hac2 using a final send window of 4194304
    1 file copied at effectively 159.7 KB/s

ketiltrout commented 3 years ago

Bad transfer:

Apr 26 18:38:33 INFO >> Transferring file "20210325T191254Z_chime_rawadc/000195.h5".
Apr 26 18:39:20 INFO >> Pull complete (md5sum correct). Transferred 259.3 MB in 47 seconds [5.5 MB/s]
  cmd: bbcp -V -f -z --port 4200 -W 4M -s 16 -o -E md5= alpenhorn@206.12.116.2:/mnt/gong/archive/20210325T191254Z_chime_rawadc/000195.h5 /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/
  ret: 0
  stderr: 
    bbcp: Window size reduced to 425984 bytes.
    bbcp: Warning: cedar5.cedar.computecanada.ca is running an older version of bbcp
    bbcp: cedar5.cedar.computecanada.ca running version 14.04.14.00.1
    bbcp: hac2 running version 17.12.00.00.0
    Target cedar5.cedar.computecanada.ca using initial recv window of 235104
    Source hac2 using initial send window of 4194304
    bbcp: Creating /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000195.h5
    Source cpu=0.262 (sys=0.245 usr=0.017).
    Checksum: md5 29499614e19505b807eab5fe54ab0e7d cedar5.cedar.computecanada.ca:/project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000195.h5
    File /project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/000195.h5 created; 271917280 bytes at 5.9 MB/s
    244 buffers used with 288 reorders; peaking at 32.
    Target cpu=2.208 (sys=1.498 usr=0.710).
    Target cedar5.cedar.computecanada.ca using a final recv window of 8533760
    Source hac2 using a final send window of 4194304
    1 file copied at effectively 5.6 MB/s

ketiltrout commented 3 years ago

I managed to capture the bbcp debug output from both a bad (123 sec) and a good (8 sec) transfer of two identically-sized rawadc files. I don’t see anything substantially different between the two. All the buffers and window sizes &c. are the same, so I don’t think bbcp autotuning is to blame. I’ve turned off the debug output.
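
For picking the bad pulls out of a longer log, something like the following could help. It is only a rough sketch, not part of alpenhorn: the regular expression is written against the "Pull complete" lines quoted in this thread, and the names `PULL_RE`, `SLOW_MB_PER_S`, `SAMPLE_LOG` and `slow_pulls`, as well as the 10 MB/s cutoff, are made up for illustration.

```python
import re

# Matches the "Pull complete" log lines quoted in this thread, e.g.
# "... Transferred 261.3 MB in 6 seconds [41.3 MB/s]".
PULL_RE = re.compile(
    r"Transferred (?P<size>[\d.]+) MB in (?P<secs>\d+) seconds "
    r"\[(?P<rate>[\d.]+) MB/s\]"
)

# Hypothetical cutoff: anything slower than this is treated as a bad pull.
SLOW_MB_PER_S = 10.0

# The four "Pull complete" lines from the comments above.
SAMPLE_LOG = [
    "Apr 26 17:10:51 INFO >> Pull complete (md5sum correct). Transferred 261.3 MB in 6 seconds [41.3 MB/s]",
    "Apr 26 17:11:02 INFO >> Pull complete (md5sum correct). Transferred 261.3 MB in 10 seconds [25.3 MB/s]",
    "Apr 26 18:11:21 INFO >> Pull complete (md5sum correct). Transferred 263.4 MB in 1688 seconds [0.2 MB/s]",
    "Apr 26 18:39:20 INFO >> Pull complete (md5sum correct). Transferred 259.3 MB in 47 seconds [5.5 MB/s]",
]


def slow_pulls(lines, threshold=SLOW_MB_PER_S):
    """Return (size_mb, seconds, rate_mb_per_s) for each pull below threshold."""
    slow = []
    for line in lines:
        match = PULL_RE.search(line)
        if match is None:
            continue  # not a "Pull complete" line
        rate = float(match.group("rate"))
        if rate < threshold:
            slow.append(
                (float(match.group("size")), int(match.group("secs")), rate)
            )
    return slow


if __name__ == "__main__":
    for size, secs, rate in slow_pulls(SAMPLE_LOG):
        print(f"slow pull: {size} MB in {secs} s ({rate} MB/s)")
```

Run against the four lines above, this flags the 000193.h5 and 000195.h5 pulls and skips the two good ones.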

ketiltrout commented 3 years ago

Pretty sure the ordering flag (the -o in the commands above) was to blame for all of this. (Removed in #15.)
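
To make the change concrete, here is a rough sketch of building the bbcp argument list with the ordering flag made optional and off by default. This is not the code from #15 or from alpenhorn: `bbcp_command` and its `ordered` parameter are hypothetical, and the remaining options are simply copied from the commands logged earlier in this thread.

```python
def bbcp_command(source, dest_dir, ordered=False):
    """Build a bbcp argument list like the one logged in this thread.

    `ordered` is a hypothetical switch: when True, "-o" is added back,
    reproducing the command that showed the slow transfers.
    """
    # Options copied verbatim from the logged command.
    cmd = ["bbcp", "-V", "-f", "-z", "--port", "4200", "-W", "4M", "-s", "16"]
    if ordered:
        cmd.append("-o")  # the ordering flag suspected of causing the slowdown
    # md5 checksum option, then source and destination, as in the logs.
    cmd += ["-E", "md5=", source, dest_dir]
    return cmd


if __name__ == "__main__":
    print(" ".join(bbcp_command(
        "alpenhorn@206.12.116.2:/mnt/gong/archive/20210325T191254Z_chime_rawadc/000193.h5",
        "/project/rpp-chime/chime/chime_staging/20210325T191254Z_chime_rawadc/",
    )))
```

The list form can be passed directly to `subprocess.run` without going through a shell.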