go-hep / hep

hep is the mono repository holding all of go-hep.org/x/hep packages and tools
https://go-hep.org
BSD 3-Clause "New" or "Revised" License

xrootd/cmd/xrd-cp: sub-par performances #399

Open sbinet opened 5 years ago

sbinet commented 5 years ago

trying to copy the following file:

$>  xrd-ls -l root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleElectron.root
-r--r--r--   1841440760  Oct 16 16:39    /eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleElectron.root

results in:

$> time xrd-cp root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleElectron.root go.root

real    15m49.907s
user    0m23.006s
sys     0m32.620s

while, with the C++ version, I got:

$>  time xrdcp root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleElectron.root cxx.root

[1.715GB/1.715GB][100%][==================================================][43.9MB/s]   

real    0m40.105s
user    0m0.743s
sys     0m8.754s

presumably b/c of 2 factors:

sbinet commented 5 years ago

@EgorMatirov want to give this a try?

EgorMatirov commented 5 years ago

> @EgorMatirov want to give this a try?

The first thing I noticed: the C++ version writes the file in buckets of 16 MBytes, while the Go version uses buckets of 16 KBytes (due to https://golang.org/src/io/io.go#L391), which results in much more overhead.

Passing a 16 MByte buffer results in: Go version: 816 s; C++ version: 570 s.

Next optimization would be reading from the server and writing to the disk simultaneously.

Something like: one goroutine reads a bucket from the server and puts it into a buffered channel; another goroutine reads a bucket from the channel and writes it to the disk.

I'll give it a try.

> we haven't implemented kXR_readv

To be honest, I don't see how that can speed up copying here. As far as I can tell, the only difference is that it supports reading from several files, but that's unrelated here since we are copying one file. (But it looks like a good idea to check against copying several files later.)

sbinet commented 5 years ago

> we haven't implemented kXR_readv

> To be honest, I don't see how that can speed up copying here. As far as I can tell, the only difference is that it supports reading from several files, but that's unrelated here since we are copying one file. (But it looks like a good idea to check against copying several files later.)

this was just a base-less statement :)

> The first thing I noticed: the C++ version writes the file in buckets of 16 MBytes, while the Go version uses buckets of 16 KBytes (due to https://golang.org/src/io/io.go#L391), which results in much more overhead.

nice find.

another possible avenue is to use bufio.Writer (though simply using io.CopyBuffer is probably logically equivalent)
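For comparison, a small sketch of the bufio.Writer route, assuming the destination is any io.Writer. The names countingWriter and bufferedWrites are hypothetical and the sizes are scaled down. Many small writes are coalesced into a few large ones on the underlying writer, which is the same effect a large copy buffer gives.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter records how many Write calls and bytes it receives.
type countingWriter struct {
	calls int
	total int64
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	w.total += int64(len(p))
	return len(p), nil
}

// bufferedWrites issues count writes of chunkSize bytes through a
// bufio.Writer with a buffer of bufSize bytes, then flushes it.
func bufferedWrites(dst io.Writer, bufSize, chunkSize, count int) error {
	bw := bufio.NewWriterSize(dst, bufSize)
	chunk := make([]byte, chunkSize)
	for i := 0; i < count; i++ {
		if _, err := bw.Write(chunk); err != nil {
			return err
		}
	}
	// Flush is required: otherwise the tail of the data stays in the buffer.
	return bw.Flush()
}

func main() {
	w := &countingWriter{}
	if err := bufferedWrites(w, 256<<10, 1<<10, 1024); err != nil {
		panic(err)
	}
	fmt.Printf("1024 small writes reached the destination as %d Write calls (%d bytes)\n",
		w.calls, w.total)
}
```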

sbinet commented 5 years ago

ok, with #401 in, we have now (with a ~170 MiB file):

C++:
real    0m1.663s
user    0m0.031s
sys     0m0.679s

Go:
real    0m1.879s
user    0m0.220s
sys     0m0.603s

testing on the root-eos public instance I get better results for Go than for C++ but I suspect there's some throttling somewhere...

C++: (1.7 GiB)
real    9m40.706s
user    0m2.759s
sys     0m15.548s

Go: (1.7 GiB)
real    2m47.093s
user    0m8.137s
sys     0m17.981s

also some memory info, from top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
28840 binet     20   0 1433684 141000   7192 S  21.0   0.2   0:20.66 xrd-cp
28399 binet     20   0  558244 143304  11016 S   5.3   0.2   0:08.79 xrdcp

let's leave this one open