Open ggouaillardet opened 8 years ago
@ggouaillardet Thanks for looking into this.
My main goal is to see how yalla behaves, but I was also working with ob1/openib to reproduce the results obtained by the developers of the original request refactoring.
I'd appreciate it if you could test this code and provide feedback!
@hjelmn @bosilca @jsquyres, I noticed that with this benchmark ob1/openib and ob1/tcp sometimes hang with both the v2.x release and master. Arm’s OSU test seems to run fine. This might be a problem in the test itself, but I haven't found it yet, so it might be worth running it in different environments to find the root cause.
FWIW I never saw yalla hang on this test.
I'm using open-mpi/ompi@267821f as master and open-mpi/ompi-release@87a79f5 as v2.x in case you may be aware of things that may cause this.
@jsquyres @jladd-mlnx @hppritcha @bosilca
Do we want to move this somewhere in OMPI? It is not production grade, but I'm sure we can improve it. @bosilca had some improvement ideas, so it makes sense to put it in a public location. @thananon used it a few times as far as I recall (https://github.com/open-mpi/ompi/issues/2067#issuecomment-251486051), and I'll use it again if I need to deal with multi-threaded cases. I'd be happy to package this for OMPI or whatever other public repo you think it would fit. I don't want this work to go nowhere, even though it wasn't much of an effort.
We can certainly add one more threaded test in the ompi-tests repo.
@artpol84 i checked the minutes of the telcon, and then checked your benchmark.
are you using mtl/mxm or btl/openib ?
in the case of btl/openib, i would not expect any gain when increasing the number of threads, since my understanding is the btl will serialize send and tag/communicator matching on the receive side.
i do not know about mtl/mxm, it might be able to do thread parallelism since you are using one tag per thread.
@hjelmn @bosilca @jsquyres does this make sense ?
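For anyone reproducing this, the transports named in this thread are normally selected with MCA parameters on the mpirun command line. The sketch below only prints the command lines and does not launch anything. The benchmark path `./mt_benchmark`, the rank count, and the helper name `cmd_for` are placeholders made up for illustration and are not taken from the benchmark itself. Note that pml/yalla drives MXM directly, while mtl/mxm is reached through pml/cm.

```shell
#!/bin/sh
# Hypothetical values: replace with the real benchmark binary and rank count.
BENCH=./mt_benchmark
NP=2

# Print (do not run) the mpirun command line for one transport selection.
cmd_for() {
    case "$1" in
        ob1-openib) echo "mpirun -np $NP --mca pml ob1 --mca btl openib,self $BENCH" ;;
        ob1-tcp)    echo "mpirun -np $NP --mca pml ob1 --mca btl tcp,self $BENCH" ;;
        yalla)      echo "mpirun -np $NP --mca pml yalla $BENCH" ;;
        cm-mxm)     echo "mpirun -np $NP --mca pml cm --mca mtl mxm $BENCH" ;;
        *)          echo "unknown transport: $1" >&2; return 1 ;;
    esac
}

# List all four variants so they can be copied and run one at a time.
for t in ob1-openib ob1-tcp yalla cm-mxm; do
    cmd_for "$t"
done
```

Running each printed command against the same build should make it easier to tell whether a hang follows the pml/btl selection or the test itself.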