jforthofer opened this issue 8 years ago
Can we please make sure this isn't related to full disk encryption? You guys may have figured this out, but I haven't seen anything. I can't reproduce on my VM, and I didn't seem to have a terrible slowdown when using another windows machine.
There may be some slowdown associated with encryption, but it's not the whole problem. The slowdown happens on Windows desktops too, which are not encrypted.
I don't see it being code related, or the VMs would suffer too. #define VSILFILE FILE and #define VSIFWriteL fwrite may work as a test in the STL converter.
So Kyle, are you saying a ninjafoam simulation runs in the same amount of time on your linux and your Windows vm?
Yes, I believe it did; I will check again. If I remember right, it did on your VM too. I'll look for the email.
From an email on 2016-02-25 from Natalie:
Yeah, I was running on a laptop.
I ran the coarse hi.tif case on my personal laptop with Ubuntu 14.04 and Windows 8.1:
Ubuntu: 48 min
Windows: 44 min
On my FS laptop: 2 hours
The big difference seems to be in the meshing. Meshing is 45% of the total simulation time on Windows, but only 2% of the time on linux. MDM runs SO SLOW on the FS machine.
I'm testing to see what we can reduce the solver iterations to right now. I still can't run in parallel on any windows machine.
Is this different now?
I think that test must have been using 1 thread. Yeah, meshing speeds improved on Windows when we switched to DP. But you're right, it looks like my personal Windows machine was comparable to Linux speed. I don't remember my Linux laptop being so slow... it must be, though, if that's what I said. This is a weird problem that is difficult to test. Maybe it's related to the FS image. On the FS machines all I/O stuff seems slower. This includes STL creation, refineMesh, and decomposing/reconstructing the domain.
I think that test must have been using 1 thread.
It was, it was elsewhere in the email.
Yeah, meshing speeds improved on Windows when we switched to DP. But you're right, it looks like my personal Windows machine was comparable to Linux speed. I don't remember my Linux laptop being so slow... it must be, though, if that's what I said. This is a weird problem that is difficult to test. Maybe it's related to the FS image. On the FS machines all I/O stuff seems slower. This includes STL creation, refineMesh, and decomposing/reconstructing the domain.
Could anti-virus software be doing it too? Can we disable that for testing?
@jforthofer
I have a feeling that this may have to do with using GDAL's VSI functions rather than standard library functionality.
You realize that VSI is being run on both platforms, right? The wrappers are slightly different on VSIFWriteL(), but unix has an extra if statement before fwrite() is called.
I noticed there is a buffer size set in the DEM to STL writing, maybe adjusting this would help?
Can you point me to this? I can't find it.
Maybe the VSI stuff just has too much overhead?
Again, the overhead is equal, if not greater, on unix.
Why would it be slow on Windows but not Linux... not sure.
It appears all the OpenFOAM writing is taking extra time too, and they aren't using VSI (last time I checked). I don't think VSI is the right place to look, and I think it's a red herring for you.
It would be good to do some tests to see if this is the source of the problem or not. I think we have a standalone DEM to STL converter, so that might be a good isolated place to check out.
Testing on different systems, and documenting performance should be done first. On this ticket ideally.
I'll see about disabling anti-virus for testing.
Also, just for completeness, the times I reported in Feb for hi.tif were slower than typical coarse cases, because that case was failing near the end of the simulation, coarsening and rerunning. I modified the meshing after that, so that type of failure should not happen anymore.
We should just rerun the previous run time comparisons with our current code.
Benchmarks for bigbutte_small.tif, coarse, 10mph, 225 deg (from email 2016-04-09).
Kyle:
Ubuntu 14.04, Intel i5 Quad Core 2.2 GHz (SSD): 7m40s
Ubuntu 14.04, Intel Xeon Dual 8-core 2.4 GHz: 4m06s
Windows 7, Intel Core i7-2630 8-core, 2.0 GHz: 15m24s
Windows 7 Client on Ubuntu qemu Host, 2 cores: 22m48s
Natalie benchmarks from email above, specifics unknown:
Intel i5-4200u, 1.6 GHz (base), 2.29 GHz (max):
Ubuntu 14.04: 8.5 min
Windows 8.1: 24 min
We should just rerun the previous run time comparisons with our current code.
Yes, I agree.
To narrow it down a bit, I compared just stl_create for big_butte_small.tif (as requested by Jason).
Windows 7 FS laptop (from Audrey), non-encrypted, P8700 2.5 GHz: 13 s
Windows 7 FS laptop (mine), encrypted, i7-2620M 2.7 GHz: 8 s
Ubuntu 14.04 (my desktop), X5677 3.47 GHz: < 1 s
So it's good that the problem has been narrowed down to the kind of work done in stl_create. One thing to note: I don't believe this is compiled during the cross-compile step; it is compiled natively on Windows using Visual Studio, right? This eliminates the possibility of problems in the cross compile (like linking to different C++ runtime libraries).
stl_create is WindNinja code. It's not cross-compiled.
Is stl creation the only thing slowed down? I thought everything was running slower, the OpenFOAM stuff included.
No, it's not just stl creation. It seems to be I/O things that are slower, which includes a lot of OpenFOAM stuff (e.g., domain decomp/reconstruction, refineMesh, and other things...).
Can we do the same with antivirus turned off? Was stl_create compiled in release or debug mode?
So we have two completely separate processes, built with two completely separate runtimes, built with two completely different compilers, using two separate APIs suffering the same issue. OpenFOAM likely uses C++ streams for I/O. GDAL uses POSIX API on unix, Windows C API on windows. To me, this points to an external factor, not code. But maybe that is already agreed upon, I can't tell. I changed the ticket name because it didn't make any sense to me.
Kyle, the buffer I was talking about earlier is here: https://github.com/firelab/windninja/blob/master/src/ninja/stl_create.cpp#L236
I think Natalie is going to replace this code with C++ standard library equivalent code to see if things speed up. Another thing to test is to recode this with the VSI stuff included but put everything into the buffer instead of small buffer writes. Last, if none of this has any effect, we need to test with antivirus off. The only reason we're doing this stuff in this order is because Natalie thought it would be fastest (might be a bit of work for us to get an antivirus-off computer together).
Yes, that would help by calling fsync() less; the underlying buffer is probably 1024, 2048, or 4096 bytes, or somewhere along those lines. But you have to be very careful about how you write binary files. You can't just call:
```c
typedef struct s {
    short  a;
    float  b;
    double c;
} s;

// ...
s m;
fwrite(&m, sizeof(s), 1, fout);
```
without a properly packed struct, or modifying how the compiler handles structs. That is probably why it's currently done the way it's done. It does appear that our struct is properly aligned (homogeneous members), so it might work. Another option would be to dump the struct and just use one big array of floats.
If you #define some stuff, it should be a couple of lines in stl_create.cpp, at least to use C stdio functions.
I think you will get almost no performance boost out of it, because (I assume) OpenFOAM uses std::streams, but wtf do I know. You guys do what you want, I'm out on this one.
VSI calls for windows:
https://github.com/ksshannon/gdal/blob/trunk/gdal/port/cpl_vsil_win32.cpp#L308-L326
for unix:
https://github.com/ksshannon/gdal/blob/trunk/gdal/port/cpl_vsil_unix_stdio_64.cpp#L398-L440
For another (non-stl_create) comparison, below are times for decomposePar -force on big_butte_small.tif (in WindNinja), decomposing to 4 processors.
my Linux desktop (same as above): 11 s
my Windows laptop (same as above): 1 min 24 s
During decomposePar, polyMesh/*, U, k, p, and epsilon are copied over to new directories (one for each processor). OpenFOAM calls regIOobject::write().
Still not sure why we're seeing this slowdown in stl_create (which uses VSIFWriteL()) and also in the OpenFOAM code (which uses std::ofstream), but not in other types of file writing we do (e.g., writing kmzs). Or why the behavior would be different on Windows vs. Linux. But it's definitely related to I/O. There are Windows-specific changes related to I/O in our patched version, e.g.:
I also see OFstream::debug is turned on for Windows but not Linux:
https://github.com/firelab/OpenFOAM-2.2.x/blob/master/v3-mingw-openfoam-2-2-x.patch#L24591-24592
Maybe the first thing to try is flipping off the debug switch in the Windows build to see if that has an effect on speed.
Unsetting the OFStream debug switch did not affect the run time for decomposePar. It did prevent a bunch of garbage from being spewed out.
I changed VSIFWriteL to fwrite in stl_create and re-ran the conversion for big_butte_small.tif on my Linux laptop and my Windows laptop. It now takes less than 1 second on both. So it appears something is going on in VSIFWriteL that is affecting performance on Windows. Still don't know what is causing the slowdown for OpenFOAM file writing on Windows.
I still don't think it's the VSI layer; it is probably the use of the native Win32 API, which flushes at each call:
http://stackoverflow.com/questions/14290337/is-fwrite-faster-than-writefile-in-windows
I think the slow OpenFOAM processes on Windows may be related to our CPLSpawn calls.
https://github.com/firelab/windninja/blob/master/src/ninja/ninjafoam.cpp#L1430
The output is redirected to log files and some of them get pretty big. CPLSpawn uses VSIFWriteL. I'm looking into it.
Are the times for decomposePar above while running in WindNinja, or standalone?
The times previously reported were for decomposePar while running in WindNinja.
Standalone, decomposePar runs in 8 seconds on my windows laptop.
Same with surfaceTransformPoints and blockMesh. They run much faster standalone than inside of WindNinja on Windows.
Okay, I misunderstood. I thought those were run standalone.
I am not sure, but I don't think there is a way around the CPLSpawn issue without writing a process library. We could just cut/paste GDAL's and use it, but change the functions to take FILE*.
Sorry for the confusion. Before I was just reporting times spit out at the end of the WindNinja simulation. Today was the first time I really compared standalone vs. via WindNinja. I just didn't really consider CPLSpawn being an issue until today.
Likely the same VSIFWriteL issue, unfortunately (for so many reasons). CPLSpawn is calling it; I thought it might be buffered, but apparently not much. As a test, maybe pass NULL to the processes for the stdout, and if it runs faster, that's it. I assume that's how you handle progress, reading the logs, so that will be messed up, but the speed should change, I think.
Yeah, that's what I tried today, but CPLSpawn didn't like NULL for the stdout. I tried it just for the surfaceTransformPoints call on linux and it just kept failing. I'll look at it closer tomorrow.
You'll probably have to comment out all of your reads as well. I won't armchair from here...
What do you mean by "reads"? You mean related to parsing the stdout for progress? Yeah, I can do that later... But do you see why passing NULL as the fout parameter would cause a problem in surfaceTransformPoints?
https://github.com/firelab/windninja/blob/master/src/ninja/ninjafoam.cpp#L1424
I don't see why that should be an issue, but it is.
It shouldn't be. I just thought of a fix. I'm riding. I can implement it by tomorrow night. Maybe tonight.
Natalie,
It might take a little work to get a proper buffered writer installed in VSI. In the meantime, you can write logs to in-memory files using filenames such as /vsimem/somename or /vsimem//home/with/abs/path. They will be held in memory, and before closing, call CopyFile() on them to the correct file path. I'll look into providing a buffered writer, or another option.
I'm starting to think this is not related to CPLSpawn after all. Previous times reported on this ticket may have been before we discovered the power settings issue on Windows (everything was running much slower then). At any rate, decomposePar and surfaceTransformPoints now seem to run at about the same speed on Windows, whether standalone or via WindNinja. For clarification, here are the execution times for decomposePar on big_butte_small.tif:
Windows standalone: 9.35s
Windows via WindNinja: 9.96s
Now I think there may be something specific to our cross-compiled OF slowing things down. See execution times for checkMesh on big_butte_small.tif for our cross-compiled OF, BlueCFD's OF for Windows, and Linux (all standalone from the command line):
our cx-compiled OF (my Windows laptop): 24.85s
BlueCFD OF (my Windows laptop): 10.75s
Linux: 4.35s
Again, sorry for the confusion. There have been several issues slowing down performance on Windows (STL writing issue related to WriteFile(), Windows power settings throttling processes, OF SP vs. DP build) and that has complicated things.
That is still pretty crappy for win32 Blue vs. Linux.
No worries, it's a big ticket. Should I not worry about implementing a buffered reader? I can try the vsimem method, low effort, and it should help if it is the problem.
I don't think you need to worry about a buffered reader. And I did try the vsimem method you mentioned and it didn't seem to improve the speed at all. Thanks though.
The new CFD simulations are much slower on Windows than Linux. It seems especially slow at the beginning of the simulations when a lot of file writing is happening (converting DEM to STL, reading/writing/copying OpenFOAM dict files, etc.). I have a feeling that this may have to do with using GDAL's VSI functions rather than standard library functionality. I noticed there is a buffer size set in the DEM to STL writing; maybe adjusting this would help? Maybe the VSI stuff just has too much overhead? Why would it be slow on Windows but not Linux... not sure. It would be good to do some tests to see if this is the source of the problem or not. I think we have a standalone DEM to STL converter, so that might be a good isolated place to check out.