Closed serialhex closed 6 years ago
Good catch. Can you do me a big favor and hunt down the block of code that's causing the internal slowdown? Given how tiny the library is, you should only need to instrument 3-4 areas with scoped timers to find the offending function if you don't have any profiling tools handy.
```cpp
class scoped_timer
{
    std::string message;
    std::chrono::high_resolution_clock::time_point t0;
public:
    scoped_timer(std::string message)
        : message{ std::move(message) }, t0{ std::chrono::high_resolution_clock::now() } {}
    ~scoped_timer()
    {
        const auto ms = std::chrono::duration<float, std::milli>(
            std::chrono::high_resolution_clock::now() - t0).count();
        std::cout << message << " completed in " << ms << " ms" << std::endl;
    }
};

// Two-level expansion so __LINE__ pastes as its numeric value
// rather than as the literal token __LINE__.
#define PLY_CONCAT_IMPL(a, b) a ## b
#define PLY_CONCAT(a, b) PLY_CONCAT_IMPL(a, b)
#define PLY_SCOPED_TIMER(MESSAGE) scoped_timer PLY_CONCAT(scoped_timer_, __LINE__)(MESSAGE)
```
Hey, nice little class & macro! I'm taking that & adding it to my toolkit! :-D
Anyway, I've taken the first 200 lines of output (as the file is ~4GB and I didn't let it finish) along with a diff of the changes I made so you can see where I'm calling the scoped timer from. I made it a gist https://gist.github.com/serialhex/7f729c641f47f58c3e174b2c67e8d851 so I don't clutter the messages here. Let me know if there are any other details you would like.
(Yes, ~4GB. We're making models with ~14 million points, so they are a little big)
Interesting, thanks! I'll take a deeper look over the weekend and let you know what I find.
@serialhex I'm seeing around a 10x slowdown between `plyfile.write(...)` (1 GB takes around 50 seconds) and a standard ostream binary write (5 seconds). Also keep in mind disk bandwidth: I'm writing to a Samsung Evo Pro SSD. One other difference in my ostream binary test is that I'm allocating a 1 GB vector and using a single `.write(...)` call, instead of the many used internally by tinyply -- this will undoubtedly have an effect.
One obvious thing -- can you make sure you are compiling in Release (-O3) mode? It's tremendously slow in debug (easily 20x on my machine).
I did identify one particularly slow call: https://github.com/ddiakopoulos/tinyply/blob/master/source/tinyply.cpp#L223. When I optimize the way this works, the `plyfile.write(...)` 1 GB test completes in ~20 seconds, so around a 2.5x improvement. It's still around 4x slower than the vanilla ostream, but this seems much more reasonable.
I too am noticing slowdown for large files. Will you be committing the optimization of the slow call you mentioned above? Thanks for the tool, super useful :)
Hi @ataber - sure, I just pushed it on a branch: https://github.com/ddiakopoulos/tinyply/tree/writefix
I didn't test it on any real data though (which is one reason I haven't merged it into master yet!)
Just watched this great talk from CppCon, where the speaker mentions that `std::endl` flushes the stream... after searching through my code, I found that this library uses `std::endl` a lot (in fact after every line of the PLY body - see here). I'm testing a change on my machine that replaces `std::endl` with `"\n"` - do you have any data that you typically benchmark on?
Hi, you can test with these files: https://marun.uk.to/owncloud/s/s0vLdYWaE3ADbRa
👍 for avoiding `std::endl` until the very end.
+1 - @ataber can you follow up with some benchmarks? I'm aware of the flushing penalty and also that the replacement with `"\n"` is trivial, but I'd be curious :)
Thanks for the patience - you know how it is with open source :)
Here's some benchmark data for writing an ASCII file with just vertex and triangular facet data:
Commit: f6a8c5d069e56118eac40d268db93fdb5eb58035 (with `"\n"` replacing `std::endl`)
Number of Vertices | Number of Facets | CPU time | Wall time
---- | ---- | ---- | ----
196610 | 393216 | 1.8054 seconds | 2.2114 seconds
1066690 | 2133376 | 10.1649 seconds | 12.441 seconds
191156 | 382472 | 1.49407 seconds | 1.86911 seconds
286914 | 573824 | 2.59925 seconds | 3.17874 seconds
331628 | 663252 | 3.15947 seconds | 3.8565 seconds
145108 | 290946 | 1.24691 seconds | 1.51113 seconds
634220 | 1376528 | 7.15963 seconds | 8.5375 seconds
127234 | 254464 | 1.11448 seconds | 1.36364 seconds
2308760 | 4617580 | 23.0025 seconds | 28.2147 seconds
314607 | 630334 | 3.00653 seconds | 3.63565 seconds
4410859 | 8821758 | 37.4868 seconds | 47.9383 seconds
169448 | 334592 | 1.22074 seconds | 1.47941 seconds
Commit: be7f9dea48f066949c2b734b18f302c4b11214cd (with `std::endl`)
Number of Vertices | Number of Facets | CPU time | Wall time
---- | ---- | ---- | ----
196610 | 393216 | 12.4065 seconds | 112.37 seconds
1066690 | 2133376 | 71.3297 seconds | 602.803 seconds (had to switch to lower-resolution dataset after this datapoint as it was taking too long)
3074 | 6144 | 0.228811 seconds | 2.0576 seconds
21532 | 43060 | 2.2884 seconds | 13.3025 seconds
102259 | 204678 | 6.14775 seconds | 60.0891 seconds
2282 | 4560 | 0.262724 seconds | 1.59634 seconds
7962 | 15920 | 0.655999 seconds | 6.29141 seconds
10073 | 20858 | 1.03503 seconds | 8.72427 seconds
12208 | 26028 | 1.00501 seconds | 9.09585 seconds
1327 | 2650 | 0.104316 seconds | 0.893529 seconds
Thanks for the great library!
I'm not sure what kinds of transformations you're doing on the data as it's being written - it doesn't look like much - but for whatever reason it is significantly slowing down the writing of files. I've got files that are ~0.5-1 GB that I'm exporting using your library (which - other than this hiccup - works beautifully btw) and it is suuuuuuuuuuuuper slow. Here is a super-accurate benchmark for you:
That's like 830x slower!!! I understand like 2x... or 4x, but 800!?!?!? The relevant code snippet is as follows:
and the file I'm writing using TinyPLY ends up being ~312 MB while the `test for speedz` section outputs a 512 MB file. How can we speed this up?? Let me know if there is anything in particular you want me to try. Thanks in advance!