Numerically intense algorithm runs slower in dart than javascript

DartBot commented 11 years ago

This issue was originally filed by @unicomp21

I've ported it to Dart, and the port runs ~10x slower than the original. The original JavaScript implementation is here:

https://github.com/unicomp21/perlin-noise.js.git

and the Dart port is here:

https://github.com/unicomp21/RadTextures.git

Algorithmically, the two should be identical. I don't know how to track this one down with just a stopwatch timer; an instruction-level profiler would be really nice.
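One way to narrow this down with only a stopwatch is to time a single suspect primitive in isolation, after a warm-up so the optimizing compiler has had a chance to run. The sketch below is hypothetical: `floorLoop`, the input formula, and the iteration counts are illustrative and not taken from either repository. It uses modern Dart, where `double.floor()` returns an `int`. Timing an equivalent loop built on `Math.floor` in JavaScript would give a rough per-primitive comparison.

```dart
// Hypothetical Stopwatch micro-benchmark for a floor-heavy loop.

// Sums floor() over a mix of negative and positive inputs,
// loosely imitating how noise code floors its lattice coordinates.
int floorLoop(int n) {
  var acc = 0;
  for (var i = 0; i < n; i++) {
    final x = i * 0.37 - 1000.0;
    acc += x.floor();
  }
  return acc;
}

void main() {
  const n = 10000000;
  // Warm-up runs so the timed run measures optimized code.
  for (var i = 0; i < 5; i++) {
    floorLoop(n);
  }
  final sw = Stopwatch()..start();
  final result = floorLoop(n);
  sw.stop();
  // Print the result so the loop cannot be treated as dead code.
  print('result=$result, ${sw.elapsedMilliseconds} ms');
}
```

If a loop like this is much slower than its JavaScript counterpart, that primitive is a good candidate for a closer look with a profiler.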

I'll take ya up on that help offer now ;) If this basis function (simplex Perlin noise) can be made to run fast, there is a slew of other cool texturing functions that run on top of it. This is useful because large collections of mesh textures can be generated on the client rather than downloaded, cutting down on bandwidth by many orders of magnitude.

What steps will reproduce the problem?

1. See the repos mentioned above.

What is the expected output?
Something fast.

What do you see instead?
Something slooooow.

What version of the product are you using?
Latest.

On what operating system?
Linux, Windows.

Please provide any additional information below.

mraleph commented 11 years ago

We need to inline floor; it occupies the top spot on the profile:

    9.54%  dart  libm-2.15.so        [.] floor
    9.02%  dart  dart                [.] dart::BootstrapNatives::DN_Double_floor(_Dart_NativeArguments)
    7.72%  dart  dart                [.] dart::VMHandles::AllocateHandle(dart::Isolate)
    7.54%  dart  libpthread-2.15.so  [.] pthread_getspecific
    5.21%  dart  dart                [.] dart::HandleScope::~HandleScope()
    4.63%  dart  dart                [.] dart::Double::New(double, dart::Heap::Space)
    4.07%  dart  dart                [.] dart::StackZone::StackZone(dart::BaseIsolate)
    3.71%  dart  dart                [.] dart::Heap::AllocateNew(int)
    3.44%  dart  dart                [.] dart::StackZone::~StackZone()
    3.27%  dart  dart                [.] dart::Object::SetRaw(dart::RawObject)
    3.15%  dart  dart                [.] dart::HandleScope::HandleScope(dart::BaseIsolate)
    3.11%  dart  dart                [.] dart::Object::Allocate(int, int, dart::Heap::Space)
    3.09%  dart  dart                [.] __i686.get_pc_thunk.bx
    2.98%  dart  dart                [.] dart::Isolate::Current()
    2.53%  dart  dart                [.] dart::Handles<2, 64, 4>::DeleteAll()
    1.84%  dart  dart                [.] dart::NativeArguments::SetReturnUnsafe(dart::RawObject) const
    1.49%  dart  perf-18947.map      [.] _stub_CallNativeCFunction
    1.17%  dart  dart                [.] dart::Double::IsDouble() const
    0.76%  dart  perf-18947.map      [.] dart:core__Double@0x36924d72_floor

I will look into it.

Additional slowdown might be coming from issue #5661.

Set owner to @mraleph. Added Area-VM, Accepted labels.

mraleph commented 11 years ago

In r17278 I enabled inlining of double.floor/double.ceil when SSE4.1 is available.

This speeds up the computational core on my machine by roughly a factor of 6.

However, we still don't inline them when SSE4.1 is not available, whereas V8 has fallback paths for SSE2. We should consider adding similar ones.
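Until such fallbacks exist, noise code can sidestep a slow `floor` at the source level with the common truncate-and-correct trick: truncate toward zero, then subtract one when the input was negative and non-integral. This is roughly what a floor has to compute without SSE4.1's `roundsd`, but the sketch below is only an illustration, not the Dart VM's or V8's actual code. `fastFloor` is a hypothetical helper that assumes the input is finite and that its floor fits in an `int`. Whether it actually beats the built-in `floor()` depends on the VM version and should be measured.

```dart
// Hypothetical helper: floor via truncation plus a correction step.
// Assumes x is finite and floor(x) fits in an int.
int fastFloor(double x) {
  final t = x.toInt(); // truncates toward zero, e.g. -2.5 -> -2
  // For negative non-integral x, truncation rounded up, so step down by one.
  return x < t ? t - 1 : t;
}

void main() {
  for (final x in [2.5, -2.5, -3.0, -0.5, 0.0, 7.999]) {
    print('$x -> ${fastFloor(x)} (floor: ${x.floor()})');
  }
}
```

Integral inputs such as `-3.0` skip the correction, because `x < t` is false when truncation was exact.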

@unicomp21: what is your CPU?

Added Fixed label.

DartBot commented 11 years ago

This comment was originally written by @unicomp21

X86

DartBot commented 11 years ago

This comment was originally written by @unicomp21

Thanks!

DartBot commented 11 years ago

This comment was originally written by @unicomp21

btw, when will the fix show up in the M2 build?

DartBot commented 11 years ago

This comment was originally written by @unicomp21

Also, are there any plans for the compiler to vectorize to SSEx? Or perhaps to add something like RenderScript to the VM?

DartBot commented 11 years ago

This comment was originally written by @unicomp21

toInt appears to suffer from the same slowness; perhaps it needs to be inlined as well?

DartBot commented 11 years ago

This comment was originally written by @unicomp21

Slow toInt sample: see https://github.com/unicomp21/RadTextures.git

mraleph commented 11 years ago

The fast path of toInt (conversion to a smi, i.e. a small integer) is inlined.

However, the result of toInt is not necessarily a smi or even a mint (64-bit integer); it can be a big integer (beyond 64 bits).

For now, my suggestion is to avoid toInt entirely in numerically intensive code.

mraleph commented 11 years ago

Actually, on your benchmark toInt is always compiled down to a DoubleToSmi instruction. This is an optimistic assumption: if the value does not fit into a smi, the code deoptimizes.

The resulting code seems to be fast. What are you comparing against?

mraleph commented 11 years ago

I see: RidgedMultifractal_Default reveals that Double.pow is not inlined. I have filed issue #8002 for that.

DartBot commented 11 years ago

This comment was originally written by @unicomp21

> Resulting code seems to be fast. What are you comparing against?

I'm still comparing against JavaScript; overall performance is still much slower.

dart-lang/sdk issue #7971