Very slow loading big STL files

ANTONIOPSD commented 3 years ago

Application Version

4.11

Platform

Windows 10 x64

Printer

Creality CR6-SE

Reproduction steps

I print lithophanes, and some of them are larger than 1.5 GB. When I try to load them in Cura, it crashes after trying to load them for more than 5 minutes at 100% CPU. I can load them in Simplify3D in less than 40 seconds with no problems at all, and they slice in less than 30 seconds. I also create complex parts for some devices; they take less than 10 seconds to load and slice in Simplify3D, but in Cura they sometimes take up to a minute to load and slice with almost the same settings.

Are there any plans to improve the loading process and make it faster, as in other slicers such as Simplify3D?

STEPS:

Load a +1GB STL file

Actual results

Loading a +1GB STL hangs for several minutes at 100% CPU, then either crashes or takes ages to load.

Expected results

A +1GB file loads quickly.

Checklist of files to include

[ ] Log file
[ ] Project file

Additional information & file uploads

I just want to move fully to Cura and stop using other, outdated slicers, but on the same hardware Cura is far too slow by comparison.

Just try loading any huge STL file in Cura and in other slicers and you will see how slow Cura is.

fvrmr commented 3 years ago

Hi @ANTONIOPSD, thank you for your report. A 1 GB STL is really big to load in Cura. The resolution is too high to slice; you could lower the resolution of your model.

It could also be that your STLs are ASCII instead of binary.
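A quick way to check which variant a file is, as a minimal sketch: a binary STL consists of an 80-byte header, a 4-byte little-endian face count, and 50 bytes per face, so its size should be exactly 84 + 50 × faces. The function name `looks_like_binary_stl` is made up for illustration and is not part of Cura.

```python
import os
import struct


def looks_like_binary_stl(path: str) -> bool:
    """Heuristic: True if the file size matches the binary STL layout."""
    size = os.path.getsize(path)
    if size < 84:
        # Too small to hold the 80-byte header plus the face count.
        return False
    with open(path, "rb") as f:
        f.seek(80)  # skip the 80-byte header
        (num_faces,) = struct.unpack("<I", f.read(4))
    # Each face is 12 floats (normal + 3 vertices) plus a 2-byte attribute.
    return size == 84 + 50 * num_faces
```

If this returns False for a file that starts with the word "solid", it is most likely an ASCII STL, which is much larger and slower to parse than the binary form of the same mesh.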

ANTONIOPSD commented 3 years ago

> Hi @ANTONIOPSD, thank you for your report. A 1 GB STL is really big to load in Cura. The resolution is too high to slice; you could lower the resolution of your model.
>
> It could also be that your STLs are ASCII instead of binary.

Yeah, but is there any work in progress to improve the handling of big STL files, like Simplify3D does? Sometimes the file needs to be that big because of the needed quality. If other slicers can do it, I'm sure the Cura devs can do it. I would help if I could, but I lack the skills needed to contribute to the code 😅

Ghostkeeper commented 3 years ago

We use several libraries to process the input models loaded into Cura. Cura's loading is going to be as slow as those libraries.

There are four operations in Cura that are at least linear in the size of the model, as far as I can think of:

1. Loading the STL (with Numpy-STL). It needs to read and parse the file, and there's no way around this. The library could probably be made faster here and there, but we have to assume that the easy wins are already implemented in there. It's actively maintained.
2. Sending the mesh to the GPU for rendering. We do this through the OpenGL bindings provided by Qt. If you want to see your model, this has to be done. Without going into streaming loaders (not supported by Qt) this can't realistically be improved.
3. Computing the convex hull for collision checks. This is probably what it would spend the most time on, and could be the main reason why you'd claim that other applications are faster than Cura. We use Scipy (Spatial) to do this. There might be libraries that can do it faster, but it will probably stay an expensive operation. The only real way to fix this is to drop the collision checks or to make them simpler. This operation is also O(n log n).
4. Sending the model to CuraEngine when the slice starts. This is one point where Cura is at a big disadvantage: it is the only slicer, as far as I know, where the slicing process is a completely separate application. We decided on that for stability, but it is indeed slower in this case. Fixing this would require a significant re-think of our architecture.
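To illustrate where the O(n log n) for a convex hull comes from, here is a small pure-Python sketch of Andrew's monotone chain algorithm on 2D points. This is not what Scipy does internally and not Cura's code; it is a swapped-in textbook algorithm, and `convex_hull_2d` is a name invented for this example. The sort dominates the cost; the two sweeps afterwards are linear.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull_2d(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))  # O(n log n): this sort dominates the run time
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # > 0 when o -> a -> b makes a left (counter-clockwise) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:  # left-to-right sweep builds the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):  # right-to-left sweep builds the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]
```

Every point is pushed and popped at most once per sweep, so for a mesh with millions of vertices the hull cost grows only slightly faster than the vertex count.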

I just tried loading a 1GB ASCII STL file. It took 31.8s to load. Actually not too bad, although I think my file system was cached in RAM there because I had just written the file. And the convex hull of the model is square.

aconz2 commented 2 years ago

I also see very slow load times on an STL with 16 million verts and 5.4 million faces (~260 MB binary format). Some timings, taken from log lines like [JobQueueWorker [2]] UM.FileHandler.ReadFileJob.run [83]: Loading file took 7.0 seconds:

| platform | version | time (s) | using numpy-stl? |
|---|---|---|---|
| windows | 5.0.0-beta+1 | 31.6 | |
| windows | 4.13.1 | 7.0 | ✓ |
| linux | 5.0.0-beta+1 | 138.8 | |
| linux | 4.13.1 | 40.3 | |

(Hardware note: the Linux machine's CPU benchmarks around 40% faster single-core than the Windows one, so I would have hoped for better performance than I see.)
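For anyone who wants to collect the same numbers, here is a minimal sketch that pulls the load times out of a log. It assumes the log lines end in "Loading file took N seconds" as in the line quoted above; `load_times` is a hypothetical helper name, and where the log file lives depends on your platform.

```python
import re
from typing import List

# Matches the timing at the end of the ReadFileJob log line quoted above.
LOAD_TIME = re.compile(r"Loading file took (\d+(?:\.\d+)?) seconds")


def load_times(log_text: str) -> List[float]:
    """Return every 'Loading file took N seconds' value found in the log text."""
    return [float(m.group(1)) for m in LOAD_TIME.finditer(log_text)]
```

Usage would be along the lines of `load_times(open(path_to_log).read())`, with `path_to_log` pointing at your own log file.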

For anyone coming across this looking for a workaround, try 3MF format. The same model in 3MF loads in 9.7 seconds on Linux with Cura 5.0

Loading the STL: with numpy-stl, I can load this model in 300ms on my machine, but you can see it is only being used on Windows 4.13.1 (not sure why not in 5.0 though). On Linux it appears there is an open internal issue CURA-7154 that is preventing this from being used

https://github.com/Ultimaker/Uranium/blob/6a23f8d6d00b0286a66018ae995958b81bd180fe/plugins/FileHandlers/STLReader/STLReader.py#L19-L34

Computing the convex hull: my log file shows this only takes 3.3 seconds because it uses an approximate convex hull (nice!).
Sending the model to CuraEngine: Slicing this model (Linux with 5.0) takes 16.9 seconds, 9.6 seconds for preview. This results in 417 layers. I don't have an easy way to identify how much time it spends in transferring to CuraEngine, but my guess is that it is negligible here

Poking around a bit, my suspicion is here:

https://github.com/Ultimaker/Uranium/blob/6a23f8d6d00b0286a66018ae995958b81bd180fe/plugins/FileHandlers/STLReader/STLReader.py#L180-L187

I pulled out the logic for _loadBinary into a test script (see below). I can load the same STL in 3.1 seconds on my machine when there is no time.sleep(0) (the equivalent of Job.yieldThread()). But when you include the time.sleep(0) in every loop iteration, it adds roughly 35% overhead and gives me 4.2 seconds. And if I were to guess how this behaves in Cura itself, actually having other threads running could increase this overhead by quite a bit.

Is that plausible? Would it be acceptable to only Job.yieldThread every Nth iteration instead of every iteration? Maybe we could do the first thousand iterations to get an approximate speed and then calculate an appropriate yielding frequency to meet whatever responsiveness period you're interested in? Or if you could share more info on CURA-7154 perhaps solving that would be the best solution in this case.
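A rough sketch of that idea, in case it helps the discussion: yield on every iteration for the first N iterations, measure how long an iteration takes, then pick a yield interval that hits a target responsiveness period. Here `yield_thread` is a stand-in for Job.yieldThread() (assumed to be a plain time.sleep(0)), and `process_with_throttled_yield`, `calibration_iters`, and `target_period` are names and defaults invented for this example, not anything in Uranium.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def yield_thread() -> None:
    """Stand-in for Job.yieldThread(); assumed equivalent to time.sleep(0)."""
    time.sleep(0)


def process_with_throttled_yield(
    items: Iterable[T],
    work: Callable[[T], None],
    calibration_iters: int = 1000,   # hypothetical default
    target_period: float = 0.01,     # seconds between yields, hypothetical default
) -> int:
    """Run work(item) for every item and return how many times we yielded."""
    yields = 0
    interval = 1  # yield on every iteration until calibration is done
    start = time.perf_counter()
    for idx, item in enumerate(items, start=1):
        work(item)
        if idx % interval == 0:
            yield_thread()
            yields += 1
        if idx == calibration_iters:
            # Average cost per iteration so far (includes the yield itself,
            # so the resulting interval errs on the side of yielding too often).
            per_iter = (time.perf_counter() - start) / calibration_iters
            if per_iter > 0:
                interval = max(1, int(target_period / per_iter))
            else:
                interval = calibration_iters
    return yields
```

In the STL loader, `work` would be the unpack-and-addFaceByPoints step for one face. Whether 10 ms is the right period for a smooth progress animation is a guess on my part.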

I did not fully look into why 5.0 is 3x slower than 4.13.1 on Linux. If it were the Job.yieldThread thing I mentioned, maybe there are just more threads now?

And I also was surprised 5.0 on Windows wasn't using numpy-stl, is that a mistake?

Test script

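```python
import sys
import struct
import os
import time
from typing import cast

t1 = time.time()
f = open(sys.argv[1], 'rb')
f.read(80)  # Skip the header
# Assumed completion of the truncated line: the binary STL layout stores
# the face count as a little-endian uint32 right after the header.
num_faces = cast(int, struct.unpack("<I", f.read(4))[0])
# (the remainder of the script is cut off in the original post)
```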

smartavionics commented 2 years ago

Hi @aconz2, intrigued by your post, I did a quick test on my Cura 4 based Linux build. I reduced the number of calls to yieldThread() by a factor of 10 in the binary STL loader and was surprised by the results. The total time to load the 86 MB file actually increased from around 20 seconds to 30! However, the motion of the bouncing blue rectangle at the bottom of the screen became much smoother with the reduced calls to yieldThread(). Here's what I added:

        for idx in range(0, num_faces):
            data = struct.unpack(b"<ffffffffffffH", f.read(50))
            mesh_builder.addFaceByPoints(
                data[3], data[5], -data[4],
                data[6], data[8], -data[7],
                data[9], data[11], -data[10]
            )
            if (idx + 1) % 10 == 0:
                Job.yieldThread()

I shall play some more with this to try and understand the observed behaviour.

smartavionics commented 2 years ago

So, I made a mistake in my previous test: the run that took around 20 seconds to load was actually with no calls to yieldThread() in the loop. With a call to yieldThread() every 100 loops, loading took around 21 seconds and the blue rectangle moved quite smoothly. With a call every 1000 loops, loading took around 20 seconds but the animation was jerky.

So it appears that calling yieldThread() every time around the loop doesn't actually cause a slowdown?

aconz2 commented 2 years ago

@smartavionics I did some follow up testing on Linux and got:

| version | yielding? | time (s) |
|---|---|---|
| 4.13.1 | yes | 31.1 |
| 4.13.1 | no | 15.8 |
| 5.0b1 | yes | 105.3 |
| 5.0b1 | no | 16.2 |

I unpacked each release AppImage with the --appimage-extract flag and then edited the source of STLReader.py directly and inspected the log output.

It does seem mysterious and I wouldn't be surprised if Thread.yield is a red herring in the end.

smartavionics commented 2 years ago

> It does seem mysterious and I wouldn't be surprised if Thread.yield is a red herring in the end.

That yieldThread() call just invokes time.sleep(0) which gives other threads a chance to run.

nallath commented 2 years ago

We have found an issue with the 5.0 release: we accidentally forgot to include numpy-stl. This meant that Cura used the (slower) fallback STL loading.

jellespijker commented 2 years ago

@nallath Numpy-STL was part of the requirements, see https://github.com/Ultimaker/cura-build-environment/blob/main/projects/requirements.txt

It could be that pyinstaller didn't collect it.
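One way to check whether a given build actually ships the package is to ask the import system from inside that build's Python environment. A minimal sketch: numpy-stl installs a top-level module called `stl`, so if that module cannot be found, a loader that tries to import it would end up on its fallback path. The helper names here are made up for illustration, and I have not verified how this behaves inside a frozen PyInstaller bundle.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the import system can locate a top-level module with this name."""
    return importlib.util.find_spec(name) is not None


def numpy_stl_available() -> bool:
    """numpy-stl is imported as `stl`; another package with that name would also match."""
    return module_available("stl")
```

If `numpy_stl_available()` returns False in a release build while the package is listed in requirements.txt, that would point at the packaging step rather than the requirements.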

Ultimaker / Cura