Doodle3D / WiFi-Box

The Doodle3D WiFi-Box for wireless 3D-printing
GNU General Public License v3.0

Test sending big files (>4MB) to Wanhao #4

Closed peteruithoven closed 8 years ago

peteruithoven commented 8 years ago

Perform the same test as #3 but with a Wanhao printer.

olijf commented 8 years ago

During the heating process a lot of gcode is sent at once. Due to this the buffer size drops to 1580 just before printing starts. Once the printer has started printing the gcode buffer goes up and a lot of space is freed.

olijf commented 8 years ago

Print 1 0.10.10-c

The Wanhao stopped extruding with Heater ERROR 4# on the display. I stopped the print, but the Wanhao printer hung during the stopping procedure. The logs are included. Firmware version: latest beta 0.10.10-c

Will test again to see if the Wanhao works again. Wanhao log.zip

olijf commented 8 years ago

print 2 0.10.10-c

On the second occasion the print was successful. The buffer size dropped only once, to 1280, again during preheating.

Wanhao Print 2.zip

olijf commented 8 years ago

print 3 using tablet 0.10.10-c

Wanhao tablet print 1.zip

The Wanhao print from my tablet stopped suddenly. I used ADB to remotely log the web console via the chrome://inspect page. Investigating the RAM usage does not yield much, except that it went below the 2000 KB threshold 5 times. I am currently testing whether the Ultimaker hangs too.
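
The threshold check mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming the memory log has already been reduced to a list of free-memory samples in KB. The sample values and the `count_dips` helper are hypothetical; only the 2000 KB threshold comes from the test notes.

```python
from typing import Iterable

THRESHOLD_KB = 2000  # threshold mentioned in the test notes


def count_dips(samples_kb: Iterable[int], threshold_kb: int = THRESHOLD_KB) -> int:
    """Count how many separate times free memory falls below the threshold.

    Consecutive samples below the threshold count as a single dip.
    """
    dips = 0
    below = False
    for value in samples_kb:
        if value < threshold_kb and not below:
            dips += 1  # a new dip starts here
        below = value < threshold_kb
    return dips


# Hypothetical samples (KB free), not taken from the attached logs.
samples = [5200, 4100, 1900, 1850, 2600, 1580, 3000, 1280, 2400]
print(count_dips(samples))
```

Counting dips rather than individual low samples keeps a long stretch of low memory from inflating the number.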

peteruithoven commented 8 years ago

With release 0.10.10-d and 0.10.10-e we should try this again. Probably 3 times to be sure. Since https://github.com/Doodle3D/print3d/issues/44 seems like an issue we can't fix in the short term, let's exclude it by using a USB hub for these tests.

olijf commented 8 years ago

0.10.10-e Nexus 9 print 1

Print Successful 0.10.10-e test 1 nexus 9 wanhao.zip

olijf commented 8 years ago

0.10.10-e Nexus 9 print 2

Stopped the print because the filament was stuck. After stopping the print I got a lot of disconnect errors. 0.10.10-e test 2 nexus 9 wanhao.zip

olijf commented 8 years ago

0.10.10-e Nexus 9 print 3

Got a lot of AJAX disconnect errors during preheating. Did not send the doodle correctly to the printer. Printing stopped once the print buffer was empty. 0.10.10-e test 3 nexus 9 wanhao.zip

olijf commented 8 years ago

0.10.10-e Nexus 9 print 4

Same: the doodle did not print. 0.10.10-e test 4 nexus 9 wanhao.zip

olijf commented 8 years ago

0.10.10-e Nexus 9 print 5

After a full reset of the wifibox (including firstboot) I performed the same test, but the print was unsuccessful this time as well. 0.10.10-e test 5 nexus 9 wanhao.zip

I also checked the running processes of the other tests. It seems that uhttpd is forked off a lot of times but never gets shut down correctly. This is something I also discovered during my stress testing of the wifibox. I am still investigating why this happens. Increasing the max_requests in the uhttpd config seems to fix this for now.
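
To compare process counts between test runs, something like the following could help. It is only a sketch: it assumes a `ps` listing with the command in the fifth column (as BusyBox `ps` typically prints it), and the sample listing and the `count_processes` helper are made up for illustration, not taken from the attached logs.

```python
import os
from collections import Counter


def count_processes(ps_output: str) -> Counter:
    """Count processes per executable name in a `ps` listing.

    Assumes columns PID, USER, VSZ, STAT, COMMAND and a single header line.
    """
    counts = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # not a complete process line
        executable = fields[4].split()[0]
        counts[os.path.basename(executable)] += 1
    return counts


# Hypothetical listing, for illustration only.
SAMPLE_PS = """
  PID USER       VSZ STAT COMMAND
    1 root      1500 S    /sbin/procd
  723 root      1144 S    /usr/sbin/uhttpd -f -h /www
  801 root      1144 S    /usr/sbin/uhttpd -f -h /www
  802 root      1144 S    /usr/sbin/uhttpd -f -h /www
  950 root      2300 S    print3d
"""

print(count_processes(SAMPLE_PS)["uhttpd"])
```

Running this on a listing captured before and after a print would show whether the number of uhttpd processes keeps growing.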

woutgg commented 8 years ago

In all four failed tests, printer/print requests start failing sooner or later (as early as the 18th chunk and as late as the 70th). Sometimes one or two more trickle through, but no more. When this happens, status/info requests also start failing almost all the time (~95%?). Only in the second failed test were errors other than AJAX failures logged (mainly net::ERR_CONNECTION_REFUSED).

Except in the first failed test, the last chunk to arrive intact at the server is also the last one for which the client got an 'ok' back, contrary to what was observed in Doodle3D/doodle3d-client#304.

Even though the client keeps sending both print and status requests, wifibox.log only logs receiving status requests after things start failing. Why could this be? Or am I missing something? Also, the status requests are logged in groups of 4 within one second, then nothing for 10-30 seconds, and this pattern repeats.

Miscellaneous remarks:

So what could be the matter here? Something blocking the uhttpd/Lua process?

woutgg commented 8 years ago

One thing that might come in handy are timestamps in the web console log, so we could inspect intervals between messages as well as map those logs onto the firmware/print3d logs to connect what is happening when.
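
As a rough illustration of what timestamps would enable, the sketch below merges two logs on their timestamps and prints the gap between consecutive messages. The `HH:MM:SS.mmm message` line format, the log contents and all helper names are assumptions made for this example; the real console and wifibox logs may look different.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Entry = Tuple[datetime, str, str]  # (timestamp, source, message)


def parse_log(text: str, source: str) -> List[Entry]:
    """Parse lines of the assumed form 'HH:MM:SS.mmm message'."""
    entries = []
    for line in text.strip().splitlines():
        stamp, _, message = line.strip().partition(" ")
        entries.append((datetime.strptime(stamp, "%H:%M:%S.%f"), source, message))
    return entries


def merge_logs(*logs: List[Entry]) -> List[Entry]:
    """Interleave several parsed logs in timestamp order."""
    return sorted((entry for log in logs for entry in log), key=lambda e: e[0])


def intervals(entries: List[Entry]) -> List[timedelta]:
    """Time elapsed between each pair of consecutive entries."""
    return [b[0] - a[0] for a, b in zip(entries, entries[1:])]


# Hypothetical log excerpts, for illustration only.
CLIENT_LOG = """
12:00:01.250 POST printer/print chunk 17
12:00:03.900 POST printer/print chunk 18
12:00:14.100 AJAX error printer/print
"""
SERVER_LOG = """
12:00:01.400 received printer/print
12:00:04.050 received printer/print
"""

merged = merge_logs(parse_log(CLIENT_LOG, "client"), parse_log(SERVER_LOG, "server"))
gaps = [timedelta(0)] + intervals(merged)
for (stamp, source, message), gap in zip(merged, gaps):
    print(f"{stamp:%H:%M:%S.%f} +{gap.total_seconds():6.2f}s [{source}] {message}")
```

A long gap right before an error line would then stand out immediately in the merged view.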

peteruithoven commented 8 years ago

> In all four failed tests, printer/print requests start failing sooner or later (as early as the 18th chunk up until the 70th).

The Wanhao driver does take up more resources because of the translation process. Is there any indication this might be the cause? Did you check the memory-usage-over-time log (wanhao.log)?

> Even though the client keeps sending both print and status requests, wifibox.log only logs receiving status requests after things start failing. Why could this be? Or am I missing something?

How did you see this? I see info/status requests from even before seeing /printer/print requests in the rotated logs.

> Also, the status requests are logged in groups of 4 within one second, then nothing for 10-30 seconds, and this pattern repeats.

Seems like the requests time out. If I understood Olaf correctly, one request could take up so much time that all the requests waiting for it to finish time out as well.
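
The suspected effect can be illustrated with a toy model. This is purely a sketch of the hypothesis, assuming a server that handles one request at a time; it is not a description of how uhttpd actually schedules requests. The 5-second client timeout, the service times and the `simulate` helper are all invented for the example.

```python
from typing import List, Tuple

Request = Tuple[str, float, float]  # (name, arrival time, service time) in seconds


def simulate(requests: List[Request], timeout: float) -> List[Tuple[str, bool]]:
    """Serve requests one at a time in arrival order.

    Returns (name, timed_out) per request. A request times out when the
    time between its arrival and its completion exceeds the client timeout.
    Assumption: the server still finishes the work after the client gave up.
    """
    results = []
    free_at = 0.0  # moment the single worker becomes available again
    for name, arrival, service in requests:
        start = max(arrival, free_at)
        finish = start + service
        results.append((name, finish - arrival > timeout))
        free_at = finish
    return results


# Hypothetical scenario: one slow print chunk followed by quick status polls.
scenario = [
    ("print chunk", 0.0, 12.0),
    ("status 1", 1.0, 0.1),
    ("status 2", 3.0, 0.1),
    ("status 3", 5.0, 0.1),
    ("status 4", 14.0, 0.1),
]

for name, timed_out in simulate(scenario, timeout=5.0):
    print(f"{name}: {'timeout' if timed_out else 'ok'}")
```

In this model the three status polls queued behind the slow chunk all time out, while the poll that arrives after the backlog has cleared succeeds.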

@olijf Could you explain the multiple IPs? If there were multiple devices listening, this would have been much more of a stress test than I intended. Could you also check whether it's possible to save the console logs with timestamps? I understand there are settings available: http://stackoverflow.com/questions/12008120/console-log-timestamps-in-chrome https://developers.google.com/web/tools/chrome-devtools/debug/console/console-ui

olijf commented 8 years ago

I was only actively using my tablet. Sometimes I checked info/status from my PC, but I do not think that would have much of an impact. I presume someone else had the connect.doodle3d.com page open in their browser (at least that explains 1 more).

peteruithoven commented 8 years ago

I think an important question is whether we made it perform worse with the developments in the develop branch. Therefore I'd like to do 2 tests with less "stress".

@Olaf, could you do 2 more tests?

woutgg commented 8 years ago

> The Wanhao driver does take up more resources because of the translation process. Is there any indication this might be the cause? Did you check the memory-usage-over-time log (wanhao.log)?

Do you mean process in the OS sense? The GPX code is compiled into the print server and it only converts as much as it needs each time (here), so the translation should not take up any noticeable extra resources.

Overview of the memory logs:

Could we be dealing with a memory leak, is some other process taking up memory or is this normal behaviour? Even so, according to the syslog the OOM killer never triggered.

> How did you see this? I see info/status requests from even before seeing /printer/print requests in the rotated logs.

Sorry, my sentence was ambiguous. I meant that it logs both types of requests up until the point where things start failing. After that moment it no longer logs printer/print requests but still logs info/status requests, even though the client keeps sending both.

> Seems like the requests time out. If I understood Olaf correctly, one request could take up so much time that all the requests waiting for it to finish time out as well.

Indeed. IIRC this also happened during the beginning of the project.

olijf commented 8 years ago

Print from PC 0.10.10-e

Here are the results of another test. I am unsure whether the print was successful. I got a lot of AJAX errors.

I can do another test on Monday with the chrome time stamps enabled. 0.10.10-e print 1 pc.zip

woutgg commented 8 years ago

Trying to reproduce the AJAX timeouts, I ran several large prints on the Wanhao. Since the previously observed issues all occurred quite soon, I cancelled each of them after some time: they all ran until the buffer had been full for at least about a minute. The printer was located in the workshop, and the wifibox was the same one used in the earlier failed tests, with a newly installed 0.10.10-e image. The first tests were run from my computer, about 3 or 4 times, and no errors occurred. Attempting the same from the Galaxy Tab tablet, no timeouts occurred either.

olijf commented 8 years ago

Very interesting. I have one plausible theory: on my tablet, the timeout issues arose after 1 successful print. It is possible that the first print happened on a freshly booted WiFi-Box. Maybe this happens because I did not reboot the wifibox in between? (I'm unsure whether I did.)

On another note: during the daytime the WiFi network is quite busy (lots of devices connected, lots of people, etc.), so maybe that influences the wireless signal? The TP-Link MR3020 should be able to reach 150 Mbps, but when I run iwinfo (in client mode) I often get speeds far below that: I have seen speeds as low as 5.5 Mbps, and 60~70 Mbps at most.

woutgg commented 8 years ago

If it had to do with (not) rebooting, something must have changed because I did not reboot the wifibox in between prints. Could there have been any difference in the files/OS on the box? I mean, that the issue was accidentally 'fixed' by me installing a fresh image?

In case you/we are going to do more tests, it might indeed be interesting to also track the speed & quality iwinfo reports and see if there is any correlation.
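
Tracking those values could look roughly like this. It is a minimal sketch, assuming iwinfo prints `Bit Rate: <n> MBit/s` and `Link Quality: <a>/<b>` lines; the exact layout may differ per firmware version. The sample output and the `parse_iwinfo` helper are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_iwinfo(output: str) -> Tuple[Optional[float], Optional[float]]:
    """Extract (bit rate in MBit/s, link quality as a 0..1 fraction).

    Either value is None when the corresponding line is not found.
    """
    rate = re.search(r"Bit Rate:\s*([\d.]+)\s*MBit/s", output)
    quality = re.search(r"Link Quality:\s*(\d+)/(\d+)", output)

    bit_rate = float(rate.group(1)) if rate else None
    link_quality = None
    if quality and int(quality.group(2)) > 0:
        link_quality = int(quality.group(1)) / int(quality.group(2))
    return bit_rate, link_quality


# Hypothetical iwinfo output in client mode, for illustration only.
SAMPLE_IWINFO = """
wlan0     ESSID: "ExampleNetwork"
          Mode: Client  Channel: 6 (2.437 GHz)
          Tx-Power: 18 dBm  Link Quality: 35/70
          Signal: -75 dBm  Noise: -95 dBm
          Bit Rate: 5.5 MBit/s
"""

print(parse_iwinfo(SAMPLE_IWINFO))
```

Logging these two numbers with a timestamp at regular intervals during a print would make it possible to line them up with the moments the AJAX errors start.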

peteruithoven commented 8 years ago

We've released 0.10.10, so I'm closing this test.