Smoothieware / Smoothieware

Modular, opensource, high performance G-code interpreter and CNC controller written in Object-Oriented C++
http://smoothieware.org/
GNU General Public License v3.0
1.3k stars 995 forks source link

G2 Circles non-concentric on edge build #972

Closed Speedy64i closed 8 years ago

Speedy64i commented 8 years ago

While running the latest edge build, Smoothie produces non concentric circles with the attached GCode (please see attached picture for example). The problem is 100% repeatable on edge (Seemingly started after the new acceleration code was added) and does not occur after re-flashing and running on the master build.

It seems to only affect G2 commands (maybe G3, haven't tried them). I've cut with G1 commands and they all worked fine.

Referring to the picture: The small through-hole is cut in a seemingly correct position but the larger shoulder cut around it should be concentric to the through-hole in the center of the image. Instead, the larger diameter circle is offset up and left (in relation to the origin of the machine) from the through-hole (Both are drilled with the attached G-Code). The picture is rotated 90 degrees clockwise, the origin of the machine as referenced in the picture is upper left.

Problem occurs on: Build version: edge-dcb2d90, Build date: Jul 9 2016 04:05:31, MCU: LPC1769, System Clock: 120MHz Smoothieboard 5X v1.0b Host: bCNC on Raspberry Pi 3 running latest Raspbian There is no config override Please see attached for my config file The Zip file is the G-Code used.

config.txt Laser Holder Slotted Holes2.zip non-concentric

wolfmanjm commented 8 years ago

can you try replacing mm_per_arc_segment with mm_max_arc_error 0.01

Let us know, that code was recently updated but you failed to update your config as specified in the upgrade guide.

wolfmanjm commented 8 years ago

Also please follow the guidelines for submitting an issue... we need the exact version you are running not a "latest edge" as that is somewhat meaningless. https://github.com/Smoothieware/Smoothieware/blob/edge/HowToFileIssues.md

Thank you

Speedy64i commented 8 years ago

wolfmanjm, thanks for the reply. @version didn't return any version information from bCNC's terminal. give me a few and i'll try to get pronterface hooked up to it. My apologies for missing the config change, i'll add it in and give it a shot. I don't think it's mentioned in the upgrade notes https://github.com/Smoothieware/Smoothieware/blob/edge/upgrade-notes.md

Zaaphod commented 8 years ago

Try version without the @ I backplotted your program and it looks good. I will run your program on my router tomorrow to see if I can duplicate your results. and will also test concentric circles specifically to see if I can find any issue.

Speedy64i commented 8 years ago

Zaaphod, thanks, that worked great for getting the version info.

I just tried a new cut after re-flashing and changing the config file, the results are the same non-concentric holes

Here's the current version I just tried: Build version: edge-dcb2d90, Build date: Jul 9 2016 04:05:31, MCU: LPC1769, System Clock: 120MHz

I changed the config file to read the following after updating it per the latest config reference: mm_per_arc_segment 0.0 mm_max_arc_error 0.01

wolfmanjm, did you want me to comment out the mm_per_arc_segment completely or does setting it to zero like in the config example accomplish the same effect?

wolfmanjm commented 8 years ago

it accomplishes the same thing. @Zaaphod contributed the G2 code so we will see if he can find the issue. Thanks

wolfmanjm commented 8 years ago

The odd thing is that the master build has the same G2 code in it, so if it works on master and not edge then I suspect you are skipping steps or something due to the change in the motion control. We will see if this can be reproduced.

wolfmanjm commented 8 years ago

BTW you need grbl_mode enabled for bCNC to work. It may simply not be sending gcode properly without that. I presume you also have set the smoothieboard selection in bCNC and not grbl.

You config is very old and should be replaced with the current config.

wolfmanjm commented 8 years ago

is this the same gcode? or a test that exhibits the same issue? because it says the bit size is 6.35mm but the inner hole seems smaller than that and running it on camotics you get one hole.

wolfmanjm commented 8 years ago

I just merged a fix that may affect this issue, you will have to flash the latest edge to try it. The issue is one where tiny moves that do not generate any actual steps would effectively look like missed steps (basically it was a typo). As you have very low steps/mm and arcs generate tiny segments this may have caused an issue.

Zaaphod commented 8 years ago

@wolfmanjm that could be the issue. the smallest arcs are only a radius of 0.033mm with 64 steps per mm when the arc is divided each segment would be less than one step.. furthermore upon closer inspection of the Gcode file, there is actually an even smaller arc in the very center, it's only a radius of 0.015mm. I had to zoom way in to even see it. There are a total of 49 small arcs that would produce segments smaller than one step, and if each one lost a step, the total error would be 0.765mm... a quite noticeable amount.

I will test this to try to duplicate the error on my CNC router, it has an even lower steps per mm, (only 41) so if this is what's happening I will see it. If I can duplicate the problem then I can also try your fix to hopefully confirm that it fixes the problem.

Do we need to add some logic to the circle routine that rejects circles smaller than one step? if the machine can't step, then it's pointless to bother dividing the arcs to even smaller segments in the first place... just generate a move to the center of the arc and it's done.

Your fix would still be necessary however because when doing things like engraving true type fonts, you end up with a whole pile of stupid tiny line segments way smaller than a step.... This reminds me of another issue with engraving true type fonts... you also end up with tiny arc segments where the beginning is closer to the end than one step so it thinks you need a full circle, but a full circle is defiantly what you do not want... the full circles normally have a large enough radius to ruin your engraving. I have a command in my old gcode processor called 'disable full circles' just for this kind of engraving situation.. it basically makes it ignore any arc segment where the beginning and end is closer together than one step. There is now way for a gcode program to tell if you meant to have a full circle, or not because when the post processor rounds to 4 decimal places it's actually the exact command for a full circle. I would insert "disable full circles" before the engraving, then "enable full circles" after it I can test specifically for this issue and determine if we need a fix or not for that as well. If we do need a way to disable and enable full circles, is there a G or M code we can use for this that will process nicely with smoothie?

Zaaphod commented 8 years ago

I have confirmed this issue on my CNC router. 20160709_081102

@wolfmanjm I get the same results your recent update.

Next step I will run another program with even more circles, but eliminate circles smaller than one step to verify the problem is related to circles smaller than one step

Speedy64i commented 8 years ago

@wolfmanjm, I gave it another shot this morning with edge-0070349, Build date: Jul 9 2016 00:10:52. I had the same non-concentric results as with the previous tests.

I ran examdiff pro on the new config and my config, the only change I could find that wasn't using the same formatting as the new config files was the mm_max_arc_error line you mentioned earlier. I've removed a lot of stuff I don't use in the config and the comments I usually don't update when I update the variables with examdiff pro. If @Zaaphod can't recreate my issue, i'll retype the pertinent changes into a brand new config file and see if that fixes it.

I had originally tried grbl_mode when I first started working with this machine. I ended up with a lot of crashes in bCNC and funky operations so I just turned it back off, the only thing I noted after disabling it was I no longer could home the machine without using G-codes, so now I just have a bunch of shortcuts to do the missing operations with standard G28 and workspace commands. If you'd like I can give it another shot again with it enabled today, but it has been working great up till a few days ago. with grbl_mode enabled, should I have bCNC set to grbl or Smoothie? you're correct in that I have it currently set to smoothie.

The gcode file I attached is the exact same one I used to fruitlessly drill holes in my now swiss-cheesed 80mm laser mounts :), no changes.

you are correct, the bit is 6.35mm. The code is supposed to drill an 8mm diameter through hole and then a 14mm stepped hole around it so a bolt head can be placed beneath the surface. The G2 circle is the path of the bit and the hole cut should end up being 6.35mm larger in diameter than the G2 circle calls for.

Speedy64i commented 8 years ago

I have no clue if they are related, but another difference I've noted between the master and edge builds has to do with machine X Y position after probing. I use the z probe to set my work surface. on Master when I do a Z probe, the X Y axis numbers would stay where I initially put them, so if they were X 200, Y 500, bCNC would report that position (which I think it's getting from smoothie). on the edge build, the display on bCNC reports that it drifted off a few hundredths of a mm as soon as I start a probe operation. and will read something like X 199.9925, Y 499.9853.

Zaaphod commented 8 years ago

For those following this issue. Here are screen shots of the backplotted program program When zooming in you can see the tiny circle in the center that really does not need to be there but was generated by an automatic pocketing routine (highlighted in green) program2

Zaaphod commented 8 years ago

@Speedy64i I believe the difference you are seeing in the XY coordinates is that now it is showing the ACTUAL position (to the nearest stepper motor step) as opposed to the requested position. Stepper motors can either step or not step, there is nothing in the middle. each step in X or Y on your machine moves it approx. 0.01559 mm, so all actual coordinates will be in multiples of that, as the machine cannot be resting at any other coordinates. Using the actual position caused by the steps is more accurate than using the number you requested.

Speedy64i commented 8 years ago

@Zaaphod thanks! makes sense!

Zaaphod commented 8 years ago

I've done a few more tests, the issue persists without the tiny arcs. Exactly the same result with them removed. I did another test with concentric squares and also got the same result. I increased the number of z movements in the squares test, and as expected it was off further. I also did the inside square of the lower hole, then the inside of the top hole, then the outside of the top hole and finally the outside of the lower hole, so the lower pattern shows the greatest error

20160709_101218

Zaaphod commented 8 years ago

I am getting the same results with Master branch.

@Speedy64i do you happen to know what version it was that you did not have this problem with this program? Are you able to resolve the problem with Master branch on your machine?

This looks like a cumulative rounding error to me problem to me, but other things could cause it as well. I can do a test with a program with even number of mm and temporarily set my machine to an even number of steps per mm to see if rounding has anything to do with it.. if nothing else it will rule out the possibility

Zaaphod commented 8 years ago

More details on this. Apparently there IS a cumulative rounding error. I left my config alone, so I still have 41.58464566929134 steps per mm for X and Y in my config, however I ran the attached code: CONCENTRIC SQUARES whole number.txt I manually just searched and replaced all the decimal parts of the numbers with nothing. Here's the file with decimal points: CONCENTRIC SQUARES.txt You can see from this photo. Concentric squares with whole numbers only look fine, with decimals they are drifting off. This seems to indicate a cumulative rounding error... however it is odd that I was able to correct the problem with my very non-whole number of steps per mm. I was expecting to need to change that to a whole number as well, but I did not. 20160709_112309

I do not know where in the code the rounding to the nearest step happens. I know from other motion control programs I have written that the previous error needs to be saved and compensated for during the next rounding, this generates a new rounding error that is then saved. Generally keeping track of the position in absolute coordinates as smoothie does normally ends up self correcting all rounding errors so they cannot accumulate, so I'm not sure where this problem could be.

I will try to think up some other tests to try to narrow down the issue further.

Zaaphod commented 8 years ago

1 decimal place is enough to cause the problem. Here's the file I used to test 1 decimal place.
Concentric Squares 1 Decimal Places.txt

I did only G0 X20 then G92 X0 between runs... (photo is sideways)

20160709_142840

I can try some variations of this program to try to find out what's going on. It's clear that it's shifting only in the -Y direction (photo is sideways) I'll try turning the parts and see if I can make the problem happen on X. The original issue was affecting both X and Y... I'll try to find some correlation between the Gcodes and how it gets to be off.

Zaaphod commented 8 years ago

I've been thinking about this... This is not rounding error... Somewhere, I don't know where, but in a place where Gcodes are processed and before they are converted to steps, there is a variable accidentally declared or treated as an integer but it should be floating point.. I made that mistake recently by using pow instead of powf. I suspect something similar is lurking somewhere. Then at some other point the machine position is correctly adjusted with floating point, but this accidental integer treatment for the actual move is causing a difference in position between the physical machine motion and the variable holding the position. The only thing that makes sense given the data is that somewhere something is integer but should be floating, and that's why only a program with no decimals will not display this issue

I can test this theory by writing a program in inches, and see if there is a difference between 0 decimal places and 1 decimal place. Inch moves are constantly multiplied by 25.4 so this will at least tell us if the issue is before or after the conversion

wolfmanjm commented 8 years ago

The only thing I can think of is that 41.58464566929134 is beyond a float resolution. AFAIK I think it is good to 8DP at that magnitude

wolfmanjm commented 8 years ago

I think maybe the original issue and this one with the squares maybe two different issues.

Speedy64i commented 8 years ago

@Zaaphod , I just checked the master version I was using without the issue, it was upstream-master-8d008bb, Build date: Jul 2 2016 14:01:14. I ran it again to be sure and I don't get the non-concentric hole issue like I do with edge. I'm not sure what i'm doing different to make master work, I use the same config file for both (same one as attached to this thread). Please let me know if you need me to test anything.

I'm currently using X, Y - 64.16325 (this is rounded from 64.16325265...) Z - 320

Is your Z number a whole number? is it playing into the error?

Thanks a ton for your guys hard work!

Zaaphod commented 8 years ago

@wolfmanjm Perhaps you are correct and there are two issues involved. I still haven't established any kind of pattern, other than I never see any issue if I use integer coordinates in my Gcodes. however that could be misleading as well. I do not believe it's backlash or a resolution problem because when it's right, it's dead on.. and I can make it get worse by doing more of the pattern consecutively.

I'll try to eliminate some possibilities.. too many variables here. I'll start by setting my steps_per_mm to just 40, and see what results that yields. I didn't really think it would use all those decimal places, but I did figure the variable it was going into would take care of not being able to store decimals past it's precision, however it is possible that having decimals out to the edge of precision in one variable could cause inconsistencies in other floating point variables if they are multiplied or divided. at some point there needs to be position multiplied by steps_per_mm.. perhaps having a combination of too many decimals is the problem.. one variable or another can have them but not both? I will attempt to prove or disprove this theory.

Zaaphod commented 8 years ago

Info: Exactly the same results if I set steps_per_mm = 40 I verified steps_per_mm with M92. This could still be floating point weirdness somehow.. I'm trying to think up a test that will narrow it down.

Speedy64i commented 8 years ago

I tried a few more tests and stared at the machine for a while... I noticed a couple things. Just to try it, I changed the g-code to never move the Z axis but still perform every circle movement, predictably, the problem is still there. I noticed using the original gcode, the entry and exit path from the bottom circle is limited to the X axis, and the top circle is limited to the Y axis. If I only run circles with entry points on the 3 o'clock position, only the X axis is moved towards negative, Y axis is unaffected. vice versa for the Y axis (please see picture below). It seems the error is introduced somewhere in the entry/exit to the circle.
2 important notes:

  1. The Start/stop location (as in the axis the entry/exit move is on) affects the error direction
  2. the circle size affects the error amount.

edit: (I forgot to note the bottom hole's small diameter circle does have a different clock position than the larger one) Since the larger diameter holes have more error, I am referring mostly to the larger holes, although the issue persists with holes of any size, the size affects the amount of error.

Hope this helps xy error

Speedy64i commented 8 years ago

More info, I've tried to simplify things, by only running the following code snippet from the last few lines of the code posted on the top of this thread, the X axis will walk towards negative by .02m to .03mm every time it is run., Y axis seems unaffected.

G1 F300.0 X54.3641 Y49.1924 G2 F800.0 X52.0875 Y49.6875 I-0.3641 J3.8076 G2 Y56.3125 I1.9125 J3.3125 G2 X57.825 Y53.0 I1.9125 J-3.3125 G2 X54.3641 Y49.1924 I-3.825 J0.0

gauge

Speedy64i commented 8 years ago

I don't know smoothie code (at all), but could this be the acceleration code no longer correctly adding up the linear (or other) movement? I just tried the following Gcode and I got the same drift towards -X with straight lines as I did with the circle test. I confirmed my dial gauge wasn't moving by telling the machine to go back to absolute home (which I store in G59) on the machine. It hit the end stop, which it shouldn't be able to do, so I know it drifted towards -X.

G1 F800.0 X10 Y0 G1 F800.0 X0 Y0

If it were a straight calculation issue, shouldn't it equal out between positive and negative accelerations? maybe it only takes place on acceleration changes that get some modifier, like junction deviation does? I'm uneducated and guessing, just throwing out stuff.

Zaaphod commented 8 years ago

@Speedy64i could you try running my program I posted above... CONCENTRIC SQUARES whole number.txt on your machine? for some reason I get no drift on that program, but I do get it with one decimal place. I'm not sure if there are multiple issues going on here. It would be great to see if my results are duplicated on your machine. Please note that my concentric squares test is quite a bit taller than your original because I was running more times to amplify the drift effect. I will try duplicating your tests on my machine as well.

to confirm you are experiencing drift with G1 F800.0 X10 Y0 G1 F800.0 X0 Y0
?

Speedy64i commented 8 years ago

Narrowed it down some more, I noticed if I manually move the machine back and forth 10 mm in G91 relative mode, the issue doesn't happen. If I run the following code however: The machine will drift -X: G91 G1 X10 G1 X-10 G90

The machine will drift +X: G91 G1 X-10 G1 X+10 G90

Could this be something with the planner?

Speedy64i commented 8 years ago

@Zaaphod I can try running it, but give what I just posted above a shot, Looks like if the machine has to plan equal accel/decal in +x/-x you'll never notice the issue, but if there is an asymmetrical start/stop, you will see it. So if I go left right left in my code, no total error, but if I go just left right, error and right left error. so what we're seeing in the longer runs would be cumulative error based on which directions concurrent moves are completed on whatever axis called for. I haven't tested what direction the error will be if I do a right angle, i'm about to test that next.

Speedy64i commented 8 years ago

@Zaaphod yep, copy and paste one of the two codes and run it about 100 times, you'll see 3mm of drift.

Option 1 G1 F800.0 X10 Y0 G1 F800.0 X0 Y0

Option 2 G91 G1 X10 G1 X-10 G90

If you jog it back and forth using the same gcode (where the buffer is emptied before running the next command, there will be no drift)

Edit: forgot to mention, don't copy and paste it 100 times in the same gcode file, it will just end up with only .03mm of error, you have to run it 100 times to see the ~3mm error.

Zaaphod commented 8 years ago

so if I do:

G1 X10 G1 X0 M400

over and over in a loop I should get no drift? M400 waits for the buffer to be empty before returning, so no gcodes would be accepted until the buffer is empty

Speedy64i commented 8 years ago

I just tried your whole number code, looks to be pretty concentric on edge, I can confirm that I get the same results as you did.

I have never used M400, I'll re-try the dial gauge test with that code pasted 100 times. But yes, if that has the same effect as running it 100 times seperately, you will see 3 mm of error in the -X direction. if you run it 30 times, 1mm of error, etc. it's additive.

Speedy64i commented 8 years ago

@Zaaphod Sorry I read too fast, that code you posted will get drift use this instead... G1 X10 M400 G1 X0 M400

Zaaphod commented 8 years ago

Speedy64i Thanks for testing it. I was suspecting some kind of floating point issue.. but I'm not sure.. could you also try Concentric Squares 1 Decimal Places.txt to verify it does have the issue? I don't yet understand why having a decimal makes a difference. Possibly it's nothing to do with the decimals and I just got lucky that all moves are balanced without them but unbalanced with them or something like that...

I'm hooking up my logic analyzer now and I can get actual number of steps produced with this.

Speedy64i commented 8 years ago

Sure i'll test the other one in a few minutes. I just tried the following code: This produces no drift over 50 cycles dial gauge read 0.00mm: M400 no-drift 50x.txt G1 X10 M400 G1 X0 M400

This produces -1.51mm of drift on the X axis after 50 cycles and -3.08mm after running the 50 cycle program twice: M400 drift 50x.txt G1 X10 G1 X0 M400

Zaaphod commented 8 years ago

since you have it set up..

what are the results of

G1 X10 G1 X0

repeated 50 times?

also..

what are the results of G1 X20 G1 X0 M400

repeated 50 times?

Thanks for helping to test this

Speedy64i commented 8 years ago

I accidentally ran your first test already, I think I remember it producing -0.03mm X axis error when programmed for 50 cycles. I'll try them both again here shortly.

My machine should be nominally producing 0.015585245mm/step of movement. So it looks to be off by 2 steps after the first test. If I take the 1.51mm and divide it by my mm/step value I get 96.887 steps missed across the M400 drift 50x.txt test; about 2 steps per cycle.

also I can again verify your square hole results. square test results

Zaaphod commented 8 years ago

Thank you for verifying this. It helps rule out something with my machine in particular. I'm not sure why leaving of f the decimals makes it work.. it could be that I accidentally balanced it, or it could be some floating point math weirdness...

At least we are dealing with a duplicatable issue here. the worst problems are intermittent and difficult to reproduce.

I have data from my logic analyzer.. I'm working on a spreadsheet to count the steps...

Speedy64i commented 8 years ago

This results in -0.03mm of offset G1 X10 G1 X0 no M400 10mm 50x.txt

This results in -1.51mm of offset G1 X20 G1 X0 M400 M400 20mm drift 50x.txt

I have a hunch it's undershooting accelerations and decelerations by one step each... I'm going to go test it.

wolfmanjm commented 8 years ago

This certainly does not reproduce on any of my machines. It would also have been noticed by others. Acceleration does not affect the number of steps generated, it just affects the speed they are issued at.

I have a hunch... are you using external drivers? If so can you invert your step pulse? and inverted step pulse would explain losing a step every segment.

Speedy64i commented 8 years ago

My hunch was wrong as usual. Yep, i'm using external drivers but they are wired for step on positive pulse though 3.3v to 5v logic converters. are the internal smoothie drivers negative pulse?

wolfmanjm commented 8 years ago

yea invert your step pulses see what happens. inverted step pulse would explain what you are seeing.

Speedy64i commented 8 years ago

I just tried a few of the tests on my 3d printer using the internal stepper drivers, and like you I can't reproduce the error I'm seeing on my CNC machine. I'll go invert my CNC and report back, thanks a ton for your help.

Zaaphod commented 8 years ago

I don't see how that could have ever worked on the master branch though.. and my tests show the same results on both master and edge from 7/9

I cannot find any issue with my logic analyzer doing zig-zag tests. xy test

3 Edge 7-9 - original config.xlsx xl top xl bot

You can see I get exactly 4800 X positive steps and 4800 X negative Steps These were using M400 stops each zigzag.. and also 4800 Y positive steps and 4800 Y negative steps , these had no M400 until the end of program

X test.txt

I'm pretty sure I did my spreadsheet correctly.. let me see if I can adjust the formula in my spreadsheet so show an inverted step pulse.

@wolfmanjm Please confirm .. Does Smoothie expect the step to happen on the falling edge?

wolfmanjm commented 8 years ago

@Zaaphod No it must step on rising edge of step pulse. @Speedy64i common anode drivers with opto isolators usually require an inverted step pulse from smoothie.

wolfmanjm commented 8 years ago

It must step on the rising edge of the step pulse from smoothie not the falling edge.