Closed Bergerac56 closed 5 years ago
There were some changes made to the Planner:: to split the initial move given to it to make combining subsequent moves more likely. It may be those changes causing trouble.
Reading #8595 will provide more detail.
Thanks Roxy, I will follow #8595 to see if a solution is found
Another thing that came up in that discussion is to check your z_jerk setting (and etc.). I don't think the problem you're describing would be caused by that, but it's worth checking your jerk settings if you have 'stuttering' movement. Having any of them inadvertently set to 0 (or too small?) will force the planner to stop between certain moves, to satisfy acceleration constraints.
@bjarchi - I tried your suggestion (who knows...)
Unfortunately, it does not solve the problem. The last bugfix2 (466f276) does not help either.
@Bergerac56 You might want to try pulling the latest commits from bugfix-2.0.x to see how things behave now. There were some issues related to changes in the planner over the past few days that seem to have been ironed out.
@bjarchi - Sorry to come late. I was not in town.
No, the last bugfix (69d49a2) does not solve the issue yet ;). Still exactly the same.
Interesting. What kind of machine are you using - straight cartesian, core or delta?
I haven't seen this behavior - and I've been tracking bugfix-1.1.x pretty closely - but I'm still on an 8-bit controller and still on 1.1.x so it might be 32-bit or 2.0.x specific.
It is a cartesian using a RE-ARM and not an AVR 8 bits.
You are right, the problem does not occur with the 8-bit processor. I suppose it is related to the 32-bit version.
Unfortunately, the last bugfix (2.0 - c555a21) does still not solve the "jumps" of the Z axis...
@Roxy-3D - Hello Roxy. As I think you are one of the most skilled people on UBL, I would like to come back to this strange issue where Z is jumping in a sort of "discrete" moves instead of smooth continuous moves along the Y axis. (X works OK)
I recompiled one of the marlin version with the "Z problem" with another leveling option (bilinear in this case). After a leveling, there is no problem of jump anymore. It seems that it happens only with UBL.
The first time I noticed this was when @hg42 made this change (constexpr uint8_t NUM_DIGITAL_PINS = COUNT(pin_map);). As already said, I think that it is not the cause but a revealer of another issue.
Any suggestion to solve this? (I love UBL ;) )
Thanks a lot
Olivier
@Bergerac56 Unfortunately, I don't have a machine available to me right now.
But just to make sure I understand things... You are seeing this bad behavior on a Re-ARM board using the current bugfix-v2.0.0 code base, right?
There is a lot of debug code that can be turned on. And probably that is what it is going to take. Does the jump happen all the time? Or is there a particular sequence of moves that sets it up for the bad behavior?
@Roxy-3D - Yes, the configuration is a RE-ARM with a quite standard RAMPS 1.4 (just EMI protections on thermistor cables, thermistors on specific ports of the RE-ARM) and a graphic LCD. I was already able to print many times with a good quality (quite equivalent to the mega2560), depending on the bugs of the releases of course ;). I use this printer to test and "follow" you in the 32 bits (RE-ARM) journey. I can easily re-mount a mega in the printer and can switch between versions, as I try to keep track of past releases. So I am able to verify, to a certain extent, whether the issues are soft or hard ones.
For the moment, I can only use the last (in fact the 2 last) versions of bugfix 2 through a host, as the LCD does not work with those versions (it works OK when I "downgrade"). The code loaded for the moment is version aeb5c62, as it seems to be the last without the LCD problem. I also have another issue with the 2 last bugfix 2.0: the temperature of the extruder never reaches the defined temp (if it is 200C, it "blocks" at +- 197C and never starts to print). As I did not investigate a lot, I did not report this yet. And... yes, I retried with older versions to see if it was a hardware problem... the answer is no ;)
I know that Z is moving smoothly when I compile with linear or bilinear. But not with UBL.
I believed that this behavior was related to the change: (constexpr uint8_t NUM_DIGITAL_PINS = COUNT(pin_map);) made some time ago. I just "reverted" it to test. The issue is still there.
EDIT: I forgot to say that I tried with config files only modified to adapt to my printer. No fancy tricks (Leds, babystepping, etc...). Same behavior.
The first time I noticed this was when @hg42 made this change (constexpr uint8_t NUM_DIGITAL_PINS = COUNT(pin_map);). As already said, I think that it is not the cause but a revealer of another issue.
Are you really sure my patch started the behavior? Or could it be a coincidence with other changes at that time? Perhaps you could revert only this change (or set NUM_DIGITAL_PINS to a negative value).
If it's really triggered by NUM_DIGITAL_PINS, you could conclude something from the fact that this change only enables digital pins at certain locations in the code. Additionally, code may be executed because of these now existent pins, that otherwise wouldn't be used, which may have side effects. But if this code doesn't contain severe errors, the whole thing would probably be hardware related.
Hello Harald,
I reverted, as explained in the edit above, the uint stuff. And the problem is still there. As it appeared at the same time I thought that it was related. But I am less sure now. It could have been a coincidence.
I am a bit lost. It was working fine some releases ago and I seriously doubt about a hardware problem. It does not happen with bilinear for ex. Moreover, if a "jump" occurs with a certain mesh at a certain X and Y coordinate, it will always happen at the same place. This is a very consistent hardware problem (or EMI) no? ;)
So at least it's not my simple patch (puuh...as I expected, because it would be curious how this could interact).
And it's no EMI for sure :-)
If it is reproducible it's a big step in the direction to be solved.
Being a developer, I would do this and I think you can do this, too (or are you a developer yourself?):
You already have two commits where it works and where it doesn't work, right? (git-)tag them as OK0 and FAIL0.
Then try to find the one that introduces the problem by either going up from OK0 or down from FAIL0. Basically checkout the versions, try that version and (git-)tag it with either OK1, OK2, .... or FAIL1, FAIL2, ... the names and number don't matter, but you have a hint in which sequence you did it in case something happens that's not logical.
Sometimes it is a bit more complicated. If there are many commits between OK0 and FAIL0 you should use bisecting (partition in half) to find the single commit. Always continue between the highest OKx and lowest FAILx. If OK and FAIL are in different paths/branches in the tree, then you have to go back to the junction where both paths are connected and go up from there in the branch where the FAIL lives.
At the end of this process you should have a FAILx right above an OKx, which means this FAILx commit is introducing the bug. A developer may now split this commit into multiple smaller commits, if they are not dependent on each other. Then you could continue the process with those. After this often only a few lines remain and it may be obvious what's causing the problem.
using a good git client may help, I like gitkraken for this purpose, because it's not too complicated
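The bisecting procedure described above can be sketched in a few lines. This is only an illustration of the "always continue between the highest OKx and lowest FAILx" rule: the function name `first_bad_commit` and the `is_good` callback are hypothetical, the callback stands in for the manual "check out, build, flash, test" step, and the sketch assumes a linear history ordered from oldest to newest with at least two commits. Git also ships a built-in `git bisect` command that automates the same search.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the index of the first failing commit in `commits`.
// Assumptions (hypothetical setup, not taken from the thread):
//   - `commits` is ordered oldest to newest and has at least two entries,
//   - commits.front() is known good (OK0), commits.back() is known bad (FAIL0),
//   - `is_good` represents the manual test of one checked-out version.
std::size_t first_bad_commit(
    const std::vector<std::string>& commits,
    const std::function<bool(const std::string&)>& is_good) {
  std::size_t good = 0;                  // highest commit known to work
  std::size_t bad = commits.size() - 1;  // lowest commit known to fail
  while (bad - good > 1) {
    const std::size_t mid = good + (bad - good) / 2;  // partition in half
    if (is_good(commits[mid]))
      good = mid;  // the bug was introduced later
    else
      bad = mid;   // the bug was introduced here or earlier
  }
  return bad;  // the FAILx sitting right above an OKx
}
```

With 30 to 40 commits between the two endpoints, halving the range each time needs about five or six test builds rather than one per commit.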
I also have another issue with the 2 last bugfix 2.0: the temperature of the extruder never reaches the defined temp (if it is 200C, it "blocks" at +- 197C and never starts to print). As I did not investigate a lot, I did not report this yet. And... yes, I retried with older versions to see if it was a hardware problem... the answer is no ;)
I've seen this too. In fact... I was in the process of trying to figure out why this was happening when I left town.
using a good git client may help, I like gitkraken for this purpose, because it's not too complicated
If we can find the commit that breaks the Z-Axis movement in UBL... We can probably get it patched even though I don't have a machine to use.
@Roxy-3D @hg42 - I have identified 2 versions: one which works fine with Z and one which does not. The problem is that I was out of town between Nov the 26th and 30th, so I do not have the bugfix 2.0 releases between those 2 dates. Moreover, I only started after Dec the 1st to record the commit code (6e944a4 for ex) in the files I stored. I zipped the 2 versions. You will find them here:
Marlin-bugfix-2.0.x BF34 (01-12 Z prob).zip Marlin-bugfix-2.0.x BF32 (25-11 Z OK).zip
This is not perfect. Sorry for that. It would have been better to be sure that the 2 versions found sit directly on either side of the problem. If some of you still have the complete versions between those 2 dates, I could recompile them and be more specific.
so you do not use a git client?
There are quite a lot of commits on 2017-11-25 and 2017-12-01 and between, so it is not easy to identify these two.
Interesting: Unfortunately I didn't check that before, but my NUM_DIGITAL_PINS fix repaired a bug introduced just one version (or two) before: 7576ad7fc25200659af9e99288affa4a8dc6ff33. The former int16_t was changed to int8_t. That explains, why this wasn't noticed before (I couldn't imagine that such a severe bug could hide below the RADAR).
Ok, back to ubl...
The only pure UBL change between the two versions you zipped is in ubl_G29.cpp and comes from e48fcad615092e34ae0dcf2e08753fb03f525b8f. This looks rather unsuspicious, because only names were replaced.
I tried to find the commits:
OK is d29cb646e3c5d3445d95cc31314b684b62cd19cf
prob is dd1b503f64b907e38929c90b692d11891ef28b19 (my PR for NUM_DIGITAL_PINS)
There are still about 30-40 commits in between.
@Bergerac56 can we contact by email? my (spammable-)email is on my profile. If you cannot use a git client, I could try to send you several zipped versions. But that doesn't make sense here in a repo (which is used to avoid this).
The only pure UBL change between the two versions you zipped is in ubl_G29.cpp and comes from e48fcad. This looks rather unsuspicious, because only names were replaced.
And this was done just to get out of the way of one of the 32-bit HAL's (ST I think) that was using those names...
@Roxy-3D - Hello Roxy. As the "Z problem" is still there, I tried to narrow down where and when the problem started. As I was only using downloaded versions of bugfixes (and no git client) I started to use gitkraken as suggested by @hg42 (He is an incredible teacher by the way :) ) and made a patient check, commit by commit.
If I did my job correctly, the issue started on Nov the 29th (22:30 ef2531 8 files modified) when Scott added an option to segment leveled moves. All previous versions were working fine. None of the following ones has a correct Z move. At least on my printer (RE-ARM)
I did not look at the changes yet.
Hope this helps
If I did my job correctly, the issue started on Nov the 29th (22:30 ef2531 8 files modified) when Scott added an option to segment leveled moves. All previous versions were working fine. None of the following ones has a correct Z move. At least on my printer (RE-ARM)
Do you have an 8-bit board you can try things on? Can you plug in an AtMega-2560 board and try things? The reason I'm asking is I wonder if we have found a 32-bit problem. We are not hearing about this problem on the 8-bit side of the world.
One thing I'm not sure if it is ok in that commit is this:
+ const float xdiff = rtarget[X_AXIS] - current_position[X_AXIS],
+ ydiff = rtarget[Y_AXIS] - current_position[Y_AXIS];
+
// If the move is only in Z/E don't split up the move
- if (rtarget[X_AXIS] == current_position[X_AXIS] && rtarget[Y_AXIS] == current_position[Y_AXIS]) {
+ if (!xdiff && !ydiff) {
planner.buffer_line_kinematic(rtarget, _feedrate_mm_s, active_extruder);
return false;
}
I'm not sure you can treat a floating point number as a boolean value. That doesn't seem right.
@thinkyhead Can you take a look at this commit and see if you see anything?
C++11 §5.3.1/9:
The operand of the logical negation operator ! is contextually converted to bool; its value is true if the converted operand is false and false otherwise. The type of the result is bool.
so it's converted to bool and then inverted.
meaning it depends on static_cast
§4.12/1:
A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to member type can be converted to a prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true
other sayings:
A float will be converted to false if it's exactly 0.0f. It will be converted to true if it's not exactly 0.0f!
so it seems to be identical to the code before. But because it's not totally clear for everyone (even myself programming C++ since 1999) I would use x != 0.0 to be really sure it's what I mean and anyone can interpret it like that.
However, an exact comparison is only reliable if the two values are truly identical (which means one was copied from the other and did not go through a calculation in any way).
Otherwise if it's a result of a calculation, it would demand something of the form
fabs(x) < epsilon
because it is never ever 100% sure that it's a completely identical value for float after any calculation.
also, I know compilers that make errors (or their developers) if something is not totally clear in its definition. I tend to help the compiler by being more explicit instead of relying on an automatic conversion. That said, I clearly like the !x notation for bools, but I never used it for float and also tend to avoid it for numeric values.
there is another thing I want to mention: please change UNEAR_ZERO to something else... for a non-native English speaker like me (I am German) this sounds like un-near-zero, which may be unusual in English (didn't find it very often, but I am not alone: http://www.yourdictionary.com/unnear https://findwords.info/term/unnear ) but it is quite normal in German (unnormal = not normal, Unsinn = nonsense) and this is the opposite of what the surrounding comments claim. I assume this means unsigned-near-zero, but every time I read it I am shivering.
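To make the three variants discussed above concrete, here is a minimal sketch. The helper names and the default `epsilon` are illustrative assumptions, not identifiers or values taken from Marlin. It shows that `!x` and `x == 0.0f` agree for a float, and how a tolerance-based test differs from both.

```cpp
#include <cmath>

// Implicit form, as written in the commit: the float is contextually
// converted to bool (zero -> false, anything else -> true), then negated.
inline bool is_zero_implicit(float x) { return !x; }

// Explicit form: same result as above, but the intent is visible.
inline bool is_zero_explicit(float x) { return x == 0.0f; }

// Tolerant form for values that come out of a calculation.
// `epsilon` is a hypothetical default chosen for this example only.
inline bool is_near_zero(float x, float epsilon = 1e-6f) {
  return std::fabs(x) < epsilon;
}
```

For a tiny non-zero difference such as 1e-9f, the first two helpers return false while `is_near_zero` returns true, which is exactly the case where the choice between exact and tolerant comparison matters.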
could this commit be split without breaking functionality of each part? a branch with these parts would allow @Bergerac56 to find the one that causes the issue.
a0fc5f7b52e677cafbb99bb215e97be4ad4e590f in bugfix-1.1.x has the same comment but does not change the same files...is it a kind of import from 1.1.x to 2.0.x?
EDIT: wow -- I'm sorry, I didn't expect motion.cpp et al. to be all part of Marlin_main.cpp -- congratulations to the one who had to split this monster :-) good job...
@Roxy-3D - Yes, I can switch to an 8-bit board (mega2560) quite easily. I will do it tonight. I have a heavy day today.
Yes, I can switch to an 8-bit board (mega2560) quite easily. I will do it tonight. I have a heavy day today.
That will be a big help. If the problem doesn't show up there when you do the same things, it is probably a problem in the 32-bit branch. And most likely it isn't a 32-bit compiler issue. Most likely some logic is different between the two branches.
@Roxy-3D Hello Roxy. I reconfigured the printer to use an 8-bit controller (Mega2560) and compiled the last version of bugfix for it. (To start from somewhere)
Bad (or good?) news, I get the same issue.
The way I tested:
Power ON and connection to host (pronterface)
G28
M502 / M500 / M501 / G29 P1 (so I have an adapted mesh for the eeprom version and for the type of processor) / G29 S1 / G29 L1 / G29 A / M500
G1 X50 (to put X somewhere. It does not matter but it has an influence on where the problem will start along the Y course)
G1 Y0 (with my fingers on the Z axis)
G1 Y190
So I can feel the very tiny movements of Z. After a certain travel of Y (let's say 1/2 of the length) I get "jumps" of Z. For a certain X position, always at the same places. In other words: a smooth move of Z till a certain Y point, and then "steps" (like stairs)
When a mesh is "empty" (full of 0) there are no jumps anywhere (and no Z movements). This seems obvious and expected, but it would not necessarily have been the case with a hardware problem, for example.
On the 32 bits version and with the exact same mesh, no problem before Nov the 29th (ef253) but definitely after.
I suspect that the problem started at the same moment as with the 32b version, as described in a previous comment. I do not have the time tonight to check.
As you notice it when you listen very carefully (the small jumps of Z at full Z speed) or when you have your fingers on the Z axis, it could be understandable that not so many people have noticed it (Or I would be the only one...)
Do you want me to test something else? If not, I will reconfigure for 32 bits.
no problem before Nov the 29th (ef253) but well after
the commit identifier must be 6 characters, otherwise we cannot find it. Easiest is to use gitkraken (or other guis) and click on the number in "commit: xxxxx", if you paste that here, it is very long in the editor, but shortened to the 6 characters but we can click on it.
ef2531558cab8317a772ac766ae5243d742fc89a (interesting, it's shown with 7 characters here, and I searched in the wrong repo, so I couldn't find it :-) )
I reconfigured the printer to use an 8-bit controller (Mega2560) and compiled the last version of bugfix for it. (To start from somewhere) Bad (or good?) news, I get the same issue. The way I tested:
Very good! If it is happening on the 8-bit boards, it is going to be easier to figure out the problem and fix it. I have a quick question: What was X when you did that sequence to cause the problem? Was that still at 0 from the G28?
Update: It looks like X is at X_BED_SIZE/2 when you are doing these Y moves. Is that correct?
#define Z_SAFE_HOMING // Jacbot
#if ENABLED(Z_SAFE_HOMING)
#define Z_SAFE_HOMING_X_POINT ((X_BED_SIZE) / 2) // X point for Z homing when homing all axes (G28).
#define Z_SAFE_HOMING_Y_POINT ((Y_BED_SIZE) / 2) // Y point for Z homing when homing all axes (G28).
#endif
@Roxy-3D - No, not only. (and sorry again for this long explanation: As we say in french: I wrote you a long letter as I did not have the time to write a short one ;) )
During my tests, the center of the bed is only one of the positions of X I tested. I tried with a lot (all?) values of X provided that X and Y were values inside the mesh. Let's re-explain:
Along the X axis (when only X moves, regardless of the (fixed) position of Y): no problem.
Along the Y axis (when only Y moves): problem after a certain position along the course of Y. This "certain position" is variable. This means: for a given position of X, it always starts at the same position along the Y course (this is why I am nearly sure that it is not EMI or another hardware problem, as the problem started at a certain release and disappears when I downgrade). But when using another position of X, the issue starts at another value of Y (always the same for that given X position) => At X=50, it could be Y=100. At X=150, it could be Y=110. Etc... (These are only examples.)
Obviously, if both axes move (starting for ex. from X=20, Y=20) => G1 X250 Y190, the problem also starts "somewhere" along the diagonal.
(300,200) x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x o o c o o o o x x o o o o o o o o o o o o o o o o o o o o o o o o o o o o o (0,0)
o = Z OK, x = Z not OK, c = center of the bed
Once again, this is only to reflect my comment. The positions of the o and x are pure examples.
There could be a "pattern" in the positions of "Y problem" related to a specific X value, but I did not have the time to investigate it. The only interesting thing is that for low values of Y (Below +- the "center" of the bed) the Z moves are normal. The issue starts after around 1/3-1/2 of the course of Y (depending on X position). To be more precise, I have the feeling that Y=Y_BED_SIZE/2 is not far from the starting transversal of problems for Y.
When I test, I first position X (G1 Xxxx) So X is still. Then I let Y move (when Y=20 (to be in the mesh for sure) => G1 Y190 for ex) So I have X still and Y travelling to its position. It is along this movement, when Y reaches a certain position, that the issue starts.
EDIT: With the 8bit mega2560, first jump at:
X=30 Y=120-121
X=141 Y=147-148
X=200 Y=171-172
X=250 Y=95-96
@hg42 Sorry Harald. Bad copy/paste. Still learning :) ef2531558cab8317a772ac766ae5243d742fc89a
because we are still stuck, I would try to trigger different thoughts by exchanging things. Let's see if the behavior also changes when we change something.
Because you have a cartesian it should be easy to switch X and Y (including endstops). Either the hardware or in configuration (btw. this would be easy on Smoothieware, just configure different pins, I think in Marlin you have to change a distributed file and/or add your own printer, right? or can I override a pin in Configuration.h?). Does the behavior switch to the other axis?
Another hint could come from inverting the axes. Next, invert the motor by hardware (turn connector 180°) and invert the direction in config (INVERT_X_DIR). Does the behavior invert, too?
Without knowing anything about UBL... What influence does the position of the MESH points have on the Y positions where "jump"ing starts? Only change one thing at a time, e.g. MESH_INSET seems to be the margin in mm (or is it only a logical value to enable it?). If you add 20mm to MESH_INSET the MESH should be moved 20mm in X and Y, right? (but it also shrinks in size by 40mm each, perhaps change MESH_INSET by (X_BED_SIZE-2*old_MESH_INSET)/GRID_MAX_POINTS_Y and reduce GRID_MAX_POINTS_Y by 2) How does the behavior move? Take into account that X of the MESH is also changed.
more ideas welcome :-)
@hg42 - Here I am a bit lost...
If I understand well, you ask me to change/switch hardware elements (and to adapt firmware to adjust those changes). I will not try that on the printer because I do not want to modify the cabling (took a lot of time to take into account all possible EMI when switching to RE-ARM) But I have still a set of all components which I could use to simulate a printer.
However, I am not convinced that it will teach us a lot. For Marlin, those hardware adaptations will be quite invisible. (A motor is a motor and a switch is a switch). The hardware seems (let's be prudent) OK as I can revert to a Z problem free situation by downgrading. And, moreover, linear and bilinear leveling are working fine.
I already tried different mesh sizes, with more or fewer values in the mesh (12x12 / 5x5 / 10x10 / 3x3 / ...). The issue is still there.
Roxy knows very well UBL. My secret hope is that it will ring a bell…
In the meantime, I will revert the printer to RE-ARM and switch to bilinear leveling. We are approaching Xmas and I need to print ;)
@Roxy-3D I tried to clarify a bit where the problems occurred. On a RE-ARM with the last version of bugfix-2.0.x and the correction of @ejtagle for the LCD issue, I made a 10x10 mesh and tried to estimate as well as I could (very small movements may not have been felt) where the issues occurred for Y along the X axis. The positions noted are always the point just before the issue. The file also contains a G29 W and a G29 T1. It seems that there is a pattern. The missing points could be because the values of the mesh are too close to zero or the jump too difficult to feel. Z Jumps.zip
Another remark: most of the time (but not always) the jump goes in the opposite direction to the smooth move. Between two jumps (when a first jump occurs another one can happen later) smooth movements can happen.
I cross my fingers because I love UBL :)
I'm not able to fully re-read this thread right now. I have to run out the door. But...
@Bergerac56 As you know, there have been some changes to the Planner lately. Have you turned off
// For Cartesian machines, instead of dividing moves on mesh boundaries,
// split up moves into short segments like a Delta. This follows the
// contours of the bed more closely than edge-to-edge straight moves.
#define SEGMENT_LEVELED_MOVES
#define LEVELED_SEGMENT_LENGTH 5.0 // (mm) Length of all segments (except the last one)
It is very possible that is causing the problem. But if not... I'll be home in 5 or 6 days and have a printer I can use to debug. I'll try to resolve (and cure) your issue as quickly as possible.
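For readers unfamiliar with the option, the idea in the quoted comment (fixed-length segments, with a shorter last one) can be sketched as follows. This is an illustration only and not Marlin's implementation: `Point` and `split_move` are hypothetical names, and the sketch handles only the XY part of a move.

```cpp
#include <cmath>
#include <vector>

struct Point { float x, y; };

// Split the straight XY move from `from` to `to` into pieces of
// `segment_length` mm. Every piece has that length except the last one,
// which covers the remainder. Returns the end point of each piece.
// Hypothetical helper written for this example, not taken from Marlin.
std::vector<Point> split_move(Point from, Point to, float segment_length) {
  const float dx = to.x - from.x, dy = to.y - from.y;
  const float length = std::hypot(dx, dy);
  std::vector<Point> ends;
  if (length > 0.0f && segment_length > 0.0f) {
    const float ux = dx / length, uy = dy / length;  // unit direction
    const int pieces = static_cast<int>(std::ceil(length / segment_length));
    for (int i = 1; i < pieces; ++i) {
      const float d = segment_length * static_cast<float>(i);
      ends.push_back({from.x + ux * d, from.y + uy * d});
    }
  }
  ends.push_back(to);  // the final (possibly shorter) piece ends at the target
  return ends;
}
```

In a leveled move, the Z correction would then be evaluated at each of these end points instead of only at mesh cell boundaries, which is why a defect in this path could plausibly show up as small Z steps along a Y-only move.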
@Roxy-3D Hello Roxy,
IT WAS IT :) I turned off the 2 options. As a workaround, it works. No jumps anymore. BTW I do not know how I forgot that it was possible to disable the segmented moves.
I suppose that it means that there is still an issue with segmented moves for cartesian printers, but there is a temporary solution.
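For anyone wanting to reproduce the workaround, it amounts to commenting out both defines quoted earlier in the thread. A sketch, assuming they sit in your configuration header exactly as in that snippet:

```cpp
// Workaround: disable segmented leveled moves by commenting out both lines.
//#define SEGMENT_LEVELED_MOVES
//#define LEVELED_SEGMENT_LENGTH 5.0 // (mm) Length of all segments (except the last one)
```

The firmware has to be recompiled and flashed again for the change to take effect.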
Thanks Roxy
@Bergerac56
I tried already different size of meshes and more or less values in a mesh (12x12 / 5x5/10x10/ 3x3/ ...) .The issue is still there.
I was interested in how it would change, while being sure it still would fail. Changing axes or directions was simply the easiest to do (if your mainboard is not enclosed, I guess mine will never be...). I admit, I didn't think about it very much because it is only switching four connectors...but you did what I thought you would, if you don't want it, you don't do it :-) I already noticed, you are a "thinking first" personality...
@Roxy-3D I like your commitment...
@thinkyhead I think we have a UBL issue with the new SEGMENT_LEVELED_MOVES. It is verified up above: https://github.com/MarlinFirmware/Marlin/issues/8684#issuecomment-353733519
I don't have any way to check things out in more detail right now. But I can help with debug in 5 or 6 days. Maybe we should make SEGMENT_LEVELED_MOVES off by default???
// For Cartesian machines, instead of dividing moves on mesh boundaries,
// split up moves into short segments like a Delta. This follows the
// contours of the bed more closely than edge-to-edge straight moves.
#define SEGMENT_LEVELED_MOVES
#define LEVELED_SEGMENT_LENGTH 5.0 // (mm) Length of all segments (except the last one)
I believe we patched some code related to segmentation, so maybe it's better now? I'm still trying to fix an issue where the last segment added to the planner doesn't get chained, but that affects every movement in Marlin.
@Bergerac56 have you tried latest bugfix 2.0?
@boelle. Obviously ;) I test nearly every 2 to 3 days.
This is a very old issue. To be sure, I reran the tests done at the time with today's bugfix. The issue is not there anymore (And perhaps since a long time...)
I close the issue
Merci :-D
With UBL, the corrections imposed by the mesh let Z adapt smoothly all along the moves of X and Y
With the last bugfix (aeb5c62) and RE-ARM, Z is "adapting" smoothly all along the X axis. But along the Y axis (mainly when Y > half of the course) Z starts to "jump" from one position to another. It moves as if "in steps". This seems to affect only Y.
This behavior appeared when the fix made by @hg42 (constexpr uint8_t NUM_DIGITAL_PINS = COUNT(pin_map);) was tested and implemented. My feeling is that this fix is not the cause, but reveals another problem.
Is there somebody else having noticed this behavior?