RoboDK / RoboDK-API

Implementation of the RoboDK API in different programming languages. The RoboDK API allows simulating and programming any industrial robot (offline and online)
https://robodk.com/doc/en/RoboDK-API.html
Other
237 stars 117 forks source link

Consideratins on the new definition of MAX_STR_LENGTH in robodk_api_c.h #100

Closed ASE-Optics-Europe closed 9 months ago

ASE-Optics-Europe commented 2 years ago

MAX_STR_LENGTH is now defined as 2048 chars long. Where do these numbers (1024, 2048) come from? This should be equal to the longest diagnostic message that the API is expected to receive. The question comes from the fact that the new definition doesn't solve my communication issue: after having commanded a motion via the API, the API goes into a status -1 (as returned by _RoboDK_recv_Int, line 2156) and thus goes into the condition handled by the lines 2185 to 2188 of robodk_api_c.c (I have waited more than 8000 characters and still havent received the expected '\n' string terminator). There (between lines 2185 and 2188), the API repeatedly invokes _RoboDK_recv_Line, where you have introduced a potential source of bugs: 1) making MAX_STR_LENGTH longer does not address the main issue that _RoboDK_recv_Line never receives '\n' by the SocketRead call on line 2059. As such, isDone never becomes true and the API will always freeze there; 2) lines 2067 to 2069 have been replaced by those between 2064 and 2069 by filling up the output buffer only if i < MAX_STR_LENGTH. This avoids that the if condition writes beyond the end of the buffer (that would be a fault), but does not solve the system freeze: the loop still waits for the '\n' character to be received by SocketRead on line 2059!

My doubt is: what does status variable in _RoboDK_check_status represent? Where are these values (and their meanings) defined? By looking at the code in _RoboDK_check_status I infer that there is a first set of status values between 1 and 9 (both included) representing expected successful execution of the commands received, and then a second set of status values < 100 (where currently also -1 falls) apparently representing error conditions (the buffer strProblems is to be filled with a diagnostic string to be attached to "RoboDK API ERROR: %s"). I wonder if that -1 hanging the API is correctly handled here, it perhaps representing a not better specified communication problem (lines from 2190 to 2192) that needs NOT filling up a buffer (obviously, as the error condition is not better specified and thus probably not even recognised). A way to put it without changing the type definition of status (int32_t on line 2156), is to replace line 2185 with else if ((status < 100) && (status > 0)) { But at this point, the neater way to put it is to have:

The status -1 would thus immediately be sent to 2190: no buffer filling required, no crashing, and error message.

RoboDK commented 2 years ago

Any line sent by RoboDK should end with \n. If this is not the case there must be a bug somewhere (RoboDK API probably, although it could also come from RoboDK itself).

Could you provide a sample project for us to reproduce this issue?

There is no well defined max length for a string. This string could be much longer than 2048 characters depending on how you use the API. The new version should be safe by not writing past this MAX_STR_LENGTH.

ASE-Optics-Europe commented 2 years ago

Hello,

I am providing sample code of where the problem arises, as the project is in LabWindows/CVI, on a machine with maintenance licence expired (thus running RoboDK 5.4.0.21926 with -SKIPMAINT switch).

` typedef enum BinaryLogicValuesEnum {NO, YES} logic_e;

typedef struct azimuth_t { //!< Structure defining an azimuthal orientation in two angles (α, ι). double alpha; //!< Azimuth angle (from X axis, CCW). double iota; //!< Zenith angle } azimuth_t;

typedef struct RobotArmControllerType { WSADATA wsaData; //!< Winsocket data. logic_e socketOpen; //!< Communication sockets to the RoboDK library open. logic_e conn; //!< Robot controller connected [NO | YES] logic_e present; //!< Robot controller presence status [NO | YES] enum eRobotRunMode runMode; //!< RoboDK running mode [SIMULATION | RUN_ROBOT] logic_e running; //!< Controller is running RAPID code char DIval; //!< Digital Input line value } robCtrlr_t;

typedef struct RobotArmToolType { //!< For a robot-held tool, describes the tool load struct Item_t item; //!< RoboDK probe's item instance. char name[64]; //!< RoboDK probe's name. struct Mat_t pose; //!< RoboDK probe's current pose in 3D space in the BCS. struct XYZWPR_t ABBpose; //!< Probe's pose according to ABB reference definitions. azimuth_t oriAzim; //!< Probe's orientation in azimuthal coordinates (alpha, iota) above the reference point. struct Mat_t tcp; //!< Tool TCP's pose in the wrist coordinates' system (centre of the mounting flange on axis 6) struct XYZWPR_t ABBtcp; //!< Tool TCP's pose in the wrist coordinates' system (centre of the mounting flange on axis 6) according to ABB reference definitions struct Mat_t target; //!< RoboDK target pose (to be reached) in the "Home" reference frame. struct XYZWPR_t ABBtarget; //!< Target pose according to ABB reference definitions. azimuth_t targetAzim; //!< Target orientation in azimuthal coordinates (alpha, iota) above the reference point. struct Mat_t filTarget; //!< RoboDK target pose filtered. struct Mat_t transform; //!< Transformation matrix to go from target to filtered target. double jogStep; //< Step (mm or °) to be used for jogging motions double refDist; //< Probe measurement distance (121.22 mm) double offset; //< Probe measurement offset (distance from the fiber exit to the robot arm Y axis) } robTool_t;

typedef struct RobotArmType { robCtrlr_t ctrl; //!< Robot arm controller struct RoboDK_t rdk; //!< RoboDK-controlled robot arm. struct Item_t robot; //!< RoboDK robot item. char name[64]; //!< Controller name robTool_t probe; //!< Description of the probe mounted at the robot arm wrist. struct Item_t ocs; //!< RoboDK Object Coordinates System reference. Used to manipulate the inspected tile. struct Mat_t refFrame; //!< RoboDK reference frame pose for the probe. struct XYZWPR_t ABBrefFrame; //!< RoboDK reference frame expressed in the ABB-style 6 coordinates. logic_e moving; //!< "Arm is moving" status } robotArm_t;

typedef enum RobotArmJog_Enum { //!< Enumeration type defining the different jogging movements ARM_JOG_PSI = -6, ARM_JOG_THETA, ARM_JOG_PHI, ARM_JOG_Z, ARM_JOG_Y, ARM_JOG_X, ARM_JOGX = 1, ARM_JOGY, ARM_JOGZ, ARM_JOGPHI, ARM_JOGTHETA, ARM_JOGPSI } armJog_e;

char svcBuffer[BUFSIZ]; //!< Char buffer used to prepare strings

#define SIGN(A)             (((A) < 0) ? -1 : (((A) > 0) ? 1 : 0))

int armJog(robotArm_t arm, armJog_e motion) { arm->probe.target = Mat_makeCopy(&arm->probe.pose); armGetABBpose(&arm->probe.target, &arm->probe.ABBtarget); arm->probe.ABBtarget.arr6[abs(motion) - 1] += (SIGN(motion) arm->probe.jogStep); Mat_SetPose_KUKA(&arm->probe.target, arm->probe.ABBtarget);

return (armMove(arm));

}

void armGetABBpose(const struct Mat_t pose, struct XYZWPR_t ABBpose) { sprintf(svcBuffer, "Pose in the current reference frame:"); // LOG_UPDATE(svcBuffer, 1); Mat_ToString(pose, svcBuffer, true); sscanf(svcBuffer, "Mat(XYZRPW_2_Mat(%lf,%lf,%lf,%lf,%lf,%lf))", &ABBpose->arr6[ARM_X], &ABBpose->arr6[ARM_Y], &ABBpose->arr6[ARM_Z], &ABBpose->arr6[ARM_PSI], &ABBpose->arr6[ARM_THETA], &ABBpose->arr6[ARM_PHI]); sprintf(svcBuffer, "%.2lf, %.2lf, %.2lf, %.2lf, %.2lf, %.2lf", ABBpose->arr6[ARM_X], ABBpose->arr6[ARM_Y], ABBpose->arr6[ARM_Z], ABBpose->arr6[ARM_PSI], ABBpose->arr6[ARM_THETA], ABBpose->arr6[ARM_PHI]); // LOG_UPDATE(svcBuffer, 5); }

int armMove(robotArm_t *arm) { errReset();

while (Item_Busy(&arm->robot)) {    // Do NOT operate while robot is busy

// LOG_UPDATE("Robot arm BUSY!", 1); }; // LOG_UPDATE("Moving probe to pose:", 1); sprintf(svcBuffer, "%.2lf, %.2lf, %.2lf, %.2lf, %.2lf, %.2lf", arm->probe.ABBtarget.arr6[ARM_X], arm->probe.ABBtarget.arr6[ARM_Y], arm->probe.ABBtarget.arr6[ARM_Z], arm->probe.ABBtarget.arr6[ARM_PSI], arm->probe.ABBtarget.arr6[ARM_THETA], arm->probe.ABBtarget.arr6[ARM_PHI]); // LOG_UPDATE(svcBuffer, 5); Mat_SetPose_KUKA(&arm->probe.target, arm->probe.ABBtarget); Item_MoveL_mat(&arm->robot, &arm->probe.target, true); Mat_Inv_out(&arm->probe.transform, &arm->probe.target); Mat_Multiply_cumul(&arm->probe.transform, &arm->probe.target); armGetPose(arm); return (errNum()); } `

When this piece of code executes, the robot arm structure is already correctly acknowledged and configured, its pose retrieved (armGetABBpose), when the armJog function calls armMove. There, Item_Busy verifies that robot is free to receive commands, then pose is set (Mat_SetPose_KUKA) and finally Item_MoveL_mat is invoked. Both with or without the robot arm connected (i.e., even in simulation mode), the robot (or its simulated counterpart in RoboDK) performs the motion, executes the whole _RoboDK_moveX function but halts at 2225, waiting for Item_WaitMove. In Item_WaitMove, _RoboDK_check_status is invoked at line 400. In _RoboDK_check_status, status (line 2156) is determined to be -1, leading to the execution of lines from 2185 to 2189, where _RoboDK_recv_Line is invoked. There, socketRead is repeatedly invoked, waiting for a '\n' that never comes. Now, if I had this problem with the robot connected, I would first check for proper communication over the Ethernet cable, but this is happening exactly in the same way in simulation mode, with NO robot connected: I see the simulation of the arm move on the screen and then my GUI freezes.

vdm-dev commented 11 months ago

Hi @ASE-Optics-Europe, The problem actually has nothing to do with MAX_STR_LENGTH. Most likely, the implementation of motion commands is not quite adequate. I made some changes in the project and at the same time created a new file Issue N100.c in a separate API branch (https://github.com/RoboDK/RoboDK-API/tree/hotfix/RDK-2651/wrong-status-code). It would be great if you filled in the structure fields in the correct way. Because I only zeroed it and set jogStep:

    robotArm_t arm;
    memset(&arm, 0, sizeof(arm));

    arm.probe.jogStep = 100.0;

Also, it would be great to see a RoboDK test project for your task.

Best regards, Dmitry.

ASE-Optics-Europe commented 11 months ago

Dear Dmitry,

thank you for reaching out after all this time! I actually found out the main culprit of the behaviour, and I am afraid it is a race condition triggered in the compiler I am using (the clang 3.3 coming with LabWindows/CVI 2019). I reconstructed that if I set a small delay (say 0.1 s) at the beginning of a communication function to the controller, all problems vanish. I found this out by mere chance, introducing breakpoints in the RoboDK.c library while running the software in debug mode. I had a breakpoint at the very beginning of a communication function immediately relaying the pointer to a structure to another function: with breakpoint it did work, without breakpoint it didn't. I introduced a small delay (Sleep(50);) as the first instruction and (almost) all problems went away. I checked the value of said pointers and verified that the pointer passed to the second function was not pointing to the same location as the pointer received by the first function, while it was correctly updated with the delay. What I am led to believe is that the C library has been roughly translated from C++, where you don't need to pass pointers to objects explicitly when you call functions: you call the method of an object. That is: you first call the object, and then select its function you want to be executed. In C they had to write functions and pass structures to them. If the structure has not yet been correctly addressed (first line in the called function), you might have a "race" condition and be calling anything else. I believe that makes sense. Please, I can't remember now what the function was (a few months have passed, I forgot to document precisely my findings, and I have been involved in quite a few projects since then), but I wish that this commentis helpful.

Best regards

vdm-dev commented 9 months ago

Thank you for your comment. Since the problem does not appear at the moment, we can close this issue.