MechaSteve commented 7 years ago

I have started looking at porting grbl to an ARM M4 platform and I have found a few clear areas to improve overall portability. I think a good way to port to various platforms would be to use the same .h for the hardware dependent modules and limit all changes to the .c implementation. (Aside from the cpu_map.h file)

DISCLAIMER: these are my very initial findings and will almost certainly be revised, modified, appended and/or discarded. I am in no way an expert on any of this. (or anything really :) )

GPIO Init: The current code has lots of very similar GPIO initialization scattered throughout. There should be a set of general purpose GPIO initialization functions. These should use macros defined in the form FUNCTION_PORT, FUNCTION_BIT . The init functions are not time critical, and can take a little bit of time for simpler logic. There should be GPIOinitInput( ulPort, ulPin), GPIOinitOutput( ulPort, ulPin), and lightweight GPIOset, GPIOclear, and GPIOtest macros for writing and reading individual pins.
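A minimal sketch of the GPIO helpers described above, run against a simulated register block so it can be tried on a host. The `gpio_port_t` layout, the `gpio_sim` array and the pin assignments are assumptions for illustration; a real port would map them onto the MCU's GPIO peripheral, and the pointer-based port argument is only one possible reading of `ulPort`.

```c
#include <stdint.h>

/* Hypothetical register block; a real port maps this onto the MCU's GPIO peripheral. */
typedef struct {
    volatile uint32_t dir;   /* 1 = output, 0 = input */
    volatile uint32_t data;  /* pin levels */
} gpio_port_t;

static gpio_port_t gpio_sim[2];   /* stand-in for two hardware ports */

/* Pin mapping in the FUNCTION_PORT / FUNCTION_BIT form (assumed assignments). */
#define STEP_X_PORT   (&gpio_sim[0])
#define STEP_X_BIT    3u
#define LIMIT_X_PORT  (&gpio_sim[1])
#define LIMIT_X_BIT   5u

/* Init functions are not time critical, so plain functions are fine. */
void GPIOinitInput(gpio_port_t *port, uint32_t pin)  { port->dir &= ~(1u << pin); }
void GPIOinitOutput(gpio_port_t *port, uint32_t pin) { port->dir |=  (1u << pin); }

/* Lightweight macros for writing and reading individual pins. */
#define GPIOset(port, pin)    ((port)->data |=  (1u << (pin)))
#define GPIOclear(port, pin)  ((port)->data &= ~(1u << (pin)))
#define GPIOtest(port, pin)   (((port)->data >> (pin)) & 1u)
```

With this shape, module code would only ever name `STEP_X_PORT` and `STEP_X_BIT`, and the mapping itself stays in cpu_map.h.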

Operator Panel / Control Pins: These should really get their own operator.h/operator.c include module. The .h should define a bit packed control word and a getControlWord(void) function.

Interrupts: Reorganizing all ISRs into one include could also simplify portability. The .h should define which interrupt events should call which functions in other modules. i.e. The pin change interrupt for any of the limit switches should call a function OnLimitEvent(uint8 limitState) in the limit module, and then clear all pending limit interrupts. All pin change interrupts can funnel down to the same ISR which can then select a specific handler based on pin mappings.
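A rough sketch of the funnel idea, assuming a single pin-change ISR that receives the pending flags and the current pin levels. The masks, `OnControlEvent` and the `PinChangeISR` signature are hypothetical; only `OnLimitEvent` comes from the description above.

```c
#include <stdint.h>

#define LIMIT_MASK   0x07u   /* hypothetical: pins 0..2 are the X/Y/Z limit switches */
#define CONTROL_MASK 0x38u   /* hypothetical: pins 3..5 are operator-panel inputs */

/* The handlers just record their argument so the effect is observable. */
static uint8_t last_limit_state;
static uint8_t last_control_state;

void OnLimitEvent(uint8_t limitState)     { last_limit_state = limitState; }
void OnControlEvent(uint8_t controlState) { last_control_state = controlState; }

/* Common pin-change ISR body.
 * pending: pins that raised the interrupt; levels: current pin levels.
 * Returns the pending bits that were handled and should be cleared in hardware. */
uint8_t PinChangeISR(uint8_t pending, uint8_t levels)
{
    uint8_t handled = 0;

    if (pending & LIMIT_MASK) {
        OnLimitEvent((uint8_t)(levels & LIMIT_MASK));
        handled |= LIMIT_MASK;                 /* clear all pending limit interrupts */
    }
    if (pending & CONTROL_MASK) {
        OnControlEvent((uint8_t)((levels & CONTROL_MASK) >> 3));
        handled |= (uint8_t)(pending & CONTROL_MASK);
    }
    return handled;
}
```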

Serial: Similar to the operator panel, it would be good to separate the platform dependent peripheral functions from the agnostic buffer functions. This would also simplify future modifications using USB, networking, or other communication interfaces.

Stepper.c As expected, this is the largest concentration of HW dependent code. The better the interface is to this module, the easier it will be to experiment with different algorithms and other motor types.

Atomic Set/Reset: Refactor these into a HW specific nuts and bolts area. Many Arm processors have bit-banded memory that allows true atomic set/clear of individual bits.
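For reference, a small sketch of how a bit-band alias address is computed on Cortex-M3/M4 parts that implement the feature, using the standard SRAM region addresses. The function name is made up for illustration; on real hardware, writing 0 or 1 to the returned word address sets or clears that single bit.

```c
#include <stdint.h>

#define SRAM_BITBAND_BASE   0x20000000u  /* start of the SRAM bit-band region */
#define SRAM_BITBAND_ALIAS  0x22000000u  /* start of its alias region */

/* Each bit in the bit-band region maps to one 32-bit word in the alias region:
 * alias = alias_base + byte_offset * 32 + bit_number * 4 */
uint32_t bitband_alias_addr(uint32_t byte_addr, uint32_t bit)
{
    return SRAM_BITBAND_ALIAS
         + ((byte_addr - SRAM_BITBAND_BASE) * 32u)
         + (bit * 4u);
}
```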

_delay_ms / _delay_us : These should be eliminated and replaced with event, polling, and interrupt driven logic.

chamnit commented 7 years ago

@MechaSteve : Thanks. All good stuff and have already come to the same conclusions. There has been a lot of offline activity the past month. When we're ready, there should be some interesting surprises coming soon. Anyhow, as a long term thing, I'm busy re-writing the code base to eliminate most of these issues to make sure this is all sustainable longterm.

MechaSteve commented 7 years ago

Very excited to see what you have in store.

An impressive amount of the code base does work unchanged. Obviously there are some potential optimizations in terms of 32bit vs 8bit datatypes.

I made a CNC breakout board for the Stellaris launch pad a while back, and I am mostly just trying to get the minimum functionality working on it.

chamnit commented 7 years ago

@MechaSteve : Well, the cat is out of the bag, so to speak. Here's a video of Grbl v1.1 ported to an LPC processor aka SmoothieBoard. As expected, the performance is bordering on ridiculous. Nearly an order of magnitude better than Smoothieware on their own board. The devs (Todd and Brett Fleming, aka. LaserWeb and JSCut) were able to port and test it within two weeks. The code is available here.

langwadt commented 7 years ago

On AVR, float and double are the same size; you may want to address that on ARM, where they aren't, i.e. use sqrtf() instead of sqrt() and add stuff like -Wdouble-promotion, -Wfloat-conversion and possibly -fsingle-precision-constant to the compiler options.
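A small illustration of the promotion issue, assuming a typical ARM toolchain where double is wider than float. The function names are made up; `sizeof` is used only to expose the type the expression is evaluated in.

```c
#include <stddef.h>

/* 0.5 is a double constant: x is promoted and the multiply is done in double.
 * This is the kind of expression -Wdouble-promotion is meant to flag. */
size_t size_with_double_constant(float x) { return sizeof(x * 0.5); }

/* 0.5f keeps the whole expression in float. */
size_t size_with_float_constant(float x)  { return sizeof(x * 0.5f); }
```

The same reasoning applies to `sqrt()` versus `sqrtf()`: passing a float to `sqrt()` promotes the argument and computes in double, which on a part with only a single-precision FPU falls back to software routines.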

usbcnc commented 7 years ago

@langwadt could not agree more.

MechaSteve commented 7 years ago

One issue I just found that should get fixed in the master branch:

mc_parking_motion: Declaration and implementation in motion_control.h/c should be enclosed in an #ifdef PARKING_ENABLE directive.

I guess the AVR compiler ignores this code because nothing calls it, but it should be optioned out. It threw a small error as I was working on my port.

terjeio commented 7 years ago

Here is my take on a HAL for GRBL 0.9 - for TI TM4C123GH6PM (Tiva C), I have separated out all HW dependent code in a separate project (driver) and I am using function pointers for binding the driver code to GRBL - I do not like a zillion #defines. GRBL HAL for TM4C123.zip Not 100% complete and not tested on a real machine - but initial testing looked promising. IIRC interrupt latency went down to 3uS when running the CPU at 80 MHz.
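A minimal sketch of binding driver code through function pointers, not the actual contents of the attached zip. The `hal_t` fields, `driver_init` and the simulated driver functions are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical function table shared between the core and the driver. */
typedef struct {
    void    (*stepper_pulse_start)(uint8_t dir_bits, uint8_t step_bits);
    uint8_t (*limits_get_state)(void);
} hal_t;

hal_t hal;

/* ---- driver side (simulated hardware) ---- */
static uint8_t sim_dir, sim_step;

static void sim_pulse_start(uint8_t dir_bits, uint8_t step_bits)
{
    sim_dir  = dir_bits;
    sim_step = step_bits;
}

static uint8_t sim_limits_get_state(void) { return 0x05u; }  /* X and Z tripped */

void driver_init(void)
{
    hal.stepper_pulse_start = sim_pulse_start;
    hal.limits_get_state    = sim_limits_get_state;
}

/* ---- core side: only ever calls through the table ---- */
uint8_t core_step(uint8_t dir_bits, uint8_t step_bits)
{
    hal.stepper_pulse_start(dir_bits, step_bits);
    return hal.limits_get_state();
}
```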

MechaSteve commented 7 years ago

I have my initial draft at https://github.com/MechaSteve/grbl_tiva

I ended up creating inputs.h and outputs.h to handle configuration, reading, and writing. I also tried to keep all pin inversion in these functions. I also replaced all pin based bitmasks with a single generic axis bitmask.

On my todo list are:
- Set up interrupt priorities to allow Step reset to interrupt Step interrupt, and step to interrupt serial.
- Rewrite serial functions to utilize full depth of FIFO.
- Create frequency modulation for Spindle to support PowerFlex525 pulse train input.
- Test limit switches and other I/O.
- Test with full mechanics.

chamnit commented 7 years ago

@MechaSteve : Awesome! Thanks for doing this. There are quite a few ports now, including some in the past. I plan on spending some time reviewing everyone's porting methods so I can get an idea of what's required to make a "Gnea" core. So, each port would just need to plug into the core. This would free up some of my responsibilities and delegate the porting work to individual port maintainers.

As for me, the SAMD21 port is operational with serial, a stepper timer, and I/O pins. Next is testing out native USB serial to see how much CPU cost there is and EEPROM emulation. Getting the native USB integrated in has been weird because it crashes/disconnects whenever I enable the stepper timer callbacks. Anyone have any ideas why? Spent most of yesterday trying to figure this one out.

MechaSteve commented 7 years ago

I know there are some response time requirements for USB with some control commands.

Check the documentation for the library you are using. The ones I have used before (Microchip PIC18) included warnings about minimizing high priority interrupts.

MechaSteve commented 7 years ago

I think my other major observation on porting grbl is that interrupts need to be abstracted a little.

My use of functions for GPIO write and read remove a lot of hardware specific code from the Step ISR. Aside from the IO pins and resetting the timers, very little else required any modification in the ISR. I think the following functions would allow for a platform agnostic OnStepStart() callback function.

StepDirSet( long lDirectionMask)
StepDirClear()
StepSet( long lStepMask)
StepClear()
StepTimerStart( unsigned long ulTimerPeriod)
StepResetTimerStart( unsigned long ulPulseWidth)
for delay: StepResetTimerStart( unsigned long ulPulseWidth, unsigned long ulDelay)

Similarly there should be an OnStepDelay(), and OnStepReset() callback.
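A rough sketch of how a platform-agnostic `OnStepStart()` could sit on top of the functions listed above. The simulated driver bodies, the `step_event_t` struct and the callback parameters are assumptions; on a real port the driver functions would touch GPIO and timer registers.

```c
/* ---- driver side: simulated pins and timers ---- */
static long          sim_dir_pins, sim_step_pins;
static unsigned long sim_step_period, sim_pulse_width;

void StepDirSet(long lDirectionMask)                 { sim_dir_pins = lDirectionMask; }
void StepSet(long lStepMask)                         { sim_step_pins |= lStepMask; }
void StepClear(void)                                 { sim_step_pins = 0; }
void StepTimerStart(unsigned long ulTimerPeriod)     { sim_step_period = ulTimerPeriod; }
void StepResetTimerStart(unsigned long ulPulseWidth) { sim_pulse_width = ulPulseWidth; }

/* ---- platform-agnostic side ---- */
typedef struct {             /* hypothetical per-step data */
    long          dir_mask;
    long          step_mask;
    unsigned long period;
} step_event_t;

/* Called from the step timer interrupt: no register access in here. */
void OnStepStart(const step_event_t *ev, unsigned long pulse_width)
{
    StepDirSet(ev->dir_mask);
    StepSet(ev->step_mask);
    StepResetTimerStart(pulse_width);  /* schedules OnStepReset() */
    StepTimerStart(ev->period);        /* schedules the next OnStepStart() */
}

/* Called from the step reset timer interrupt. */
void OnStepReset(void) { StepClear(); }
```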

One area that was also tricky to port was how the printString, serial_write, ring buffer, and serial FIFO all interact with the serialTX ISR. Part of the issue is that the TX interrupt is edge triggered, and therefore not immediately called when the interrupt is enabled (assuming the FIFO is empty). This requires manually calling the OnSerialTX() callback to start the TX cycle. The other issue is that it is immediately called after writing to the FIFO. ( if the TX was idle, once the byte is written, it is immediately removed from the FIFO to the shift register, creating a TX_Empty interrupt) The ring buffer is updated after moving data to the FIFO, therefore the first byte gets loaded to the TX FIFO twice. I fixed this by moving the tail before writing data to the FIFO, and checking if the buffer is empty at the beginning of the callback. This does mean that the callback is executed one more time at the end of the buffer to stop the TX cycle. I think I may still have an edge case where the ISR can miss the data. It may require a bSerialTXRunning flag.
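A host-side sketch of the fix described above: the tail is advanced before the byte goes to the FIFO, the callback checks for an empty buffer first, and a `bSerialTXRunning` flag decides whether the cycle needs a manual kick. Buffer size, `tx_enqueue` and the simulated `uart_fifo_write` are assumptions; on real hardware the flag check in `tx_enqueue` would also need protecting against the ISR.

```c
#include <stdint.h>
#include <stdbool.h>

#define TX_BUFFER_SIZE 16u   /* hypothetical size */

static uint8_t          tx_buf[TX_BUFFER_SIZE];
static volatile uint8_t tx_head, tx_tail;
static volatile bool    bSerialTXRunning;

/* Stand-in for the UART: records every byte written to the FIFO. */
static uint8_t uart_sent[64];
static uint8_t uart_sent_count;

static void uart_fifo_write(uint8_t c)
{
    if (uart_sent_count < sizeof uart_sent) {
        uart_sent[uart_sent_count++] = c;
    }
}

/* TX callback: run by the TX-empty interrupt, or called manually to start an idle transmitter. */
void OnSerialTX(void)
{
    if (tx_tail == tx_head) {        /* buffer empty: stop the TX cycle */
        bSerialTXRunning = false;
        return;
    }
    uint8_t c = tx_buf[tx_tail];
    tx_tail = (uint8_t)((tx_tail + 1u) % TX_BUFFER_SIZE);  /* move the tail first ... */
    uart_fifo_write(c);              /* ... so an immediate TX-empty interrupt cannot resend c */
}

/* Queue one byte; returns false if the ring buffer is full. */
bool tx_enqueue(uint8_t c)
{
    uint8_t next = (uint8_t)((tx_head + 1u) % TX_BUFFER_SIZE);
    if (next == tx_tail) {
        return false;
    }
    tx_buf[tx_head] = c;
    tx_head = next;
    if (!bSerialTXRunning) {         /* edge-triggered interrupt: kick the first byte by hand */
        bSerialTXRunning = true;
        OnSerialTX();
    }
    return true;
}
```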

tbfleming commented 7 years ago

It's easy to introduce a race condition into the FIFOs, especially if both an interrupt and non-interrupt pop from it. You may have to switch from mutable to atomics and pay extreme attention to acquire-release semantics.

langwadt commented 7 years ago

@MechaSteve I haven't used a tiva but I've used quite a few timers in my time and the way you use the timer in the stepper interrupt looks wrong to me. If it is anything like most timers, when you disable the timer you have basically lost track of time, i.e. your period will vary with how long it took to enter the interrupt and disable the timer and how long it took before you reenable the timer.

MechaSteve commented 7 years ago

@tbfleming Yes. Another solution I considered is to create an additional else branch in the UART interrupt handler that would also call the TX callback. Then I can use the IntPendSet function to force a software interrupt to the UART handler. This way the callback cannot be interrupted/preempted. I think that will avoid potential issues.

@langwadt correct. The code is a little messy from my efforts to debug why the step outputs were not turning on. (short version: I didn't properly enable the GPIO port. >_< ) For Timer1A, this does not really matter, it is just the delay and pulse width. Timer0A should be operated slightly differently. The timers on the TIVA have the option to either immediately load the reload value upon a write to the register (default), or to wait until the next timeout event to load the value. The correct method would be to start the timer by loading the first timer period, with it configured to immediately reload, and then enabling the timer. Then, change the configuration to only reload on timeout for each successive period change. I need to make sure this would work out correctly with step counting.

gerritv commented 7 years ago

@MechaSteve Perhaps take a look at DMA with Serial port TX? Would save dealing with the complexity of interrupts, TX start etc. https://sites.google.com/site/luiselectronicprojects/tutorials/tiva-tutorials/tiva-dma/understanding-the-tiva-dma has some good reading on the topic. Back in the Z80 SCC days we preferred DMA over interrupt driven.

MechaSteve commented 7 years ago

@gerritv AFAIK, all TX strings from grbl end with a line feed. This could be used with a fixed double buffer to transfer serial messages to the UART via DMA.
- Buffer data in location A until you get a line feed.
- If DMA is idle, start transferring from A, start buffering to location B.
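A host-side sketch of the double-buffer scheme, with the DMA engine replaced by a stub that only records what it was asked to send. Buffer size, `tx_put`, `dma_start` and `OnDmaComplete` are hypothetical names; the sketch also flushes a completed line from the DMA-complete handler, which is an addition beyond the two steps listed above.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define LINE_BUF_SIZE 64u            /* hypothetical size */

static uint8_t line_buf[2][LINE_BUF_SIZE];   /* location A and location B */
static size_t  fill_len;             /* bytes in the buffer currently being filled */
static uint8_t fill_idx;             /* 0 = A, 1 = B */
static bool    dma_busy;

/* Stub DMA: remembers the last transfer request. */
static const uint8_t *dma_src;
static size_t         dma_len;

static void dma_start(const uint8_t *src, size_t len)
{
    dma_src  = src;
    dma_len  = len;
    dma_busy = true;
}

/* Hand a complete line to the DMA if it is idle, then switch fill buffers. */
static void try_flush(void)
{
    if (!dma_busy && fill_len > 0 && line_buf[fill_idx][fill_len - 1] == '\n') {
        dma_start(line_buf[fill_idx], fill_len);
        fill_idx ^= 1u;
        fill_len  = 0;
    }
}

/* Append one byte; returns false if the fill buffer is full. */
bool tx_put(uint8_t c)
{
    if (fill_len >= LINE_BUF_SIZE) {
        return false;
    }
    line_buf[fill_idx][fill_len++] = c;
    try_flush();
    return true;
}

/* Would be called from the DMA-complete interrupt. */
void OnDmaComplete(void)
{
    dma_busy = false;
    try_flush();                     /* send a line that finished while the DMA was busy */
}
```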

The ring buffer is contained in serial.h, so even large changes to the transfer method (USB buffers) should only require modifying this area. This version, so far, only uses a small fraction of the available RAM; so, there should be no issue creating a large enough buffer area.

gerritv commented 7 years ago

I would point the DMA to the message, no need to copy the stuff around. Might need a queue of message ptrs/lengths for the DMA completion handler to work on. I found it fascinating that the Tiva (and perhaps others) have gather/scatter DMA. We used this technique extensively in the Univac CS/P processor for serial communications and printers in the 1970's.

MechaSteve commented 7 years ago

That could work well for having a large table of error messages in flash rom, but feedback kinda needs to be dynamically assembled in ram. I'm also more of the opinion that error messages should be codes that refer to the documentation. That is what most industrial hardware does, that way the GUI can take care of any localisations (and fixing typos).

chamnit commented 7 years ago

@MechaSteve : Turns out it was the SparkFun SAMD21 dev board causing the USB crashes. Not exactly sure why. The Arduino Zero works perfectly.

terjeio commented 7 years ago

My TI TM4C123GH6PM HAL-port is now updated to 1.1f and partly tested outside of my CO2 laser engraver - motors are running and I/O is working. GRBL co-resides with my own code for "pixel perfect" image engraving (no GCode involved) and even shares basic driver routines with that, done by routing calls via the HAL function table. My homegrown driver card has two 24-bit hardware counters to keep track of X and Y position, they are used by a separate processor to provide PPI-mode for cutting - no GRBL driver code involved . PPI-mode is another (better?) alternative for controlling the laser power delivered to the cut. Next - real life testing. BTW: a little more info about my laser can be found over at buildlog.net

chamnit commented 7 years ago

@terjeio : Thanks for sharing! BTW, I'm having trouble tracking down your code. Could you post it somewhere? I'd like to see how you did your HAL, as I'm formalizing this right now and would like to gather more ideas and concept before releasing it.

MechaSteve commented 7 years ago

@terjeio : I'm curious, because your use of discrete timers is similar to a solution I have contemplated. Did you explore connecting the Step pulse output to the counter input of a full 32 or half wide timer/counter module?

Also, for PPI mode did you consider creating a virtual 3rd axis, where each motion moves an equal distance in the z axis and xy plane?

A pulse per inch mode could also provide a very good way for me to create feed per tooth with the VFD I am using.

terjeio commented 7 years ago

@chamnit : Here it is - the driver part will be seen incomplete as it relies on code from my engraver, I will not post that now as it will only confuse. I have started to convert my standalone library as well, can post that later if of interest. My basic idea is that the core GRBL code should not contain any hardware dependent code, that part is abstracted out and accessed via a function table (hal.h) - main.c is replaced by grbllib.c as the entry point (I am using GRBL as a library). Using a function table makes it easy for me to switch between my own engraving routines and GRBL without duplicating code. I have started to experiment a bit with bit-banding, I am using that for controlling the motors now - some changes have to be made to GRBL to use that for the internal flags though. GRBL Driver.zip GRBL Library.zip

Edit: bad spelling...

terjeio commented 7 years ago

@MechaSteve : I think it is not possible to reliably count step pulses via timers, you have to ensure count direction follows the direction pulses. Well, at least not when the pulse/dir signals originate outside the MPU, my card is designed to work with Mach3 as well. I created my (latchable) counters in VHDL (CPLDs) and I am using a MSP430G2553 processor to handle the PPI generation, the processor constantly polls the timers and calculates the traveled distance (hypotenuse - "as the crow flies") and uses that for generating the laser pulses as long as the laser is switched on. I am not sure what you mean by a virtual 3rd axis involving Z - in my implementation Z (laser on/off) is either 1 or 0 and it gets chopped up by the PPI processor as long as it is 1 before it is fed to the laser tube. I would not think this approach is precise enough for milling/lathe work.

chamnit commented 7 years ago

@terjeio : Great thanks! Looks like you are using function pointers to manage the HAL. I'm doing a little different approach with macros and inline functions instead. I'm assuming that it'll be more efficient this way. How much more efficient, I'm not sure. Either way, the approach is very similar.

terjeio commented 7 years ago

@chamnit : I have not checked the cost of inlining versus function calls, the cost for indirecting function calls via pointers versus calling them directly is very little - only a few cycles IIRC. Maybe the biggest advantage of using my approach is maintainability - the driver code is completely separate from the main GRBL code? And no macro coding to struggle with? It may also be easier to document and understand the "contract" between GRBL and the driver when function calls are used. But then I have not seen your approach - I am no expert in coding with macros...

chamnit commented 7 years ago

@terjeio : Yeah you're probably right about the function pointers, especially on a fast MCU. The overhead is minimal. Either way, the macros can support your function pointer methodology easily by defining the macro as your function pointer calls. Even though it's a bit harder to read, it's a lot more flexible in what you can do overall.

As far as a driver contract, I think I will need the help of other users porting Grbl to get an idea what that contract should be. With the SAMD21 port, I'm learning very quickly that it's difficult to have a universal HAL, but that doesn't mean we can't have something close with a little bit of work and refinement.

tbfleming commented 7 years ago

Inlining isn't about saving a couple cycles; it's about enabling optimization. GCC's and clang's major optimization strategies depend on it. Function pointers and C++ virtual functions get in the way.

langwadt commented 7 years ago

@terjeio depends on what kind of timer, using the quadrature input on the timers in an STM32 I've hacked grbl to "run" (just on the table not on a machine) with two DC motors with encoders as servos. Unfortunately it seems you can only get two 32-bit quadrature counters so getting three axes would require some tricks. Steppers and encoders with the occasional sanity check of whether step and encoder counts agree would be a nice feature.

terjeio commented 7 years ago

@chamnit : I think the HAL struct I made can be a good starting point for the driver contract - it has potential signatures for all hardware related calls collected in one place. The difficult part may be how to handle different (timer) clock frequencies - IIRC there are some calculations related to that that are close to overflowing now. Some minor code changes to enable the use of bit-banding would be nice.

@tbfleming : Isn't optimization mostly about saving cycles? The problem about available memory for code is not an issue any more I would believe - my port uses 55K of the 256K available. Has anybody defined a design goal for performance and tested for that? Or more is always better? ;-)

@langwadt : I did not think of using the quadrature inputs as an option, clever idea - noted for later.

tbfleming commented 7 years ago

@terjeio my point is that inlining goes way beyond saving just a couple cycles; it opens the door to much greater savings. Inlining is so important to modern compilers that they don't trust users to get it right. They ignore inline hints and use complicated algorithms to decide what to inline instead. Function pointers get in the way of that. -Os vs. -O2 vs. -O3: the best choice varies with app; you have to do performance testing to find the right one. -Os is sometimes the fastest.

tbfleming commented 7 years ago

bit-band violates the C and C++ memory model, which assumes a write to one object (memory location) doesn't affect any other. Compilers rely on this assumption when reordering instructions. To get around this, you'd have to declare both regions volatile, which hamstrings the optimizer. You could easily kill performance using bit-band.

langwadt commented 7 years ago

@tbfleming you expect bitband to be worse than having to wrap every access in protection?

terjeio commented 7 years ago

@tbfleming : so bit banding is a no-no then? Even for setting/clearing bits atomically that otherwise requires you to disable interrupts, do a RMW sequence and reenable interrupts? And why flagging both regions volatile if one is only accessing data via the bit band region - is that really needed? And I am not sure what you mean by "kill performance" - you mean rendering the program unusable?

langwadt commented 7 years ago

Without some examples I'm reluctant to believe bit-banding will be slower on something that needs to be volatile and requires protected RMW if not using bitband.

tbfleming commented 7 years ago

> Even for setting/clearing bits atomically that otherwise requires you to disable interrupts, do a RMW sequence and reenable interrupts?

Not where it would matter. The only functions which do this are systemset and systemclear. These aren't called at a high frequency.

> And why flagging both regions volatile if one is only accessing data via the bit band region - is that really needed?

If you're only accessing data via 1 of the two regions, then you don't need bit band.

> And I am not sure what you mean by "kill performance" - you mean rendering the program unusable?

2 cases:

- used in low frequency code paths: it's a non-portable optimization that buys nothing.
- used in high frequency code paths: it limits the optimizer where it matters most. This can be much worse than saving a single bit op, assuming you're not talking about the disable ISR, RMW, enable ISR case, which is low frequency.

terjeio commented 7 years ago

@tbfleming : I have a lot to learn, you mean having a single piece of code that the optimizer does not like makes it give up on optimizing the whole program? I did not know that. I also believed that disabling interrupts was not very smart when high frequency code paths are dependent on interrupts being enabled. Hmm... maybe I should do some testing to check this out.

tbfleming commented 7 years ago

@terjeio

> you mean having a single piece of code that the optimizer does not like makes it give up on optimizing the whole program

No, it's a bit more complicated. Take this example:

a = 0;
b = a;
if(b)
    foo();

If a and b aren't volatile and aren't atomic, then the compiler can safely remove the if check and the call to foo. If either one is volatile or atomic, then the compiler must preserve these. Optimizations build on each other, making it hard to predict when an unnecessary volatile or function pointer causes major performance headaches.

> I also believed that disabling interrupts was not very smart when high frequency code paths are dependent on interrupts being enabled.

It depends. E.g. disabling an interrupt for 10 cycles on GRBL-LPC means adding a 0.1 us jitter to the step train. That's OK since GRBL-LPC's max step rate has a 5.0 us period. GRBL-328 has a lower step rate and can tolerate much longer periods where interrupts are disabled. Since the 328 doesn't support nesting interrupts, interrupts are disabled during the entire ISR(SERIAL_RX) function.

MechaSteve commented 7 years ago

Many of these issues were my reasons for simply using a standard .h file to create a HAL/API/driver.

Each header can contain function prototypes and clear and complete documentation of what the function should do, but not how the function will do it. In this way, all hardware can use the same header files.

For example inputs.h can contain functions like "void LimitSwitchInit(void)" and "tAxisMask ReadLimitSwitches(void)". Similarly a function prototype like "void SetSpindleSpeed( float RPM)" need no knowledge of how this is accomplished; it may be a PWM output, a pulse frequency output, or even a communications connection to another controller.

Each hardware implementation would have its own set of .c files, and any additional .h files used by them. The .h files provide an API interface contract by clearly describing what each function should do.
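A compact sketch of that split, with the shared header and one simulated implementation shown in a single listing. `tAxisMask`, `LimitSwitchInit` and `ReadLimitSwitches` come from the example above; the file names in the comments, the bit assignment and the simulated pin variables are assumptions for illustration.

```c
#include <stdint.h>

/* ---- inputs.h: shared by every port; says what, not how ---- */

typedef uint8_t tAxisMask;            /* assumed packing: bit0 = X, bit1 = Y, bit2 = Z */

/* Configure the limit switch pins as inputs. */
void LimitSwitchInit(void);

/* Return one bit per axis, 1 = switch tripped, regardless of pin polarity. */
tAxisMask ReadLimitSwitches(void);

/* ---- inputs_sim.c: one hardware-specific implementation (simulated) ---- */

static uint8_t sim_pins;              /* raw pin levels as the hardware would report them */
static uint8_t sim_invert_mask;       /* pins that are active low */

void LimitSwitchInit(void)
{
    sim_pins        = 0x07u;          /* all switches idle (pulled high) */
    sim_invert_mask = 0x07u;          /* all three inputs are active low */
}

tAxisMask ReadLimitSwitches(void)
{
    /* Pin inversion is handled here, so callers never see raw levels. */
    return (tAxisMask)((sim_pins ^ sim_invert_mask) & 0x07u);
}
```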

An even more modern approach (in C++) would be to create an abstract base class with virtual prototypes for all hardware functions. I am delighted at the thought of having all the organized elegance of a TivaCNC class derived from a GRBL base class. However, I would worry this would cause a severe performance decrease, as well as less predictable performance.

terjeio commented 7 years ago

Here are the results from a simple test I made, toggling a GPIO pin @ 80MHz CPU clock - time per iteration:

inline: bit-banding 0.18 us, driverlib 0.40 us

fn call: bit-banding 0.35 us, driverlib 0.58 us

indirected fn call: bit-banding 0.37 us, driverlib 0.63 us

No optimizations were enabled in these tests. I redid the indirect calls with -O4 --opt_for_speed=5 and the time per iteration was roughly halved, however I consider this irrelevant as the program is so simple that all variables might end up in registers - this will not be possible (IMO) in a more complex program like GRBL.

Writing the GPIO register directly could improve speed, at least the TIVA MCU has 256 "shadow" registers allowing a simple store to set/clear multiple bits independently, no need to perform a RMW sequence. This could help when signals share the same port.

From the manual: "To aid in the efficiency of software, the GPIO ports allow for the modification of individual bits in the GPIO Data (GPIODATA) register (see page 662) by using bits [9:2] of the address bus as a mask. In this manner, software drivers can modify individual GPIO pins in a single instruction without affecting the state of the other pins. This method is more efficient than the conventional method of performing a read-modify-write operation to set or clear an individual GPIO pin. To implement this feature, the GPIODATA register covers 256 locations in the memory map."
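A host-side simulation of the address-masked access described in the quoted passage, where bits [9:2] of the offset select which pins a store may change. The function and macro names are made up; on the real part this would be a plain store to the GPIODATA base address plus the offset.

```c
#include <stdint.h>

static uint8_t gpio_data;            /* simulated pin state of one 8-bit port */

/* Byte offset into the 256-location GPIODATA window that exposes only the pins in mask. */
#define GPIO_MASK_OFFSET(mask)  ((uint32_t)(mask) << 2)

/* Simulated store: only pins whose address bit is set are modified. */
void gpiodata_write(uint32_t offset, uint8_t value)
{
    uint8_t mask = (uint8_t)((offset >> 2) & 0xFFu);
    gpio_data = (uint8_t)((gpio_data & ~mask) | (value & mask));
}

/* Simulated load: pins whose address bit is clear read back as zero. */
uint8_t gpiodata_read(uint32_t offset)
{
    uint8_t mask = (uint8_t)((offset >> 2) & 0xFFu);
    return (uint8_t)(gpio_data & mask);
}
```

Writing 0xFF through the offset for mask 0x03 sets only pins 0 and 1 and leaves the rest of the port untouched, with no read-modify-write in software.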

An interesting option for hardware may be the Raspberry Pi - bare metal coding and utilizing the GPU core for time critical code paths could result in a very fast system. See https://github.com/hoglet67/PiTubeDirect/wiki for what can be achieved.

tbfleming commented 7 years ago

> toggling a GPIO pin

grbl's step interrupt doesn't work that way.

terjeio commented 7 years ago

Today I have successfully made some cuts in the laser proper with my ARM port, however there are some issues I would like to discuss. I hope you do not mind me raising them here.

Stepper ISR

For every step there is a call to spindle_set_speed, is this really necessary? I am using an I2C DAC for setting the output level - and doing that for every step generates a lot of overhead. I could check for change in the spindle_set_speed function but I think it would be better to do it in the ISR - like this?

  #ifdef VARIABLE_SPINDLE
    // Set real-time spindle output as segment is loaded, just prior to the first step.
    if(st.exec_segment->spindle_pwm != st.spindle_pwm)
        st.spindle_pwm = spindle_set_speed(st.exec_segment->spindle_pwm);
  #endif

Stepper pulse generation/reset

What is the reason behind

st.step_bits = (STEP_PORT & ~STEP_MASK) | st.step_outbits; // Store out_bits to prevent overwriting.

?

Is it for preventing overwrite when a new set of step pulses is output before the current pulses are reset? If so isn't that something that never should happen and could result in extra pulses being output? In my port I delegate bit inversion to the driver and I will be outputting the bits individually, I am also considering using separate timers for each axis for the reset. IMO this would negate the need for st.step_bits, but I could be wrong.

@MechaSteve : I agree, and I have started to write a template driver that documents the interface I am using. By changing grbl into a library and adding a few entry points I believe it will be possible to add code in the driver for handling displays and buttons and what have you - without contaminating grbl itself with a lot of bloat. As an experiment I have done just that, hardware jogging, power display and I2C relay control for (laser) coolant etc. Not yet working perfectly but close.

MechaSteve commented 7 years ago

@terjeio : 1: I think any arbitration that needs to be done based on current != setpoint belongs in the hardware abstraction. This is especially true if the motivation comes from the overhead associated with the hardware itself. In an industrial controls application, I would normally explicitly set the setpoint every time. This tends to make programs more robust and avoids a few edge cases. For example:
- machine entering or leaving a hold condition
- machine recovering from an e-stop
- overrides to spindle speed or any other situation where the setpoint may be written elsewhere

Ideally it should work either way. This way makes it a little easier to avoid errors.

2: st.step_outbits is prepared by the previous call to the ISR. These bits are overwritten starting at line 386. A better question is: why not just always store these bits in st.step_bits at the end of the ISR?

I agree very strongly that bit inversion should be contained in the I/O driver. Instead of individual writes to bits I used a bit packed format that is always:
- bit0 : X Axis
- bit1 : Y Axis
- bit2 : Z Axis
(theoretically extend for A, B, C, or E)

This format is used for step, direction, limit check, etc. This leaves anything dealing with port selection and pin numbers in the drivers as well.

chamnit commented 7 years ago

@terjeio :

1. PWM does not get set every step. It's every step segment, which runs at 100Hz. True, it's not the most efficient, but it guarantees robustness as @MechaSteve said. It also reduces the need for an extra variable to be tracked and the code that you need to check for edge cases.
2. st.step_bits is required because the step is pulsed after the step pulse delay duration. During which, the main st.step_outbits is being setup for the next pulse.

terjeio commented 7 years ago

@MechaSteve :

I just changed it, before I saw your mail - honestly ;-)

  #ifdef VARIABLE_SPINDLE
    // Set real-time spindle output as segment is loaded, just prior to the first step.
    st.spindle_pwm = st.exec_segment->spindle_pwm;
  #endif

and added st.spindle_pwm to the pulse start function signature

hal.stepper_pulse_start(st.dir_outbits, st.step_outbits, st.spindle_pwm);

which I believe is what you meant (English is not my first language). Anyway, the "problem" is now delegated to the driver - and the call overhead to spindle_set_speed disappeared as well (it is a driver call - internally inlined).

I do exactly the same for packing bits and let the driver handle the hardware aspects. Still, I cannot wrap my head around what st.step_bits is good for - I am getting old and my way of thinking is not what it used to be...

terjeio commented 7 years ago

@chamnit :

1. I just delegated it to the driver, see above.
2. I am a bit slow - st.step_bits gets or'ed with the current bits in the output register EXCEPT the pulse bits, sorry about that. IMO then any functionality related to st.step_bits is belonging to the driver domain and can (should?) be removed from stepper.c. This from the perspective that grbl should not impose any constraints on how hardware is configured.

chamnit commented 7 years ago

@terjeio : The entire step pulse generator is being abstracted out. Grbl will only generate the step segment data.

terjeio commented 7 years ago

@chamnit @MechaSteve : - many thanks for the replies. I now believe I have a good handle on what is going on in the stepper driver ISRs - and I believe my implementation for 1.1f is sound, more real life testing will show. Integrating my hardware in a flawless way is the next challenge, I have four I2C devices on my card I need to talk to - and it seems I have to move I2C comms over to interrupt driven...

django013 commented 7 years ago

Hello,

I just happened to discover both ARM discussions, which I read with great excitement. Sadly, both discussions have gone quiet.

So how are you doing behind the curtain? I'm very interested in the future path of grbl :)

robomechs commented 5 years ago

We can port it to Cortex-M4 from the stm32f103 grbl port: https://github.com/usbcnc/grbl

gnea / grbl

Portability Experiments (ARM M4) #92
