Bit-banged serial, baud rates and error

MCUdude commented 4 years ago

I'm working on wrapping @nerdralph's brilliant bit-banged serial code around the familiar Serial.print(). Or rather, lazy as I am, I'm borrowing @sleemanj's code (Print.h/cpp, HalfDuplexSerial.h/cpp/S), since it's much better than whatever I could have done in terms of memory usage and efficiency.

Since this implementation is a bit different from the "official Arduino" one, I'll provide some additional information in the README explaining what works and what does not. Another thing I'd like to show is a table of supported baud rates for different F_CPUs. However, I don't know how to calculate the error. What I want to do is to create a table with all baud rates that are guaranteed to work (with ~3% error or less). I know that the internal oscillator of the T13 is usually very off, but let's pretend this isn't an issue. ~~Could any of you guys help me fill out this table?~~

EDIT: Table filled out by calculating the error and testing on real hardware.

| Clock & Baud | 460800 | 250000 | 230400 | 115200 | 57600 | 38400 | 19200 | 9600 | 4800 | 2400 | 1200 |
|--------------|--------|--------|--------|--------|-------|-------|-------|------|------|------|------|
| 20 MHz       | X      | X      | X      | X      | X     |       |       |      |      |      |      |
| 16 MHz       | X      | X      | X      | X      | X     | X     |       |      |      |      |      |
| 12 MHz       |        | X      | X      | X      | X     | X     |       |      |      |      |      |
| 9.6 MHz      |        | X      | X      | X      | X     | X     |       |      |      |      |      |
| 8 MHz        |        | X      | X      | X      | X     | X     |       |      |      |      |      |
| 4.8 MHz      |        |        |        | X      | X     | X     |       | X    |      |      |      |
| 1.2 MHz      |        |        |        |        |       | X     |       | X    | X    | X    |      |
| 1 MHz        |        |        |        |        |       | X     |       | X    | X    | X    |      |
| 600 kHz      |        |        |        |        |       |       |       | X    | X    | X    | X    |
| 128 kHz      |        |        |        |        |       |       |       |      |      |      |      |

nerdralph commented 4 years ago

I released a few bit-bang uart versions. Here's the last one: https://github.com/nerdralph/nerdralph/tree/master/avr/libs/bbuart In this version the baud rate is determined at compile time, which I think is fine for the vast majority of use cases. You could probably do something with a macro for Serial.begin() to set the baud rate instead of defining the baud rate before including BBUart.h

At 9.6 MHz, it can do 12,400-192,000bps at <3% timing error. 8N1 can handle a timing error of up to 1/2 bit-time over 10 bits (start + 8 data + stop), or 5%. That would permit up to 320kbps at 9.6 MHz. I think 57.6 and 115.2 are the main use cases, so I made sure it worked well for those speeds. To go down to 9600bps, you'd need to reduce the clock speed to 4.8 MHz, which I think shouldn't be an issue for most people.

sleemanj commented 4 years ago

Here is a PHP script to generate the data. I think this is right, or at least convincing. It's based on the makefile I use for building optiboot, which has what should be, if I ported it correctly, the same calculation.

I've attached the output in case you can't use PHP.

bc.txt

Note that I would recommend reducing your maximum recommended error to 2%. In my experience, 3% is a bit iffy, particularly with the CH340.

#!/usr/bin/php

<?php
  $CSVOutput = array("AVR_FREQ,BAUD_RATE,BAUD_ACTUAL,ERROR %\n");
  foreach(array(20,16,12,9.6,8,4.8,1.2,1,0.6,0.128) as $AVR_FREQ)
  {
    $AVR_FREQ *= 1000000;
    foreach(array(460800, 150000, 115200, 57600, 38400, 19200, 9600, 2400, 1200, 300) as $BAUD_RATE)
    {
      if(( 8 * (( ($AVR_FREQ + $BAUD_RATE * 4) / (($BAUD_RATE * 8))) - 1 ) ) == 0) continue;      
      $BAUD_ACTUAL = $AVR_FREQ / ( 8 * (( ($AVR_FREQ + $BAUD_RATE * 4) / (($BAUD_RATE * 8))) - 1 ) );
      $Error = (( 100*( $BAUD_RATE - $BAUD_ACTUAL) ) / $BAUD_RATE);
      if(abs($Error) <= 3)
      {

        echo "Frequency               : $AVR_FREQ\n";
        echo "Desired                 : $BAUD_RATE\n";
        echo "Achieved (Theoretically): $BAUD_ACTUAL\n";
        echo "Error %                 : ";
        echo abs(round($Error,2)) . "\n";
        echo "\n\n";

        $CSVOutput[] = "{$AVR_FREQ},{$BAUD_RATE},{$BAUD_ACTUAL},{$Error}\n";

      }
    }
  }
  echo implode("", $CSVOutput);
?>

nerdralph commented 4 years ago

@sleemanj Have you tried my soft uart? Optiboot uses what looks to be a version of the AVR305 soft uart, which has 1 cycle of jitter between 1 and 0 bits, while mine has no jitter. To calculate the true error margin you'd need to include the jitter. Mine has a bit-time resolution of 3 cycles, while Optiboot's is 6 cycles. My 5% error calculation is based on a 30 cycle minimum bit time at +-1.5 cycles. Optiboot is +-3.5 cycles (6 cycle delay resolution + 1 cycle jitter), so the 5% error threshold would be at 70 cycles per bit, or 114.3kbps at 8 MHz. If the average jitter isn't factored into the UART_B_VALUE calculation, Optiboot may have +-4 cycle accuracy, meaning the 5% threshold is 100kbps at 8 MHz.
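The thresholds quoted above come from dividing the timing uncertainty by the allowed error. Here is a minimal sketch of that arithmetic in C, assuming the 5% margin for 8N1 mentioned earlier in the thread. The function names are made up for illustration and are not part of BBUart or Optiboot.

```c
#include <stdint.h>

/* Smallest bit time (in CPU cycles) for which a timing uncertainty of
 * uncertainty_x2 / 2 cycles stays within tol_percent of one bit time.
 * The uncertainty is passed doubled so 1.5 and 3.5 cycles stay integers. */
static uint32_t min_cycles_per_bit(uint32_t uncertainty_x2, uint32_t tol_percent)
{
    /* (u / 2) / (tol / 100) == u * 50 / tol, rounded up */
    return (uncertainty_x2 * 50u + tol_percent - 1u) / tol_percent;
}

/* Highest baud rate that still meets the tolerance at a given clock. */
static uint32_t max_baud(uint32_t f_cpu, uint32_t uncertainty_x2,
                         uint32_t tol_percent)
{
    return f_cpu / min_cycles_per_bit(uncertainty_x2, tol_percent);
}
```

With these definitions, `min_cycles_per_bit(3, 5)` gives the 30 cycle minimum for +-1.5 cycles, `max_baud(8000000, 7, 5)` gives 114285 (the 114.3kbps figure), and `max_baud(8000000, 8, 5)` gives 100000.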

sleemanj commented 4 years ago

Sorry should have been clear, those calculations are not specifically for any software serial, they are hardware serial so ideal conditions.

I do use a lightly modified version of one of your implementations in my ATTinyCore...

https://github.com/sleemanj/ATTinyCore/blob/master/avr/cores/tiny/HalfDuplexSerial.cpp https://github.com/sleemanj/ATTinyCore/blob/master/avr/cores/tiny/HalfDuplexSerial.h https://github.com/sleemanj/ATTinyCore/blob/master/avr/cores/tiny/HalfDuplexSerial.S

nerdralph commented 4 years ago

OK, James, that's the jitter-free version you have in ATTinyCore. You could make a couple of small tweaks. Setting the port to output mode could be moved to the Core startup code. Or if you want to be really fancy, you could move the line "sbi UART_Port-1, UART_Tx" to the end of the TxByte function in section .init8. That means it should run before main, and with the right linker magic with gc-sections, will only get added to .init8 when the TxByte function is used. The other little tweak is to change the cli + ret to reti. I mentioned it here rather than opening an issue on ATTinyCore since Hans will probably want to make TxByte ISR safe like you did by adding the sei/cli. I don't know if you do, but I'd put a warning in the documentation about using interrupts and low baud rates. Transmitting serial data at 19,200bps can add up to 520us of latency to interrupt handling.
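The 520us figure is simply the time one 8N1 frame occupies the CPU. A minimal sketch, assuming interrupts stay disabled for the whole 10-bit frame; `frame_time_us` is a hypothetical helper, not something in the library:

```c
#include <stdint.h>

/* Time to send one 8N1 frame (start + 8 data + stop = 10 bits), in
 * microseconds. Integer division truncates, so 19200 baud yields 520
 * rather than 520.8. */
static uint32_t frame_time_us(uint32_t baud)
{
    return 10u * 1000000u / baud;
}
```

At 115200 baud the same helper returns 86, which is why higher baud rates keep the worst-case interrupt latency down.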

MCUdude commented 4 years ago

Thanks for the script @sleemanj! I ran it in an online PHP sandbox. I have yet to try every single option on my ATtiny13 dev board (with an on-board crystal driver). I indeed want less than 2% error, since I use the CH330N and CH340C in most projects where I need a USB to serial adapter. My favorite at the moment is the CH330N, because of the small SOIC-8 package.

Here's the output from the script when the error is allowed to be 2% or less. But I remember I read somewhere that @nerdralph's code was able to use 460800 baud when running at 16 MHz? This script calculates this error to be 13.02%. Is this correct?

EDIT: Just tested 460800 baud @ 16 MHz. It works like a charm! I believe the calculations aren't 100% correct. 13.02% error would only print garbage.

EDIT2: I used my oscilloscope to measure the bit length (sending a 0x20 character every second), and manually calculated the error. 460800 baud @ 16 MHz results in an error of -1.98%, or a bit length of 2.128us instead of 2.1701us.

|AVR_FREQ|BAUD_RATE|BAUD_ACTUAL    |ERROR %            |
|--------|---------|---------------|-------------------|
|20000000|57600    |58271.285205568|-1.1654257041114   |
|20000000|38400    |38697.194453402|-0.77394388906803  |
|20000000|19200    |19274.012206874|-0.38548024413747  |
|20000000|9600     |9618.4674575184|-0.19236934915036  |
|20000000|2400     |2401.1525532255|-0.048023051064509 |
|20000000|1200     |1200.2880691366|-0.02400576138272  |
|20000000|300      |300.01800108006|-0.0060003600215926|
|16000000|57600    |58441.558441558|-1.461038961039    |
|16000000|38400    |38772.213247173|-0.96930533117931  |
|16000000|19200    |19292.604501608|-0.48231511254018  |
|16000000|9600     |9623.0954290297|-0.24057738572573  |
|16000000|2400     |2401.4408645187|-0.060036021612954 |
|16000000|1200     |1200.3601080324|-0.030009002700808 |
|16000000|300      |300.02250168763|-0.0075005625421909|
|12000000|57600    |58727.569331158|-1.9575856443719   |
|12000000|38400    |38897.893030794|-1.2965964343598   |
|12000000|19200    |19323.671497585|-0.64412238325283  |
|12000000|9600     |9630.8186195827|-0.32102728731941  |
|12000000|2400     |2401.9215372298|-0.080064051241    |
|12000000|1200     |1200.4801920768|-0.040016006402558 |
|12000000|300      |300.0300030003 |-0.010001000100014 |
|9600000 |38400    |39024.390243902|-1.6260162601626   |
|9600000 |19200    |19354.838709677|-0.80645161290323  |
|9600000 |9600     |9638.5542168675|-0.4016064257028   |
|9600000 |2400     |2402.4024024024|-0.1001001001001   |
|9600000 |1200     |1200.6003001501|-0.050025012506258 |
|9600000 |300      |300.03750468809|-0.012501562695339 |
|8000000 |38400    |39151.712887439|-1.9575856443719   |
|8000000 |19200    |19386.106623586|-0.96930533117931  |
|8000000 |9600     |9646.3022508039|-0.48231511254018  |
|8000000 |2400     |2402.8834601522|-0.12014417300761  |
|8000000 |1200     |1200.7204322594|-0.060036021612954 |
|8000000 |300      |300.04500675101|-0.015002250337545 |
|4800000 |19200    |19512.195121951|-1.6260162601626   |
|4800000 |9600     |9677.4193548387|-0.80645161290323  |
|4800000 |2400     |2404.8096192385|-0.2004008016032   |
|4800000 |1200     |1201.2012012012|-0.1001001001001   |
|4800000 |300      |300.07501875469|-0.0250062515629   |
|1200000 |2400     |2419.3548387097|-0.80645161290323  |
|1200000 |1200     |1204.8192771084|-0.4016064257028   |
|1200000 |300      |300.3003003003 |-0.1001001001001   |
|1000000 |2400     |2423.2633279483|-0.96930533117931  |
|1000000 |1200     |1205.7877813505|-0.48231511254018  |
|1000000 |300      |300.36043251902|-0.12014417300761  |
|600000  |2400     |2439.0243902439|-1.6260162601626   |
|600000  |1200     |1209.6774193548|-0.80645161290323  |
|600000  |300      |300.60120240481|-0.2004008016032   |
|128000  |300      |302.83911671924|-0.94637223974763  |

sleemanj commented 4 years ago

Sounds like the calculation isn't entirely on the money then.

avr-libc includes a similar calculation (probably where I derived this from long ago) in setbaud.h. You should have a copy on your system somewhere, but I've pasted it below. It's just some calculations done in macros; see BAUD_TOL for the tolerance.

/* Copyright (c) 2007  Cliff Lawson
  Copyright (c) 2007  Carlos Lamas
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.

  * Neither the name of the copyright holders nor the names of
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  POSSIBILITY OF SUCH DAMAGE. */

/* $Id$ */

/**
  \file
*/

/**
  \defgroup util_setbaud <util/setbaud.h>: Helper macros for baud rate calculations
  \code
  #define F_CPU 11059200
  #define BAUD 38400
  #include <util/setbaud.h>
  \endcode

  This header file requires that on entry values are already defined
  for F_CPU and BAUD.  In addition, the macro BAUD_TOL will define
  the baud rate tolerance (in percent) that is acceptable during
  the calculations.  The value of BAUD_TOL will default to 2 %.

  This header file defines macros suitable to setup the UART baud
  rate prescaler registers of an AVR.  All calculations are done
  using the C preprocessor.  Including this header file causes no
  other side effects so it is possible to include this file more than
  once (supposedly, with different values for the BAUD parameter),
  possibly even within the same function.

  Assuming that the requested BAUD is valid for the given F_CPU then
  the macro UBRR_VALUE is set to the required prescaler value.  Two
  additional macros are provided for the low and high bytes of the
  prescaler, respectively: UBRRL_VALUE is set to the lower byte of
  the UBRR_VALUE and UBRRH_VALUE is set to the upper byte.  An
  additional macro USE_2X will be defined.  Its value is set to 1 if
  the desired BAUD rate within the given tolerance could only be
  achieved by setting the U2X bit in the UART configuration.  It will
  be defined to 0 if U2X is not needed.

  Example usage:

  \code
  #include <avr/io.h>

  #define F_CPU 4000000

  static void
  uart_9600(void)
  {
  #define BAUD 9600
  #include <util/setbaud.h>
  UBRRH = UBRRH_VALUE;
  UBRRL = UBRRL_VALUE;
  #if USE_2X
  UCSRA |= (1 << U2X);
  #else
  UCSRA &= ~(1 << U2X);
  #endif
  }

  static void
  uart_38400(void)
  {
  #undef BAUD  // avoid compiler warning
  #define BAUD 38400
  #include <util/setbaud.h>
  UBRRH = UBRRH_VALUE;
  UBRRL = UBRRL_VALUE;
  #if USE_2X
  UCSRA |= (1 << U2X);
  #else
  UCSRA &= ~(1 << U2X);
  #endif
  }
  \endcode

  In this example, two functions are defined to setup the UART
  to run at 9600 Bd, and 38400 Bd, respectively.  Using a CPU
  clock of 4 MHz, 9600 Bd can be achieved with an acceptable
  tolerance without setting U2X (prescaler 25), while 38400 Bd
  require U2X to be set (prescaler 12).
*/

#ifndef F_CPU
#  error "setbaud.h requires F_CPU to be defined"
#endif

#ifndef BAUD
#  error "setbaud.h requires BAUD to be defined"
#endif

#if !(F_CPU)
#  error "F_CPU must be a constant value"
#endif

#if !(BAUD)
#  error "BAUD must be a constant value"
#endif

#if defined(__DOXYGEN__)
/**
  \def BAUD_TOL
  \ingroup util_setbaud

  Input and output macro for <util/setbaud.h>

  Define the acceptable baud rate tolerance in percent.  If not set
  on entry, it will be set to its default value of 2.
*/
#define BAUD_TOL 2

/**
  \def UBRR_VALUE
  \ingroup util_setbaud

  Output macro from <util/setbaud.h>

  Contains the calculated baud rate prescaler value for the UBRR
  register.
*/
#define UBRR_VALUE

/**
  \def UBRRL_VALUE
  \ingroup util_setbaud

  Output macro from <util/setbaud.h>

  Contains the lower byte of the calculated prescaler value
  (UBRR_VALUE).
*/
#define UBRRL_VALUE

/**
  \def UBRRH_VALUE
  \ingroup util_setbaud

  Output macro from <util/setbaud.h>

  Contains the upper byte of the calculated prescaler value
  (UBRR_VALUE).
*/
#define UBRRH_VALUE

/**
  \def USE_2X
  \ingroup util_setbaud

  Output macro from <util/setbaud.h>

  Contains the value 1 if the desired baud rate tolerance could only
  be achieved by setting the U2X bit in the UART configuration.
  Contains 0 otherwise.
*/
#define USE_2X 0

#else /* !__DOXYGEN__ */

#undef USE_2X

/* Baud rate tolerance is 2 % unless previously defined */
#ifndef BAUD_TOL
#  define BAUD_TOL 2
#endif

#ifdef __ASSEMBLER__
#define UBRR_VALUE (((F_CPU) + 8 * (BAUD)) / (16 * (BAUD)) -1)
#else
#define UBRR_VALUE (((F_CPU) + 8UL * (BAUD)) / (16UL * (BAUD)) -1UL)
#endif

#if 100 * (F_CPU) > \
  (16 * ((UBRR_VALUE) + 1)) * (100 * (BAUD) + (BAUD) * (BAUD_TOL))
#  define USE_2X 1
#elif 100 * (F_CPU) < \
  (16 * ((UBRR_VALUE) + 1)) * (100 * (BAUD) - (BAUD) * (BAUD_TOL))
#  define USE_2X 1
#else
#  define USE_2X 0
#endif

#if USE_2X
/* U2X required, recalculate */
#undef UBRR_VALUE

#ifdef __ASSEMBLER__
#define UBRR_VALUE (((F_CPU) + 4 * (BAUD)) / (8 * (BAUD)) -1)
#else
#define UBRR_VALUE (((F_CPU) + 4UL * (BAUD)) / (8UL * (BAUD)) -1UL)
#endif

#if 100 * (F_CPU) > \
  (8 * ((UBRR_VALUE) + 1)) * (100 * (BAUD) + (BAUD) * (BAUD_TOL))
#  warning "Baud rate achieved is higher than allowed"
#endif

#if 100 * (F_CPU) < \
  (8 * ((UBRR_VALUE) + 1)) * (100 * (BAUD) - (BAUD) * (BAUD_TOL))
#  warning "Baud rate achieved is lower than allowed"
#endif

#endif /* USE_U2X */

#ifdef UBRR_VALUE
  /* Check for overflow */
#  if UBRR_VALUE >= (1 << 12)
#    warning "UBRR value overflow"
#  endif

#  define UBRRL_VALUE (UBRR_VALUE & 0xff)
#  define UBRRH_VALUE (UBRR_VALUE >> 8)
#endif

#endif /* __DOXYGEN__ */
/* end of util/setbaud.h */
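For comparison, here is a small C sketch of what the U2X branch of the header works out, using the same integer rounding as the `UBRR_VALUE` macro. The helper names are hypothetical; the only thing assumed beyond the header is the double-speed USART relation baud = F_CPU / (8 * (UBRR + 1)). Note that this describes the hardware USART only, not a bit-banged uart.

```c
#include <stdint.h>

/* Same rounding as the U2X branch of setbaud.h. */
static uint32_t ubrr_u2x(uint32_t f_cpu, uint32_t baud)
{
    return (f_cpu + 4UL * baud) / (8UL * baud) - 1UL;
}

/* Baud rate a hardware USART in double-speed mode produces for a UBRR value. */
static uint32_t actual_baud_u2x(uint32_t f_cpu, uint32_t ubrr)
{
    return f_cpu / (8UL * (ubrr + 1UL));
}

/* Signed error in hundredths of a percent; positive means faster than
 * requested. */
static int32_t baud_error_centipercent(uint32_t baud, uint32_t actual)
{
    return (int32_t)(((int64_t)actual - (int64_t)baud) * 10000 / (int64_t)baud);
}
```

For 16 MHz and 460800 baud this gives UBRR = 3, an actual rate of 500000 baud, and an error of about 8.5%. The integer division matters: the PHP port keeps the fractional part, which is one reason its figures differ.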

MCUdude commented 4 years ago

After some cursor readouts on my scope, I came up with a crude error formula that's at least in the ballpark.

(( 100 * ( BAUD_RATE - F_CPU / ( 8 * (( (F_CPU + BAUD_RATE * 7.33920732659) / ((BAUD_RATE * 8))) - 1 ) )) ) / BAUD_RATE)

I calculated the long decimal number based on a few readouts. This means this formula is by no means perfect, but I think it's good enough to determine if we're OK or way out. I don't know the code well enough to provide a "correct" one.
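As a rough sketch, the formula can be written as a C function. The constant is the one fitted from the scope readouts, so this is only an estimate; the function name is made up for illustration.

```c
/* Empirical error estimate in percent, following the formula above.
 * The constant 7.33920732659 was fitted from a few scope readouts and is
 * not derived from the uart code itself. */
static double estimated_error_percent(double f_cpu, double baud)
{
    double actual = f_cpu /
        (8.0 * ((f_cpu + baud * 7.33920732659) / (baud * 8.0) - 1.0));
    return 100.0 * (baud - actual) / baud;
}
```

Calling `estimated_error_percent(16000000.0, 460800.0)` yields roughly -1.94, close to the -1.98% measured on the scope.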

MCUdude commented 4 years ago

Here's the calculated result, which looks acceptable. I can confirm that 115200 baud is working fine when using the internal 4.8 MHz oscillator once it has been tuned using @sleemanj's OSCCAL sketch.

@nerdralph is it wise to use the highest possible baud rate whenever possible to use as little CPU time as possible? I was thinking about using these default values:

EDIT: 19200 baud is causing the lto wrapper to crash for some reason.

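| Clock   | Default baud rate                           |
|---------|---------------------------------------------|
| 20 MHz  | 115200                                      |
| 16 MHz  | 115200                                      |
| 12 MHz  | 115200                                      |
| 9.6 MHz | 115200                                      |
| 8 MHz   | 115200                                      |
| 4.8 MHz | 115200                                      |
| 1.2 MHz | ~~19200~~ 9600                              |
| 1 MHz   | ~~19200~~ 9600                              |
| 600 kHz | ~~19200~~ 9600                              |
| 128 kHz | Not supported (inaccurate, slow, no OSCCAL) |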

Calculated error (only <2.3% shown)

| F_CPU    | Baud rate | % Error     |
|----------|-----------|-------------|
| 20000000 | 460800    | -1.54600371 |
| 20000000 | 250000    | -0.83287027 |
| 20000000 | 230400    | -0.76707237 |
| 20000000 | 115200    | -0.3820708  |
| 20000000 | 57600     | -0.19067115 |
| 20000000 | 38400     | -0.12703336 |
| 20000000 | 19200     | -0.06347636 |
| 20000000 | 9600      | -0.03172811 |
| 20000000 | 4800      | -0.01586154 |
| 20000000 | 2400      | -0.00793014 |
| 20000000 | 1200      | -0.00396491 |
| 20000000 | 600       | -0.00198242 |
| 20000000 | 300       | -0.0009912  |
| 16000000 | 460800    | -1.94000276 |
| 16000000 | 250000    | -1.04326009 |
| 16000000 | 230400    | -0.96068274 |
| 16000000 | 115200    | -0.47804512 |
| 16000000 | 57600     | -0.23845261 |
| 16000000 | 38400     | -0.15884215 |
| 16000000 | 19200     | -0.07935805 |
| 16000000 | 9600      | -0.03966329 |
| 16000000 | 4800      | -0.01982771 |
| 16000000 | 2400      | -0.00991287 |
| 16000000 | 1200      | -0.00495619 |
| 16000000 | 600       | -0.00247803 |
| 16000000 | 300       | -0.001239   |
| 12000000 | 250000    | -1.39586763 |
| 12000000 | 230400    | -1.28502533 |
| 12000000 | 115200    | -0.6384108  |
| 12000000 | 57600     | -0.31818972 |
| 12000000 | 38400     | -0.21190173 |
| 12000000 | 19200     | -0.10583873 |
| 12000000 | 9600      | -0.05289137 |
| 12000000 | 4800      | -0.0264387  |
| 12000000 | 2400      | -0.0132176  |
| 12000000 | 1200      | -0.00660836 |
| 12000000 | 600       | -0.00330407 |
| 12000000 | 300       | -0.00165201 |
| 9600000  | 250000    | -1.75094476 |
| 9600000  | 230400    | -1.61145858 |
| 9600000  | 115200    | -0.79928918 |
| 9600000  | 57600     | -0.39805379 |
| 9600000  | 38400     | -0.26501756 |
| 9600000  | 19200     | -0.13233342 |
| 9600000  | 9600      | -0.06612296 |
| 9600000  | 4800      | -0.03305055 |
| 9600000  | 2400      | -0.01652255 |
| 9600000  | 1200      | -0.00826059 |
| 9600000  | 600       | -0.00413012 |
| 9600000  | 300       | -0.00206502 |
| 8000000  | 250000    | -2.10851751 |
| 8000000  | 230400    | -1.94000276 |
| 8000000  | 115200    | -0.96068274 |
| 8000000  | 57600     | -0.47804512 |
| 8000000  | 38400     | -0.31818972 |
| 8000000  | 19200     | -0.15884215 |
| 8000000  | 9600      | -0.07935805 |
| 8000000  | 4800      | -0.03966329 |
| 8000000  | 2400      | -0.01982771 |
| 8000000  | 1200      | -0.00991287 |
| 8000000  | 600       | -0.00495619 |
| 8000000  | 300       | -0.00247803 |
| 4800000  | 115200    | -1.61145858 |
| 4800000  | 57600     | -0.79928918 |
| 4800000  | 38400     | -0.53144353 |
| 4800000  | 19200     | -0.26501756 |
| 4800000  | 9600      | -0.13233342 |
| 4800000  | 4800      | -0.06612296 |
| 4800000  | 2400      | -0.03305055 |
| 4800000  | 1200      | -0.01652255 |
| 4800000  | 600       | -0.00826059 |
| 4800000  | 300       | -0.00413012 |
| 1200000  | 38400     | -2.16021509 |
| 1200000  | 19200     | -1.06856589 |
| 1200000  | 9600      | -0.53144353 |
| 1200000  | 4800      | -0.26501756 |
| 1200000  | 2400      | -0.13233342 |
| 1200000  | 1200      | -0.06612296 |
| 1200000  | 600       | -0.03305055 |
| 1200000  | 300       | -0.01652255 |
| 600000   | 19200     | -2.16021509 |
| 600000   | 9600      | -1.06856589 |
| 600000   | 4800      | -0.53144353 |
| 600000   | 2400      | -0.26501756 |
| 600000   | 1200      | -0.13233342 |
| 600000   | 600       | -0.06612296 |
| 600000   | 300       | -0.03305055 |
| 128000   | 2400      | -1.25452971 |
| 128000   | 1200      | -0.62335477 |
| 128000   | 600       | -0.31070898 |
| 128000   | 300       | -0.15511351 |

nerdralph commented 4 years ago

The calculations for the baud rate error of the hardware USART are irrelevant. As I mentioned before, my uart code is accurate to within +-1.5 cycles. The number of cycles per bit is 7 + 3*TXDELAY, where TXDELAY is calculated by the header file macros based on F_CPU and BAUD_RATE. At 115.2kbps, the ideal time for each bit is 8.681us. With a 4.8 MHz clock, that's 41.67 cycles. The macros will calculate the best TXDELAY of 12, for a delay per bit of 43 cycles. In this instance the uart will be slow by a factor of 43/41.67, or 3.19%. Adding a 1% variation for a reasonably-tuned OSCCAL, the total error is less than the 5% margin required for 8N1. I'd stick with 115.2 for 4.8 MHz and up. For 1.2 MHz and lower, I'd go with a default of 38,400bps.
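The arithmetic above can be sketched in C as follows. Only the 7 + 3*TXDELAY relation comes from the uart code as described here; the round-to-nearest choice of TXDELAY is an assumption about what the header macros do, and the function names are hypothetical.

```c
#include <stdint.h>

/* Cycles per bit for a given TXDELAY, as described above. */
static uint32_t bit_cycles(uint32_t txdelay)
{
    return 7u + 3u * txdelay;
}

/* Assumed rounding: TXDELAY nearest to the ideal (F_CPU / BAUD - 7) / 3.
 * round((F - 7B) / (3B)) == floor((2F - 11B) / (6B)).
 * Valid for the AVR clock range; 2 * f_cpu must fit in 32 bits. */
static uint32_t nearest_txdelay(uint32_t f_cpu, uint32_t baud)
{
    if (2u * f_cpu < 11u * baud)
        return 0; /* bit time too short for any delay loop */
    return (2u * f_cpu - 11u * baud) / (6u * baud);
}

/* Timing error in hundredths of a percent; positive means each bit is
 * longer than ideal (uart runs slow). */
static int32_t timing_error_centipercent(uint32_t f_cpu, uint32_t baud,
                                         uint32_t cycles)
{
    int64_t diff = (int64_t)cycles * baud - (int64_t)f_cpu;
    return (int32_t)(diff * 10000 / (int64_t)f_cpu);
}
```

For 4.8 MHz and 115200 baud this gives TXDELAY = 12, 43 cycles per bit and an error of 320 (3.20%; the 3.19% above comes from rounding the ideal bit time to 41.67 cycles). For 16 MHz and 460800 baud it gives TXDELAY = 9, 34 cycles per bit and about -2.08%, in line with the scope measurement earlier in the thread.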

MCUdude commented 4 years ago

> The calculations for the baud rate error of the hardware USART are irrelevant.

My conclusion too after spending some time testing on actual hardware.

It turned out that some of the lower clock speeds supported a higher baud rate than I had initially thought. Here's an "updated" default baud rates table:

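| Clock   | Default baud rate                           |
|---------|---------------------------------------------|
| 20 MHz  | 115200                                      |
| 16 MHz  | 115200                                      |
| 12 MHz  | 115200                                      |
| 9.6 MHz | 115200                                      |
| 8 MHz   | 115200                                      |
| 4.8 MHz | 115200                                      |
| 1.2 MHz | 38400                                       |
| 1 MHz   | 38400                                       |
| 600 kHz | ~~19200~~ 9600                              |
| 128 kHz | Not supported (inaccurate, slow, no OSCCAL) |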

For some strange reason, I'm not allowed to use 19200 baud for ANY F_CPU. I'm just getting this error:

In file included from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\Arduino.h:119:0,

                 from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.cpp:24:

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h: In function 'void dummy()':

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:48: warning: integer overflow in expression [-Woverflow]

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:58:51: note: in definition of macro 'DIVIDE_ROUNDED'

 #define DIVIDE_ROUNDED(NUMERATOR, DIVISOR) ((((2*(NUMERATOR))/(DIVISOR))+1)/2)

                                                   ^~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:24: note: in expansion of macro 'DIVIDE_ROUNDED'

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

                        ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:66:37: note: in expansion of macro 'RXSTART_CYCLES'

 #define RXSTARTCOUNT DIVIDE_ROUNDED(RXSTART_CYCLES - 13, 3)

                                     ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:108:23: note: in expansion of macro 'RXSTARTCOUNT'

     ::[rxscount] "M" (RXSTARTCOUNT)

                       ^~~~~~~~~~~~

In file included from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\Arduino.h:119:0,

                 from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\main.cpp:12:

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h: In function 'void dummy()':

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:48: warning: integer overflow in expression [-Woverflow]

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:58:51: note: in definition of macro 'DIVIDE_ROUNDED'

 #define DIVIDE_ROUNDED(NUMERATOR, DIVISOR) ((((2*(NUMERATOR))/(DIVISOR))+1)/2)

                                                   ^~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:24: note: in expansion of macro 'DIVIDE_ROUNDED'

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

                        ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:66:37: note: in expansion of macro 'RXSTART_CYCLES'

 #define RXSTARTCOUNT DIVIDE_ROUNDED(RXSTART_CYCLES - 13, 3)

                                     ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:108:23: note: in expansion of macro 'RXSTARTCOUNT'

     ::[rxscount] "M" (RXSTARTCOUNT)

                       ^~~~~~~~~~~~

In file included from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\Arduino.h:119:0,

                 from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\Print.cpp:30:

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h: In function 'void dummy()':

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:48: warning: integer overflow in expression [-Woverflow]

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:58:51: note: in definition of macro 'DIVIDE_ROUNDED'

 #define DIVIDE_ROUNDED(NUMERATOR, DIVISOR) ((((2*(NUMERATOR))/(DIVISOR))+1)/2)

                                                   ^~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:24: note: in expansion of macro 'DIVIDE_ROUNDED'

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

                        ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:66:37: note: in expansion of macro 'RXSTART_CYCLES'

 #define RXSTARTCOUNT DIVIDE_ROUNDED(RXSTART_CYCLES - 13, 3)

                                     ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:108:23: note: in expansion of macro 'RXSTARTCOUNT'

     ::[rxscount] "M" (RXSTARTCOUNT)

                       ^~~~~~~~~~~~

In file included from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\Arduino.h:119:0,

                 from C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\Tone.cpp:25:

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h: In function 'void dummy()':

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:48: warning: integer overflow in expression [-Woverflow]

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:58:51: note: in definition of macro 'DIVIDE_ROUNDED'

 #define DIVIDE_ROUNDED(NUMERATOR, DIVISOR) ((((2*(NUMERATOR))/(DIVISOR))+1)/2)

                                                   ^~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:64:24: note: in expansion of macro 'DIVIDE_ROUNDED'

 #define RXSTART_CYCLES DIVIDE_ROUNDED(3*F_CPU,2*BAUD_RATE)

                        ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:66:37: note: in expansion of macro 'RXSTART_CYCLES'

 #define RXSTARTCOUNT DIVIDE_ROUNDED(RXSTART_CYCLES - 13, 3)

                                     ^~~~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:108:23: note: in expansion of macro 'RXSTARTCOUNT'

     ::[rxscount] "M" (RXSTARTCOUNT)

                       ^~~~~~~~~~~~

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h: In function 'dummy':

C:\Users\h.bull.LAUDM\Documents\Arduino\hardware\MicroCore\avr\cores\microcore\HalfDuplexSerial.h:109:6: error: impossible constraint in 'asm'

     );

      ^

lto-wrapper.exe: fatal error: C:\Users\h.bull.LAUDM\AppData\Local\Arduino15\packages\arduino\tools\avr-gcc\7.3.0-atmel3.6.1-arduino5/bin/avr-gcc returned 1 exit status

compilation terminated.

c:/users/h.bull.laudm/appdata/local/arduino15/packages/arduino/tools/avr-gcc/7.3.0-atmel3.6.1-arduino5/bin/../lib/gcc/avr/7.3.0/../../../../avr/bin/ld.exe: error: lto-wrapper failed

collect2.exe: error: ld returned 1 exit status

exit status 1
Error compiling for board ATtiny13.

nerdralph commented 4 years ago

Those macros were a pain to write, and as you probably can guess, even more annoying to debug. I'll take a quick look but won't spend much time on it. In the 5+ years since I wrote them, avr-gcc with LTO has significantly changed/improved, and so I'll take a shot at rewriting the baud rate calculation code so that it doesn't use macros any more.

MCUdude commented 4 years ago

> Those macros were a pain to write, and as you probably can guess, even more annoying to debug. I'll take a quick look but won't spend much time on it.

I can only imagine! It's not a big deal at all. I'll simply specify that 19200 baud isn't supported. However, if you do sort it out, I'll of course add the fix to this repo.

The current code is available in the HalfDuplexSerial branch if you need something to test on. The baud rate can be overridden by modifying the core_settings.h file.

I'll later also add an OSCCAL sketch so the internal oscillator can be used. The T13 I have on my desk is pretty terrible. The default OSCCAL value was 92 (decimal), but for the 4.8 MHz oscillator I had to adjust it all the way to 105 to make it as accurate as the OSCCAL resolution allows. For the 9.6 MHz oscillator, a value of 98 resulted in the most accurate clock.

nerdralph commented 4 years ago

I figured out the error with 19200bps. gcc implicitly determines the type of BAUD_RATE based on the value. 9600 and 19200 are int16, while 38400 and above are int32. The RXSTART macro multiplies the baud rate by 2 (to start reading half-way through the bit). 2 * 19200 as a signed 16-bit integer results in an integer overflow, as indicated in the warning. Setting BAUD_RATE to 19200L tells GCC the type is long (32-bit), which will not overflow.
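Here is a small host-side sketch of the same effect. Since `int` is 32 bits on most PCs, it checks the values against the `int16_t` range to mimic avr-gcc's 16-bit `int`. `DOUBLE_AS_LONG` is a hypothetical macro showing one way the doubling could be forced into `long` arithmetic without the user writing the `L` suffix; it is not what HalfDuplexSerial.h currently does.

```c
#include <stdint.h>

/* Emulates avr-gcc's 16-bit int on a host: returns 1 if v is representable. */
static int fits_in_int16(long v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

#define BAUD_PLAIN 19200   /* type int on AVR, because it fits in 16 bits */
#define BAUD_LONG  19200L  /* type long (at least 32 bits) everywhere */

/* Hypothetical variant of the doubling step: the 2L operand promotes the
 * whole multiplication to long, whatever type the constant has. */
#define DOUBLE_AS_LONG(x) (2L * (x))
```

19200 itself fits in 16 bits, but 2 * 19200 = 38400 exceeds 32767, which is where the overflow warning comes from. `DOUBLE_AS_LONG(BAUD_PLAIN)` evaluates to 38400 with type `long`.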

nerdralph commented 4 years ago

BTW, I've done a LOT of experimenting with OSCCAL and accuracy/precision. The parts I've bought from Newark (t85, t84a, t88) in the past 7 years have all been within 1% at 3.3 & 5v. The t13's I've bought off Aliexpress are the only parts that have been way out. They are still within the +-10% datasheet spec, so my guess is these are b-grade parts specifically for the Chinese market. For low-end chips like the t13, packaging and testing can cost more than the die. Skipping OSCCAL calibration on the parts would save time/money. You can actually duplicate the factory OSCCAL modification by using the undocumented signature page write programming command. I found an old thread in avrfreaks where someone figured out the HVSP command, and I was able to verify that it works with a HVSP programmer that I built. http://nerdralph.blogspot.com/2018/05/piggyfuse-hvsp-avr-fuse-programmer.html It's also supposedly possible to program it with the standard ICSP. I only know how to erase the signature page through ICSP, and didn't figure out how to program it. Maybe one day...

I've also been thinking about ways of automatically detecting the target speed at flash programming time. Last summer I got speed detection working with DebugWire. https://github.com/nerdralph/nerdralph/blob/master/autobaud.py I think it may also be possible to detect the target speed during ICSP by timing the delay between SCK transitions and MISO.

With a modified programmer, the target speed could be reported back to the host using avrdude extended parameters. The IDE could then use an option like Tools > Get Board Info to read the signature, fuses, calibration bytes, and RC oscillator speed.

MCUdude commented 4 years ago

I figured out the error with 19200bps. gcc implicitly determines the type of BAUD_RATE based on the value. 9600 and 19200 are int16, while 38400 and above are int32.

Great! Adding L at the end did the trick. BTW, is there a way of making the preprocessor add the L afterward, so that the user doesn't have to do it? Again, not a big deal, but still a nice touch.

BTW, I've done a LOT of experimenting with OSCCAL and accuracy/precision. The parts I've bought from Newark (t85, t84a, t88) in the past 7 years have all been within 1% at 3.3 & 5v. The t13's I've bought off Aliexpress are the only parts that have been way out. They are still within the +-10% datasheet spec, so my guess is these are b-grade parts specifically for the Chinese market. For low-end chips like the t13, packaging and testing can cost more than the die. Skipping OSCCAL calibration on the parts would save time/money. You can actually duplicate the factory OSCCAL modification by using the undocumented signature page write programming command. I found an old thread in avrfreaks where someone figured out the HVSP command, and I was able to verify that it works with a HVSP programmer that I built. http://nerdralph.blogspot.com/2018/05/piggyfuse-hvsp-avr-fuse-programmer.html It's also supposedly possible to program it with the standard ICSP. I only know how to erase the signature page through ICSP, and didn't figure out how to program it. Maybe one day...

Interesting read! But what do we achieve by modifying the factory OSCCAL value? Will the "custom" OSCCAL values be loaded on boot without having to manually do it using EEPROM storage?

I've also been thinking about ways of automatically detecting the target speed at flash programming time. Last summer I got speed detection working with DebugWire. https://github.com/nerdralph/nerdralph/blob/master/autobaud.py I think it may also be possible to detect the target speed during ICSP by timing the delay between SCK transitions and MISO.

With a modified programmer, the target speed could be reported back to the host using avrdude extended parameters. The IDE could then use an option like Tools > Get Board Info to read the signature, fuses, calibration bytes, and RC oscillator speed.

So, in theory, it could be possible for the programmer to tweak the OSCCAL value to get the clock as accurate as possible without using the method described in AVR053?

sleemanj commented 4 years ago

the total error is less than the 5% margin required for 8N

Correct me if I'm wrong, but that's total 5% error, if you have one end 3% slow and the other end 3% fast you're going to have a bad day.
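The point about the budget being shared by both ends can be checked with a little arithmetic. The sketch below uses an idealized model that I am assuming here, not one taken from the BBUart code: the receiver samples each bit at its centre, so the last bit of a 10-bit frame (start, 8 data, stop) is sampled 9.5 bit times after the start edge and may drift by at most half a bit. That gives a limit of 0.5 / 9.5, roughly 5.3%, on the relative mismatch between the two clocks. Real receivers lose some of that margin to edge-detection and sampling jitter.

```c
#include <assert.h>

/* Largest relative clock mismatch (as a fraction) for which the last
   sample still lands inside its bit. frame_bits counts the start bit,
   the data bits and the first stop bit. Idealized model. */
static double max_mismatch(int frame_bits)
{
    assert(frame_bits > 1);
    return 0.5 / ((double)frame_bits - 0.5);
}

/* Relative mismatch seen by a receiver whose clock is off by rx_error
   when the sender's clock is off by tx_error (both as fractions). */
static double combined_mismatch(double tx_error, double rx_error)
{
    assert(rx_error > -1.0);
    return (1.0 + tx_error) / (1.0 + rx_error) - 1.0;
}
```

Under this model, one end 3% fast against an accurate partner stays inside the limit, while 3% fast against 3% slow gives about 6.2% and falls outside it.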

nerdralph commented 4 years ago

I figured out the error with 19200bps. gcc implicitly determines the type of BAUD_RATE based on the value. 9600 and 19200 are int16, while 38400 and above are int32.

Great! Adding L at the end did the trick. BTW, is there a way of making the preprocessor add the L afterward, so that the user doesn't have to do it? Again, not a big deal, but still a nice touch.

OK, after a bunch of fighting with the preprocessor, I came up with a simple solution. Multiplying by a long (2L) guarantees the BAUD_RATE always gets promoted to a long. I pushed the change to BBUart.h in my github repo.
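A minimal sketch of why the 2L trick works, using C11 `_Generic` to inspect the type of the product. The macro names `BAUD_TIMES_2` and `IS_LONG` are hypothetical and only for illustration; the actual macro in BBUart.h may look different.

```c
#include <assert.h>

#define BAUD_RATE 19200                 /* plain literal, no L suffix */

/* Hypothetical name: because one operand is a long, the usual
   arithmetic conversions make the whole product a long. */
#define BAUD_TIMES_2 (2L * BAUD_RATE)

/* Evaluates to 1 if the expression has type long, 0 otherwise. */
#define IS_LONG(x) _Generic((x), long: 1, default: 0)

/* Checked at compile time: the product is a long and holds 38400. */
static_assert(IS_LONG(BAUD_TIMES_2), "product should be a long");
static_assert(BAUD_TIMES_2 == 38400L, "2 * 19200 should be 38400");
```

By contrast, `2 * BAUD_RATE` has type `int`, which is only 16 bits wide on the AVR.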

http://nerdralph.blogspot.com/2018/05/piggyfuse-hvsp-avr-fuse-programmer.html It's also supposedly possible to program it with the standard ICSP. I only know how to erase the signature page through ICSP, and didn't figure out how to program it. Maybe one day...

Interesting read! But what do we achieve by modifying the factory OSCCAL value? Will the "custom" OSCCAL values be loaded on boot without having to manually do it using EEPROM storage?

Yes, that's the point. Part of the power-up sequence is loading the factory-set OSCCAL value from the signature page in flash into the OSCCAL register. By re-writing the signature page with a new OSCCAL value, the new value is what gets loaded at reset.

With a modified programmer, the target speed could be reported back to the host using avrdude extended parameters. The IDE could then use an option like Tools > Get Board Info to read the signature, fuses, calibration bytes, and RC oscillator speed.

So, in theory, it could be possible for the programmer to tweak the OSCCAL value to get the clock as accurate as possible without using the method described in AVR053?

Yes. I've already tested the concept using Makefiles and DebugWire with a PL2303HX for a programmer. One makefile rule detects the target type and clock rate, creating a make.defs with the target device and F_CPU. Some ideas I've thought about that could integrate with the Arduino IDE are a custom USBasp firmware, or a custom programmer that is STK500 compatible which the host communicates with using a standard USB-TTL adapter. Realistically I'll probably stick to something that works from the command line and Makefiles, since trying to support serious AVR development with the Arduino IDE feels like trying to tune up a Yugo for the racetrack.

I really like DebugWire, particularly on small parts like the t13. You get a half-duplex uart for free - no flash used on the target. I also like having PB0-PB4 completely free. When developing with a USBasp connected, I'm limited in what I can use PB0-PB2 for, else I risk interfering with ICSP.

nerdralph commented 4 years ago

the total error is less than the 5% margin required for 8N

Correct me if I'm wrong, but that's total 5% error, if you have one end 3% slow and the other end 3% fast you're going to have a bad day.

If your USB-TTL adapters are only 3% accurate, you've been having a LOT of bad days. Even the cheapest oscillators are spec'd to 50ppm over their full temperature range. Supposing you get some real junk parts that leave out the caps on the oscillator, you're still within 100ppm. I bought a bunch of PL2303HX adapters for <50c, and they were all within 10-20ppm at 18-23C.

Since the TTL adapters are a reliable timing source, that's why you'll see them used for RC oscillator tuning. The target can time the frame from the host and use that to determine the internal oscillator frequency.

MCUdude commented 4 years ago

OK, after a bunch of fighting with the preprocessor, I came up with a simple solution. Multiplying by a long (2L) guarantees the BAUD_RATE always gets promoted to a long. I pushed the change to BBUart.h in my github repo.

Brilliant, that did the trick!

Since the TTL adapters are a reliable timing source, that's why you'll see them used for RC oscillator tuning. The target can time the frame from the host and use that to determine the internal oscillator frequency.

Does this mean in theory it should be possible for the T13 to output a known character and the host computer could calculate its main clock frequency and perhaps how many OSCCAL steps are needed in positive or negative direction? If so it should be possible to create a small shell script for easy calibration.

nerdralph commented 4 years ago

Since the TTL adapters are a reliable timing source, that's why you'll see them used for RC oscillator tuning. The target can time the frame from the host and use that to determine the internal oscillator frequency.

Does this mean in theory it should be possible for the T13 to output a known character and the host computer could calculate its main clock frequency and perhaps how many OSCCAL steps are needed in positive or negative direction? If so it should be possible to create a small shell script for easy calibration.

In theory yes, but in practice it's easier to go the other way with the host sending a known frame and the target timing and adjusting to it. The reason is a TTL adapter can't give you precise timing information about an incoming frame - you get a character that is the closest match. To get reliable timing, you have to send dozens or hundreds of pulses from the MCU, and have the host calculate the average time between the frames. Here's an example of this technique used to get very precise timing measurements when done over a period of hours: http://n1.taur.dk/nft/nft.pdf The timing program mentioned in that paper worked well enough for me to figure out that the oscillator in my 1054Z was slow by ~3ppm. http://nerdralph.blogspot.com/2015/07/rigol-ds1054z-frequency-counter-accuracy.html

Give me a day or two, and I'll write a basic calibration sketch to tune OSCCAL. It'll wait for a null (CTRL-@) from the host at 38,400bps, adjust OSCCAL based on the timing difference from ideal, and print the old and new OSCCAL values. After a few nulls from the host it will get to ±1 of the optimal OSCCAL value.
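The arithmetic behind such a tuner can be sketched on the host. This is not nerdralph's sketch, just an illustration under stated assumptions: a null byte in an 8-data-bit frame holds the line low for 9 bit times (start bit plus eight zero bits), the target counts CPU cycles during that low pulse, and a higher OSCCAL value means a faster oscillator. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CAL_BAUD      38400UL
#define NULL_LOW_BITS 9UL      /* start bit + 8 zero data bits */

/* Cycles the low pulse of a null should last at the nominal clock. */
static uint32_t ideal_low_cycles(uint32_t f_cpu_nominal)
{
    assert(f_cpu_nominal > 0);
    return (uint32_t)(((uint64_t)f_cpu_nominal * NULL_LOW_BITS
                       + CAL_BAUD / 2) / CAL_BAUD);
}

/* Clock estimate from the cycles actually counted during the pulse. */
static uint32_t estimate_f_cpu(uint32_t low_cycles)
{
    assert(low_cycles > 0);
    return (uint32_t)(((uint64_t)low_cycles * CAL_BAUD
                       + NULL_LOW_BITS / 2) / NULL_LOW_BITS);
}

/* One tuning step: +1, 0 or -1 to add to OSCCAL. Too few cycles
   counted means the clock runs slow, so OSCCAL is raised (assuming
   a higher value gives a higher frequency). */
static int osccal_step(uint32_t low_cycles, uint32_t f_cpu_nominal)
{
    uint32_t ideal = ideal_low_cycles(f_cpu_nominal);
    if (low_cycles < ideal)
        return 1;
    if (low_cycles > ideal)
        return -1;
    return 0;
}
```

At 9.6 MHz the ideal count is 2250 cycles, so a measurement of 2150 suggests a clock that is a few percent slow and a step of +1. Repeating the step after each null from the host would walk OSCCAL toward the closest value.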

MCUdude commented 4 years ago

Give me a day or two, and I'll write a basic calibration sketch to tune OSCCAL. It'll wait for a null (CTRL-@) from the host at 38,400bps, adjust OSCCAL based on the timing difference from ideal, and print the old and new OSCCAL values. After a few nulls from the host it will get to ±1 of the optimal OSCCAL value.

Awesome, looking forward to testing it! But how do I send a null character from the Arduino serial monitor? It would be great if the user didn't have to install a third-party serial monitor in order to calibrate.

MCUdude commented 4 years ago

@sleemanj I'd like to provide a few example sketches for the users to test out the serial functionality. Printing is all fine, but reading incoming data seems to be a bit more difficult than on a regular Arduino, and needs some good examples.

Since we don't have an RX buffer and the Serial.read() function is non-blocking, how can we even receive data without using Serial.read_char_blocking()?

I'd like to provide a simple echo program that just prints back whatever the user typed in the serial monitor. Can this be done without blocking receive code?

sleemanj commented 4 years ago

For tuning, while it is not gonna fit in a t13 here is my cut down version of "tinytuner" which I burn into optiboot images...

https://github.com/sleemanj/optiboot/blob/master/optiboot/bootloaders/optiboot/veryTinyTuner.c

you just repeatedly send "x". As I say, totally unsuited to T13, but maybe gives some ideas or copy-paste.

sleemanj commented 4 years ago

I'd like to provide a simple echo program that just prints back whatever the user typed in the serial monitor. Can this be done without blocking receive code?

https://github.com/sleemanj/ATTinyCore/blob/master/avr/libraries/ATTinyCore/examples/Tiny13/04.Communication/ReadASCIIString/ReadASCIIString.ino

?

MCUdude commented 4 years ago

https://github.com/sleemanj/ATTinyCore/blob/master/avr/libraries/ATTinyCore/examples/Tiny13/04.Communication/ReadASCIIString/ReadASCIIString.ino ?

Perfect, exactly what I was looking for! Is it OK for you if I borrow some of your examples and tweak them a bit?

sleemanj commented 4 years ago

Sure thing, grab whatever you want :)

MCUdude commented 4 years ago

@nerdralph I'm experiencing some issues when the ATtiny13 is reading data from the PC at high speeds. For instance, I'm able to write to the PC with 115200 baud @ 4.8 MHz, but I'm not able to read. I'm using James' ReadASCIIString sketch.

https://github.com/sleemanj/ATTinyCore/blob/master/avr/libraries/ATTinyCore/examples/Tiny13/04.Communication/ReadASCIIString/ReadASCIIString.ino

Is this normal behaviour?

nerdralph commented 4 years ago

For tuning, while it is not gonna fit in a t13 here is my cut down version of "tinytuner" which I burn into optiboot images...

https://github.com/sleemanj/optiboot/blob/master/optiboot/bootloaders/optiboot/veryTinyTuner.c

you just repeatedly send "x". As I say, totally unsuited to T13, but maybe gives some ideas or copy-paste.

I got a working tuner for the t13 going last night. It takes 310 bytes of flash. I was using 'p' (sticking to standard ASCII so it works in the Arduino serial monitor), but will probably switch to 'x' to make it work the same as yours. I also need to make it ignore noise from connecting the serial.

As I mentioned before, the cheap PL2303HX adapters I'm using are very tolerant of timing errors. Even starting at ~4.5% slow (measured on a scope), I get no errors sending from the t13.

nerdralph commented 4 years ago

@nerdralph I'm experiencing some issues when the ATtiny13 is reading data from the PC at high speeds. For instance, I'm able to write to the PC with 115200 baud @ 4.8 MHz, but I'm not able to read. I'm using James' ReadASCIIString sketch.

https://github.com/sleemanj/ATTinyCore/blob/master/avr/libraries/ATTinyCore/examples/Tiny13/04.Communication/ReadASCIIString/ReadASCIIString.ino

Is this normal behaviour?

I'm not too surprised. A couple days ago I was looking at James' mods to RxByte, and noticed they skew the timing of when the received bit is read. I emailed him some suggestions for reducing the skew.

A couple years ago I worked on converting RxByte to be interrupt driven, with a single-character buffer. In addition to making the Rx timing more consistent, it would also allow for implementing Serial.available(). What do you think about doing it that way?

MCUdude commented 4 years ago

I'm not too surprised. A couple days ago I was looking at James' mods to RxByte, and noticed they skew the timing of when the received bit is read. I emailed him some suggestions for reducing the skew.

So reducing the skew fixed this issue? If so I'd be interested to see if it works on my hardware.

A couple years ago I worked on converting RxByte to be interrupt-driven, with a single-character buffer. In addition to making the Rx timing more consistent, it would also allow for implementing Serial.available(). What do you think about doing it that way?

Interrupt driven RX with a single byte buffer sounds very interesting, especially if we could blend this into James' Arduino wrapper so that Serial.available() would work as expected. Would it be best to use the standard INT0 and not PCINT? After all, I've more or less specified that PB1 is the Rx pin, period. And how about RAM and flash usage? Will it be much worse you think?

nerdralph commented 4 years ago

I'm not too surprised. A couple days ago I was looking at James' mods to RxByte, and noticed they skew the timing of when the received bit is read. I emailed him some suggestions for reducing the skew.

So reducing the skew fixed this issue? If so I'd be interested to see if it works on my hardware.

I identified the problem from reviewing the code, not from testing.

A couple years ago I worked on converting RxByte to be interrupt-driven, with a single-character buffer. In addition to making the Rx timing more consistent, it would also allow for implementing Serial.available(). What do you think about doing it that way?

Interrupt driven RX with a single byte buffer sounds very interesting, especially if we could blend this into James' Arduino wrapper so that Serial.available() would work as expected. Would it be best to use the standard INT0 and not PCINT? After all, I've more or less specified that PB1 is the Rx pin, period. And how about RAM and flash usage? Will it be much worse you think?

I've actually been thinking about changing the default Rx/Tx pins in BBUart.S. I picked those when the t84 was my preferred AVR, where using PB0/PB1 doesn't interfere with USI and PWM which are on PORTA. On the t13 I've been using PB3 & PB4, or just when doing single-wire Rx/Tx. That leaves PB0/PB1 still free for PWM, and keeps ICSP programming from injecting garbage into your uart. Using PB0/PB1 for the uart has similar issues on the tx5 parts too.

As for RAM and flash, I'm confident I can do it with just a single byte of RAM for the buffer. For extra flash I figure about 20 extra bytes. Even less than that if I have time to fully optimize how the Serial class interfaces with the send & receive asm functions.

MCUdude commented 4 years ago

I'm not too surprised. A couple days ago I was looking at James' mods to RxByte, and noticed they skew the timing of when the received bit is read. I emailed him some suggestions for reducing the skew.

So reducing the skew fixed this issue? If so I'd be interested to see if it works on my hardware.

I identified the problem from reviewing the code, not from testing.

Would you mind sharing your discovery so that I can test it? 🙂

I've actually been thinking about changing the default Rx/Tx pins in BBUart.S. I picked those when the t84 was my preferred AVR, where using PB0/PB1 doesn't interfere with USI and PWM which are on PORTA. On the t13 I've been using PB3 & PB4, or just when doing single-wire Rx/Tx. That leaves PB0/PB1 still free for PWM, and keeps ICSP programming from injecting garbage into your uart. Using PB0/PB1 for the uart has similar issues on the tx5 parts too.

Unfortunately, I can't do that with MicroCore. I've designed a development board that I'm planning to sell that uses PB0/PB1. The board is also designed to work with ATtiny25/45/85 using ATTinyCore. PB0/PB1 is used by the Optiboot bootloader on the T85 too. But it shouldn't be a problem to keep the "old" pin style? It would, however, be nice if INT0 were used on PB1 instead of PCINT. The INT0 pin is in use anyway, and this means we don't occupy any PCINT ISR either.

As for RAM and flash, I'm confident I can do it with just a single byte of RAM for the buffer. For extra flash I figure about 20 extra bytes. Even less than that if I have time to fully optimize how the Serial class interfaces with the send & receive asm functions.

Excellent, looking forward to testing it. However, to speed things up a bit I will probably do a release with the current, non-blocking receive function. Then I can play around with the new implementation without having to release it before it's 100% ready.

What's really left for the current implementation is to (maybe, if possible) sort out the "Rx bug" when using high baud rates. I also have a little documentation left.

nerdralph commented 4 years ago

Here are the details I emailed to James. I think there are a few more cycles of skew added by the non-blocking entry points to RxByte, so the changes I've suggested would just solve the blocking Rx skew.

The mods you made to RxByte will throw off the timing by a few cycles
because the extra instructions will increase the time between
detection of the start bit and sampling the first data bit.  There
should be only 2 cycles between GotStartBit: and RxBit:
Instead of using R16, use R19 to save SREG since R18-R27 don't need to
be saved by the called function.

Here's how you can rearrange the code:
RxByte:
  in r19, SREG; Save status register
  ldi r24, 0x80 ; bit shift counter
  sbic UART_Port-2, UART_Rx ; wait for start edge
  rjmp RxByte
GotStartBit:
  cli
  ldi delayArg, RXSTART
RxBit:

You'd need to save SREG and load r24 in your non-blocking entry points
as well.  Another option (probably the best one) would be to still
save SREG and load r24 after detecting the start bit, but to change
the RXSTART calculation to compensate for the extra 2 cycles.
#define RXSTARTCOUNT DIVIDE_ROUNDED(RXSTART_CYCLES - 15, 3)
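
The effect of changing the overhead constant from 13 to 15 can be worked through on the host. The sketch below rests on assumptions that are not confirmed by the email: that DIVIDE_ROUNDED is integer division rounded to nearest, that RXSTART_CYCLES corresponds to 1.5 bit times after the start edge, and that the delay loop takes 3 cycles per iteration (suggested by the divisor 3). The function names are made up for illustration.

```c
#include <assert.h>

/* Assumed definition: integer division rounded to nearest,
   valid for non-negative operands. */
#define DIVIDE_ROUNDED(n, d) (((n) + (d) / 2) / (d))

/* Assumed: the first data bit is sampled 1.5 bit times after the
   start edge, i.e. 3 * F_CPU / (2 * BAUD) cycles. */
static long rxstart_cycles(long f_cpu, long baud)
{
    assert(f_cpu > 0 && baud > 0);
    return DIVIDE_ROUNDED(3L * f_cpu, 2L * baud);
}

/* Delay-loop iterations for a 3-cycle loop, after subtracting the
   fixed instruction overhead (13 or 15 cycles in the discussion). */
static long rxstart_count(long f_cpu, long baud, long overhead)
{
    long cycles = rxstart_cycles(f_cpu, baud);
    assert(cycles > overhead);
    return DIVIDE_ROUNDED(cycles - overhead, 3L);
}
```

Under these assumptions, 4.8 MHz at 115200 baud gives 63 cycles to the first sample, which is 17 loop iterations with an overhead of 13 and 16 with an overhead of 15. At such a short bit time a single iteration is a noticeable fraction of the bit.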

As for the Tx/Rx pins, I'm guessing you don't feel like cutting the traces and adding jumper wires on all the boards you've fabbed? :-)

sleemanj commented 4 years ago

https://github.com/sleemanj/ATTinyCore/commit/5172add6ee7dc08432c51e3405c5424162924085#diff-3b5e294272863d2389815f566bc74960

I think (untested) the above commit should address the changes @nerdralph suggested.

MCUdude commented 4 years ago

I just tested the code on some actual hardware. I'm running at 1.2 MHz (using an external signal generator) and 38400 baud.

Here's the output (inserted string is 1234567890)

What is your name traveler?
Nice to meet you 1⸮⸮⸮⸮6789⸮

MCUdude commented 4 years ago

Wait, it does work when LTO is disabled! It actually turns out that both the "old" and "new" code work when LTO is disabled.

EDIT: Here's some data. Both waveforms show the old implementation (r16 etc). The white one is with LTO disabled, the yellow one enabled. The string I'm sending is 123456. My computer "understands" the white waveform.

Sorry, it's getting late here. I was probing on the TX output of the T13. When LTO is enabled the space between each character is a little bit less, but it does not affect the result at all. It is the read routine itself that is the problem. The screenshot below is therefore irrelevant in this case.

[Image: oscilloscope screenshot, DS1Z_QuickPrint1]

EDIT2: I tried to compare the "old" and "new" implementation with the scope when LTO was disabled. They look more or less identical, and again, both work when LTO is disabled.

For reference, I'm using a Siglent SDG2042X to generate the clock, and this is what my current setup looks like with a custom-developed AVR 8-pin board (that I might plan to sell in the future).

[Image: photo of the test setup, 2019-12-15 23 10 05-1]

sleemanj commented 4 years ago

Yeah, I didn't exactly intend for my wrapper to be used at "high" speeds ;-) Personally, if I could get a t13 mostly reliably working at 9600 I was happy. Given that I hard-coded 9600 as the rate for a 1.2 MHz chip, I may have even found that to be best experimentally ;-)

As an experiment, you could, with the recently committed version of the S file:

comment out line 74 (in r19, SREG)
comment out line 88 (out SREG, r19)

and in the .h file, change line 78 back to use 13 instead of 15

this of course has the effect of not disabling interrupts in the assembly language read procedure, but since you are using read_str it already disables interrupts there anyway (for the entire read of a string).

That would perhaps get closer to @nerdralph 's original timing.

MCUdude commented 4 years ago

As an experiment, you could, with the recently committed version of the S file

Didn't help much :/

After a bit more investigation, there do not seem to be any issues with @nerdralph's code, but rather with the read_str function. Looking at its source code, it seems like there's a lot more going on here that may cause the T13 to be a little too slow. Not sure if it is possible to optimize this function any further. I'm not competent enough for that at least!

nerdralph commented 4 years ago

You may want to leave this open, as there are some possible improvements to the Rx to free up more time for the user code to process the incoming data. Just having the current Rx code in an ISR won't help much, because it doesn't increase the time available to the main loop. A timer-based Rx sampling interrupt would solve that problem. The interrupt would be short and runs once per bit, making the time that would be wasted busy-looping for the next bit available to the main loop. This worked for a Tx soft uart I wrote last year, and the same concept should work for Rx. https://github.com/nerdralph/nerdralph/blob/master/avr/ISRUART.c
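To illustrate the once-per-bit idea, here is a host-side sketch of a receive state machine that does a small amount of work on every tick and leaves the rest of the bit time to the main loop. It is not taken from ISRUART.c. It assumes 8 data bits sent LSB first with one stop bit, and that something else (for example a pin-change interrupt plus a half-bit timer offset, not shown) has aligned the ticks to the middle of each bit. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t shift;      /* bits received so far */
    uint8_t bits_left;  /* 0 means idle, waiting for a start bit */
    uint8_t data;       /* single-character buffer */
    bool    ready;      /* true when data holds an unread byte */
} soft_rx_t;

/* Call once per bit time with the sampled RX level. This stands in
   for the body of a timer compare interrupt. */
static void soft_rx_tick(soft_rx_t *rx, bool level)
{
    assert(rx != 0);
    if (rx->bits_left == 0) {
        if (!level) {               /* low level: start bit */
            rx->bits_left = 9;      /* 8 data bits + stop bit */
            rx->shift = 0;
        }
        return;
    }
    rx->bits_left--;
    if (rx->bits_left > 0) {        /* data bit, LSB arrives first */
        rx->shift >>= 1;
        if (level)
            rx->shift |= 0x80;
    } else if (level) {             /* stop bit must be high */
        rx->data = rx->shift;
        rx->ready = true;
    }
}

/* Test helper: feeds one complete frame, one tick per bit. */
static void feed_byte(soft_rx_t *rx, uint8_t byte)
{
    soft_rx_tick(rx, false);                    /* start bit */
    for (int i = 0; i < 8; i++)
        soft_rx_tick(rx, (byte >> i) & 1);      /* data bits */
    soft_rx_tick(rx, true);                     /* stop bit */
}
```

A `Serial.available()` style check could then simply test the `ready` flag, and a read would return `data` and clear the flag.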

MCUdude commented 4 years ago

I was thinking about opening a new issue, just to prevent the thread from becoming too long. This will, after all, be a "new and improved" Rx handler that in my opinion deserves its own topic/thread/issue. Is that OK with you?

nerdralph commented 4 years ago

That's fine by me.

On Mon, Dec 23, 2019 at 6:50 PM Hans notifications@github.com wrote:

I was thinking about opening a new issue, just to prevent the thread from becoming too long. This will, after all, be a "new and improved" Rx handler that in my opinion deserves its own topic/thread/issue. Is that OK with you?


MCUdude / MicroCore

Bit-banged serial, baud rates and error #88