SpenceKonde / megaTinyCore

Arduino core for the tinyAVR 0/1/2-series - Ones's digit 2,4,5,7 (pincount, 8,14,20,24), tens digit 0, 1, or 2 (featureset), preceded by flash in kb. Library maintainers: porting help available!

ATtiny3217 - Having struggles with serial baud rate only while using external clock #992

Closed zac-baker closed 1 year ago

zac-baker commented 1 year ago

SPOILER ALERT: It was a cold solder joint :)


Hi Spence and others,

I've been using megaTinyCore for about a month now to develop some boards and I have been enjoying all of the tools you have provided. The exhaustive documentation is highly appreciated but I can't seem to dig up information that can help me solve my current problem. Hopefully it is something silly that I'm not understanding, but I think I need assistance at this point. I did find and try to use the very relevant ClockDiagnose sketch, but sadly it wouldn't compile for me on megaTinyCore 2.6.4 and Arduino 1.8.13. I didn't record the errors I got but I can get them for you if you would like.

THE CONTEXT

I designed a board around the ATtiny3217. I am able to program it reliably. In trying to establish serial communications as a first step in writing the software for this board, I have found myself struggling to do so. The ATtiny is talking to an RS485 transceiver which I am connecting to my computer using an RS485 to USB transceiver. I am using the program HTerm to monitor the communications.

THE UNDESIRED BEHAVIOR

I am able to successfully serially communicate data but only at mismatched baud rates between the ATtiny and HTerm. If I set both the ATtiny and HTerm to 57600 baud, I get garbage. If I set the ATtiny to 57600 baud and HTerm to 8000 baud, I somehow get clean data. I have verified with a scope that the data is indeed moving at the baud rate that I have to set my receiver at to receive clean data. I have tried a wide range of baud rates but have found that the same roughly 7.2:1 speed ratio is required between the ATtiny and receiver baud rates to get clean data.

As I will discuss soon in more detail, I am using an external clock. Something extremely worthy of note is that when I select "20MHz internal" for the clock upload setting to use the ATtiny's internal clock instead of the external clock, the serial communications work as expected with identical baud rates on both ends. It is only when I select the "20 MHz external" option that I get this weird baud rate ratio situation.

All of this, of course, led me to believe that there was either a clock issue of some sort (and specifically most likely a prescaler or divider setting somewhere) or my RS485 transceiver IC is slowing down the communication for some reason. I have perused the transceiver's datasheet and have not found any reason why the latter would be the case as it is capable of up to 50Mbps signaling rate. I could, of course, lack some critical understanding of how these transceivers typically behave because I have not worked with them until now. The fact that the communications work as expected when using the ATtiny's internal oscillator encourages me to dismiss the transceiver IC as the possible problem.
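As a rough illustration of the clock theory above: the USART divider is derived from the F_CPU the sketch was compiled for, so the transmitted bit rate should scale with the real clock. The plain C++ sketch below estimates the real clock from the reported figures. The helper name `impliedClockHz` is hypothetical (it is not part of megaTinyCore), and it assumes the receiver setting that decodes cleanly matches the true bit rate.

```cpp
// Hypothetical helper (not part of megaTinyCore): estimate the real system
// clock from the baud rate the sketch requested and the baud rate that
// actually decodes cleanly on the receiver.
constexpr double impliedClockHz(double assumedFcpuHz,
                                double configuredBaud,
                                double observedBaud) {
  // The baud divider is computed from the assumed F_CPU, so the real bit
  // rate scales linearly with the real clock.
  return assumedFcpuHz * observedBaud / configuredBaud;
}
```

With the numbers from this report, `impliedClockHz(20e6, 57600, 8000)` comes out near 2.78 MHz, far below 20 MHz, which points at a divided-down clock rather than at the transceiver.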

THE CLOCK

The function of the board is to take very frequent analog input readings using an external ADC IC and the decision was made to use an external clock for accuracy and stability reasons. The clock that was chosen (20MHz 5V oscillator) should work by my understanding of your documentation and your comments on issues similar to this one.

I do not believe there is an issue with the PCB design as the high frequency trace is extremely short and the oscillator does have a 0.1uF bypass cap. Examining with a scope shows that the signal is a clean 20MHz square wave. I also have verified that the oscillator is oriented correctly and even tried replacing it with a new one before I remembered I can just verify its functionality with a scope which I did after.

THE UPLOAD SETTINGS

By my understanding, these are the correct settings and should not in any way result in the serial communications running at an unexpected baud rate.

[image: upload settings screenshot]

THE CODE

My code is very simple (perhaps too simple.. did I miss anything?)

// Serial communications definitions
#define RS485_BAUDRATE 57600
#define RS485_RX_PIN PIN_PB3
#define RS485_TX_PIN PIN_PB2

void setup() {
  //CCP = 0xD8;                                     // writes the configuration change protection key, hopefully
  //CLKCTRL.MCLKCTRLB = 00000001;                   // disables the main clock prescaler

  Serial.pins(RS485_TX_PIN, RS485_RX_PIN);          // I used pins() here for readability, I know that swap() is recommended
  Serial.begin(RS485_BAUDRATE, SERIAL_8N1);
}

void loop() {
  Serial.write(255);        // 11111111
  Serial.write(85);         // 01010101
  Serial.write(0);          // 00000000

  delay(100);
}

MY FORAY INTO THE CONFIGURATION REGISTERS

I put on my floaties and started reading the 3216/17 datasheet. I took a stab at playing with the registers and setting them directly myself to make sure that everything was actually being set up for a 20MHz external clock source, but sadly didn't solve the problem. I tried setting FUSE.OSCCFG to make sure the 16MHz internal clock wasn't somehow interfering, I tried making sure CLKCTRL.MCLKCTRLA was set to use an external clock, I tried making sure that CLKCTRL.MCLKCTRLB was set so that the main clock prescaler was disabled, and I tried setting USART0.BAUD to directly set the baud rate division factor.

None of that yielded any changes except my tampering with CLKCTRL.MCLKCTRLB (the prescaler register), which interestingly resulted in 2.4:1 being the new baud rate ratio between the ATtiny and HTerm (my serial monitor). The ATtiny set at 57600 baud spoke cleanly to HTerm set at 24000 baud. I found this extremely interesting, but I am not familiar enough with the ATtinys or microcontroller registers in general to really do anything with this information.

CONCLUSION

And so I have finally turned to creating an issue. I hope that I have provided enough information in a (hopefully) concise way to make it easy to assist me. If there is any info I forgot to include, please let me know. I have a bit of a track record for making extremely silly mistakes, so hopefully this is just another one of those. In fact, I am surprised that I didn't find my solution at some point along the journey of typing this up, as I often do when I employ the rubber duck strategy.

Thanks in advance for any assistance!

SpenceKonde commented 1 year ago

A 2.4 ratio and a 7.2 ratio: those are 1 to 3.

That is.... suspicious.

Print out F_CPU with Serial.print(F_CPU); and use Serial.printHexln() / printHex() to print out the values of these registers: CLKCTRL.MCLKSTATUS, CLKCTRL.MCLKCTRLA, and CLKCTRL.MCLKCTRLB (before you fiddle with any registers). Let me know the results; I don't have any boards with an external oscillator on them.

To turn off the prescaler, MCLKCTRLB should be set to 0, not 1. 0x01 is PEN = 1 (prescaler is enabled) at /2 prescale factor. That will bring you up to a 1.2 ratio.
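To make the register layout described above concrete, here is a minimal plain C++ sketch that decodes a raw MCLKCTRLB value into the effective divisor. The function names are hypothetical, and the PDIV table is transcribed from the tinyAVR 0/1-series datasheet as I understand it, so check it against the datasheet before relying on it.

```cpp
#include <cstdint>

// Divisor selected by the PDIV field (bits 4:1) of CLKCTRL.MCLKCTRLB.
// Returns 0 for reserved codes. Table is an assumption taken from the
// tinyAVR 0/1-series datasheet.
constexpr unsigned pdivDivisor(std::uint8_t pdiv) {
  return pdiv == 0x0 ? 2
       : pdiv == 0x1 ? 4
       : pdiv == 0x2 ? 8
       : pdiv == 0x3 ? 16
       : pdiv == 0x4 ? 32
       : pdiv == 0x5 ? 64
       : pdiv == 0x8 ? 6
       : pdiv == 0x9 ? 10
       : pdiv == 0xA ? 12
       : pdiv == 0xB ? 24
       : pdiv == 0xC ? 48
       : 0;
}

// Effective main clock divisor for a raw MCLKCTRLB value.
// Bit 0 is PEN; when it is 0 the PDIV field is ignored and the divisor is 1.
constexpr unsigned mainClockDivisor(std::uint8_t mclkctrlb) {
  return (mclkctrlb & 0x01) ? pdivDivisor((mclkctrlb >> 1) & 0x0F) : 1;
}
```

For example, `mainClockDivisor(0x00)` is 1 (prescaler off), `mainClockDivisor(0x01)` is 2, and the hardware default `mainClockDivisor(0x11)` is 6.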

So we look at the evidence: Exhibit A: Prescaler is starting at the hardware default, not the core default. Exhibit B: (provisional, but I would gladly bet on this being found to be the case) with the prescaler actually off, 48000 baud will work. (if not, try 49000)

I propose the following explanation:

  1. The external clock is not wired correctly and/or is somehow failing to reach the destination pin due to assembly issues.
  2. Hence, the device starts up, and then tries to switch to the external clock. After it does that, it would turn off the prescaler. But it never saw an external clock. Thus, after a few tries of waiting for the clock, it gave up and began executing the code. We deliberately run at the slowass base speed (as mentioned in the documentation), because that is much more likely to be recognized as a clock problem than if you thought you were on a rock solid accurate external clock but it was using the 20 MHz internal, for example. I thus predict you will see, on startup and when 8000 baud is working:

     FUSES.OSCCFG
     CLKCTRL.MCLKCTRLA = 0x03; // external clock selected
     CLKCTRL.MCLKCTRLB = 0x11; // prescaler enabled and dividing by 6
     CLKCTRL.MCLKSTATUS & 0x81 = 0x01; // oscillator switch in progress; external oscillator not stable or not oscillating at all
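The status check predicted above can be sketched in plain C++ as follows. The names are hypothetical, and the bit positions (SOSC in bit 0, EXTS in bit 7) are assumed from the 0x81 mask used above and the tinyAVR 0/1-series datasheet.

```cpp
#include <cstdint>

// Bit masks for CLKCTRL.MCLKSTATUS (assumed from the datasheet).
constexpr std::uint8_t kSoscMask = 0x01;  // SOSC: main clock switch in progress
constexpr std::uint8_t kExtsMask = 0x80;  // EXTS: external clock has started

// True when the status byte matches the predicted failure pattern:
// a clock switch is still pending and no external clock was detected.
constexpr bool externalClockMissing(std::uint8_t mclkstatus) {
  return (mclkstatus & (kSoscMask | kExtsMask)) == kSoscMask;
}
```

On the device, the byte passed in would be the value read from CLKCTRL.MCLKSTATUS; other status bits (such as the internal oscillator flags) are masked off and do not affect the result.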

Since you report confirming the incoming clock on the scope, I would theorize that the clock is connected to the wrong pin or that there is a cold solder joint. The latter is particularly likely if you're not well practiced at soldering QFNs.

Thus, it is starting up at 16/6 MHz. You turned it up to 16/2 MHz. Now you may object that the numbers don't quite work out. But the internal oscillator is only within about 1% typically. Those serial adapters that don't have a crystal? They are not dead on either! They can be off by a percent or so, and depending on the make and model of serial adapter, there may also be calculation error in the baud rate. And Serial will generally work barely within 3%, so if the errors were in the right direction it would just barely work with those speeds.
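For anyone who wants to check how far the numbers are from working out, here is a small plain C++ sketch of the arithmetic. The helper names are hypothetical, and it ignores rounding in the baud register, so treat the results as nominal values only.

```cpp
// Bit rate actually produced when the baud divider was computed for
// assumedFcpuHz but the chip is really running at actualHz.
constexpr double actualBaud(double configuredBaud,
                            double assumedFcpuHz,
                            double actualHz) {
  return configuredBaud * actualHz / assumedFcpuHz;
}

// Relative mismatch, in percent, between the receiver's setting and the
// rate the transmitter is nominally producing.
constexpr double mismatchPercent(double receiverBaud, double transmittedBaud) {
  return 100.0 * (receiverBaud - transmittedBaud) / transmittedBaud;
}
```

Under the 16/6 MHz assumption, `actualBaud(57600, 20e6, 16e6 / 6)` is 7680 baud, and `mismatchPercent(8000, 7680)` is roughly 4.2%. That nominal gap is a little beyond the 3% figure, so it only works if the oscillator and adapter errors lean the right way, as argued above.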

zac-baker commented 1 year ago

I really appreciate you taking the time to write up a response! I should be able to work on this on Monday to get you those results and investigate the cold solder joint theory.

I did just recently start working with very small surface mount components so my practices likely aren't the best. Currently I'm using a stencil, solder paste, and a hotplate to populate boards. Perhaps my solder paste is old or something else went wrong. It's definitely a good candidate for being the problem.

I'll get back to you soon, thanks again!

zac-baker commented 1 year ago

The board behaves as expected now. The culprit?

drum roll

You guessed it, a COLD SOLDER JOINT!

I went around the chip with a soldering iron and now the ratio behavior when using the external clock is gone. You were also right that I was setting CLKCTRL.MCLKCTRLB incorrectly. When it is set to 0, I get equivalent baud rates on both ends. When I set it to 1, I get a ratio of 2:1 which makes complete sense.

It really is the simplest answers sometimes. I definitely need to revise my techniques when it comes to how I'm soldering my parts on.

I really appreciate your help and work on this core, Spence! Sorry to waste your time with something silly like this. Hopefully it will help someone else down the line in a similar boat.

SpenceKonde commented 1 year ago

TBH, I don't think the way that register works makes any sense at all. Why a prescale enable bit and a prescale select bitfield? The precedent was always one bitfield, and the first setting would be no prescaling. The prescale enable bit raises questions in my mind - why would they do it differently, in a way that was from almost any perspective worse? And make no mistake, it is worse. It makes the number in the combined byte count up weirdly, with almost half of the codes invalid: 0b00000 is /1, 0b00001 is /2 (so far so normal), 0b00011 is /4 (whaa, didn't we miss one? Nope, 0b00010 is prescale /4 but disabled), 0b00101 is /8, and so on. Conceptually, disabling the prescaler is the same thing as setting it to divide by 1, and historically that has been how AVRs did it.

This new method has half of its codes equivalent, and the resulting bit pattern for successive values increases the risk of user error. And it does: essentially this exact same mistake catches a lot of people, because in code where you're doing stuff like that it's very natural to want to treat the register like a single bitfield. But no: either bit 0 is 0 and nothing else matters, or bit 0 is 1 and the next 4 bits select the prescaler option, so if you were reading off valid configurations, the numeric values of the register would count up 0, 1, and then every odd number.

What were they thinking? Were they thinking? Did they back themselves into a corner where they had to have that bit structure and couldn't easily alter the design to do it the right way when someone pointed out that the way they did it was stupid and different for no good reason?
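One way to sidestep the odd counting pattern is to never write raw numbers and instead go from the desired divisor to the register value. The plain C++ sketch below does that; `mclkctrlbForDivisor` is a hypothetical helper, the 0xFF sentinel is my own convention, and the encodings assume the PDIV table in the tinyAVR 0/1-series datasheet.

```cpp
#include <cstdint>

// Hypothetical helper: raw CLKCTRL.MCLKCTRLB value for a desired divisor.
// Each enabled setting is (PDIV << 1) | PEN, so the valid values are
// 0x00 followed by odd numbers only. Returns 0xFF (sentinel, not a valid
// register value) for divisors the hardware does not offer.
constexpr std::uint8_t mclkctrlbForDivisor(unsigned divisor) {
  return divisor == 1  ? 0x00   // PEN = 0, prescaler off
       : divisor == 2  ? 0x01   // PDIV = 0x0, PEN = 1
       : divisor == 4  ? 0x03   // PDIV = 0x1
       : divisor == 8  ? 0x05   // PDIV = 0x2
       : divisor == 16 ? 0x07   // PDIV = 0x3
       : divisor == 32 ? 0x09   // PDIV = 0x4
       : divisor == 64 ? 0x0B   // PDIV = 0x5
       : divisor == 6  ? 0x11   // PDIV = 0x8 (hardware default)
       : divisor == 10 ? 0x13   // PDIV = 0x9
       : divisor == 12 ? 0x15   // PDIV = 0xA
       : divisor == 24 ? 0x17   // PDIV = 0xB
       : divisor == 48 ? 0x19   // PDIV = 0xC
       : 0xFF;
}
```

On the device the register is under configuration change protection, so the value would be written with something like avr-libc's `_PROTECTED_WRITE(CLKCTRL.MCLKCTRLB, value)` rather than a plain assignment.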

zac-baker commented 1 year ago

That is actually fascinating. It's so counterintuitive and inconsistencies in convention like that really bother me. I spend a lot of energy making sure the things I work on have consistency. Like you said though, perhaps there was some circumstance that makes it make sense why they would do it that way. The world may never know.

SpenceKonde commented 1 year ago

Well, you'll love the Dx and Ex-series parts. The mapping of alt-function-of-peripheralXXX is the same on all Dx and Ex parts. Not all parts have all peripherals, and they seem to be allowed to add pin mappings (so later parts have more mapping options). But it appears that there's a moratorium on removing options, and they went so far on the DD as to number its single zero crossing detector ZCD3, because the 14 and 20 pin versions didn't have any of the pins associated with ZCDs 0, 1, or 2. If they added a way to change the mapping, then every future part with zero crossing detectors would have to get it too, and they didn't want to commit to that. So they connected the ZCD0 to a different pin, and called it ZCD3 in the headers and docs :-P So they seem to be serious about consistency sometimes (like, that is consistent, while if they called it ZCD0 it wouldn't be).

But the problem is that the first modern AVRs (0/1-series tinies) were released before they were done. See the errata: the 2/4k parts came first, with an absolute horror show errata sheet. It was actually bad enough that Microchip rev'ed the die fairly quickly, and then again later to fix almost nothing. Then came the 8k and 16k ones (not sure if the 8's were released in better condition than the 16's even though less errata had been fixed, because a bunch of the bugs were introduced by the extra peripheral loadout, or if the 16's were second and that's why they have so many bugs). The 16k has gotten a die rev to fix like 4 out of >20 errata. The 32k ones came last, so when released, they were in the best condition to begin with, and they got a bigger die rev around when the VAO parts started shipping, which DID fix the nastiest bug (and some less nasty ones).

And the DA and DB are also riddled with errata that they have shown no inclination to fix, even when we know from the DD that they have fixes. The charitable explanation is that they're trying to get out enough representative products to cover the roles of the classic AVRs (which Atmel had let turn into a backwater, while they threw good money after bad with xMega), which I suspect they (rightly) see as antiquated and inferior to the newer ones. You can sort of tick off what parts some of the series are aimed at. The writing is on the wall about the tinyAVR branding. It's dead. The EB product brief snapped a photograph of that writing on the wall, printed it on a large sheet of colored paper and nailed it to the coffin of the AVR brand before giving it a dignified burial among the other debris in the dumpster.

But I really feel like most of their AVR products are aimed as much at their own products as at competitors. I think they want to get as many people as possible switched over to the new ones. These parts blow the doors off the classic ATmegas, as you've likely noticed. But most things, if they don't blow the doors off a classic AVR, will at least set them rattling disconcertingly. So they don't exactly impress people, ya know? In addition to hoping to win customers away from other companies, they're trying to make sure they have a migration path for everyone.

The DU is aimed at the m32u4/16u4, as well as the 8u2 and 16u2. The DD is aimed at low-end parts like the old 328p (sorry, that was a low-end part. Like, it was basically the worst device that they were willing to call ATmega) in its high pincounts, and at the top of the classic tinyAVR line, as well as aiming for new customers with aggressive pricing and a fat feature list (MVIO in particular is a real killer feature). The DA and DB give two different assortments of features for the people who would be buying high-end classic ATmegas if it were 10 years ago, trying to cover most of the bases. The EA was yet another DA/DB-like general purpose option, except it finally had a real differential ADC, which we hadn't seen on a new AVR release in a decade. (The Dx-series differential ADC is crap. I think they hacked 2 more bits onto the 10-bit dual ADC thing in the tiny1's, hid the second ADC, and then rigged the "negative" ADC to be triggered whenever the "positive" ADC is; then they subtract the two readings and there's your value. Max voltage is VRef (instead of maximum difference being VRef), so you can't do high side current sensing with it, and it has no gain.) All of those are fixed on the EA and 2-series ADC.

And the EB? The low pincount versions are aimed squarely at the ATtiny861. They couldn't come out with its replacement without the new ADC, because the ATtiny861 had a differential ADC in addition to the fancypants timer. (Look at the EB16 headers for TCE and WEX. Those things make the TCD look straightforward and it's hard to totally understand them, but it definitely looks like they're meant to do the same thing as the timer on the old t861.)

And in higher pincounts it also can replace some of those obscure older AVRs with PWM in the name, or the ones with PSCs. The only ones that haven't gotten a migration path yet are the mega x9 (LCD drive built in) and the big kahuna, the m2560 and its ilk. I suspect they are trying to find any possible reason to delay that one, because it's gonna be ugly no matter how they address the main issues: crossing the 65536 word barrier (hence gaining a third program counter byte and a performance hit), and the fact that they can only give single cycle bit access to 56 pins at a time. They're gonna have to figure out how to deal with that issue. Do they just ignore it and have 4 ports full of second class pins that don't have single cycle access via SBI/CBI? Most of the band-aids would make it worse, and I can't imagine them adding another quarter of the I/O space to the CBI/SBIable list. They do, however, have enough registers in the I/O space. CBI/SBI/etc work only on the lower half, but if they really wanted to and were willing to expend resources, they could add to the instruction set. They don't have a lot of opcodes left, but this wouldn't take all that many of them.

Now - as I was saying, the modern AVRs weren't done when they shipped the 0 and 1 series tinies. Compare their event system to the event system on the tiny2.

Then compare tiny2 (much much better - but channels are still not fungible) to the EA-series - where at long last, all event channels are fungible, and you only need one list of generator options.