avrdudes / avrdude

AVRDUDE is a utility to program AVR microcontrollers
GNU General Public License v2.0

Read silicon revision from "modern" AVRs #1472

Closed MCUdude closed 1 year ago

MCUdude commented 1 year ago

It seems like Pymcuprog can read a byte from (an undocumented?) memory section that can be used to determine which silicon revision the chip is. If you're struggling with a weird bug in the MCU program, it might actually be a silicon bug. Being able to use Avrdude to read this revision number/letter would only be a good thing.

From pymcuprog/serialupdi/application.py

def read_device_info(self):
        """
        Reads out device information from various sources
        """
        sib = self.readwrite.read_sib()
        sib_info = decode_sib(sib)

        # Unable to read SIB?
        if sib_info is None:
            self.logger.warning("Unable to read SIB from device; attempting double-break recovery...")
            # Send double break and try again
            self.phy.send_double_break()
            sib = self.readwrite.read_sib()
            sib_info = decode_sib(sib)
            if sib_info is None:
                self.logger.error("Double-break recovery failed.  Unable to contact device.")
                raise PymcuprogError("Failed to read device info.")

        # Select correct NVM driver:
        # P:0 = tiny0, mega0 (16-bit, page oriented)
        # P:1 = N/A
        # P:2 = AVR DA, DB, DD (24-bit, word-oriented)
        # P:3 = AVR EA (16-bit, page oriented)
        if sib_info['NVM'] == '0':
            self.logger.info("NVM type 0: 16-bit, page oriented write")
            # DL is correctly configured already
            # Create new NVM driver
            self.nvm = NvmUpdiV0(self.readwrite, self.device)
        elif sib_info['NVM'] == '2':
            # This is a Dx-family member, and needs new DL and NVM
            self.logger.info("NVM type 2: 24-bit, word oriented write")
            # Create new DL
            datalink = UpdiDatalink24bit()
            # Use the existing PHY
            datalink.set_physical(self.phy)
            # And re-init
            datalink.init_datalink()
            # Create a read write access layer using this data link
            self.readwrite = UpdiReadWrite(datalink)
            # Create new NVM driver
            self.nvm = NvmUpdiAvrV2(self.readwrite, self.device)
        elif sib_info['NVM'] == '3':
            self.logger.info("NVM type 3: 16-bit, page oriented")
            # DL is correctly configured already
            # Create new NVM driver
            self.nvm = NvmUpdiAvrV3(self.readwrite, self.device)
        else:
            self.logger.error("Unsupported NVM revision - update pymcuprog.")

        self.logger.info("PDI revision = 0x%02X", self.readwrite.read_cs(constants.UPDI_CS_STATUSA) >> 4)
        if self.in_prog_mode():
            if self.device is not None:
                devid = self.read_data(self.device.sigrow_address, 3)
                devrev = self.read_data(self.device.syscfg_address + 1, 1)
                self.logger.info("Device ID from serialupdi = '%02X%02X%02X' rev '%s'", devid[0], devid[1], devid[2],
                                 chr(ord('A') + devrev[0]))
        return sib_info
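
The tail of that excerpt reads three signature bytes and one revision byte, then turns the revision byte into a letter. Here is a minimal standalone sketch of just that formatting step, assuming the byte counts up from 0 for revision A exactly as the `chr(ord('A') + devrev[0])` line does. The names `revision_letter` and `format_device_id` are made up for illustration and are not part of pymcuprog or Avrdude.

```python
def revision_letter(devrev: int) -> str:
    """Map a raw revision byte to a letter as the excerpt does: 0 -> 'A', 1 -> 'B', ..."""
    if not 0 <= devrev <= 25:
        # Assumption: anything outside A..Z is not a simple letter revision.
        raise ValueError(f"revision byte 0x{devrev:02X} does not map to a single letter")
    return chr(ord('A') + devrev)


def format_device_id(devid: bytes, devrev: int) -> str:
    """Format a 3-byte signature and a revision byte in the style of the log line above."""
    if len(devid) != 3:
        raise ValueError("expected exactly 3 signature bytes")
    return "Device ID '%02X%02X%02X' rev '%s'" % (
        devid[0], devid[1], devid[2], revision_letter(devrev))


# Signature bytes taken from the AVR16DD20 device_id quoted later in this thread;
# the revision byte 0x00 is only an example value.
print(format_device_id(bytes([0x1E, 0x94, 0x33]), 0x00))
```

A part that encodes its revision byte differently would need its own decoding, so treat the letter mapping as a property of this excerpt rather than of every chip.
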
MCUdude commented 1 year ago

It looks like we need to add syscfg_base to avrdude.conf for the .avr8x and .avrdx "parents".

When looking at every syscfg_base value for every "modern" AVR supported by pymcuprog, they all have 0x0f00 as their value.

From avr16dd20.py

    # Some extra AVR specific fields
    'nvmctrl_base': 0x00001000,
    'syscfg_base': 0x00000F00,
    'ocd_base': 0x00000F80,
    'address_size': '24-bit',
    'prog_clock_khz': 1800,
    'hv_implementation': 2,
    'device_id': 0x1E9433,
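
Putting the two excerpts together, the revision byte would sit one byte above `syscfg_base`. A small sketch of that arithmetic, assuming `self.device.syscfg_address` in `read_device_info()` corresponds to the `syscfg_base` field shown here (the thread does not confirm this; `revid_address` is a hypothetical helper):

```python
# Fields copied from the avr16dd20.py excerpt above; only the ones used here.
DEVICE_INFO = {
    'syscfg_base': 0x00000F00,
    'device_id': 0x1E9433,
}

# Assumption: offset taken from `self.device.syscfg_address + 1` in read_device_info().
REVID_OFFSET = 1


def revid_address(device_info: dict) -> int:
    """Return the address the revision byte would be read from."""
    return device_info['syscfg_base'] + REVID_OFFSET


print(hex(revid_address(DEVICE_INFO)))  # prints 0xf01
```
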
SpenceKonde commented 1 year ago

Do a 32-byte read of the SIB. (But the datasheet says there are only 8- and 16-byte reads? That'd be the same datasheet that lists multiple mux options for TCD on DA/DB when only one works, and overstates the flash endurance by an order of magnitude. Clearly a source of infallible truth then, right? I know at least half a dozen significant facts that you won't find in the errata or a datasheet.) The opcodes are 1110 0100 for 8 bytes, 1110 0101 for 16 bytes, and 1110 0110 for 32 bytes, which is what pymcuprog uses ;-)
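
For reference, the three bit patterns quoted above can be written down as a lookup table. This is only a sketch of the mapping as stated in this comment. The names `SIB_READ_OPCODES`, `sib_opcode` and `sib_length` are made up here and do not come from pymcuprog or Avrdude.

```python
# Bit patterns exactly as quoted in the comment above, keyed by read length in bytes.
SIB_READ_OPCODES = {
    8: 0b1110_0100,
    16: 0b1110_0101,
    32: 0b1110_0110,
}


def sib_opcode(length: int) -> int:
    """Return the opcode byte for a SIB read of `length` bytes."""
    try:
        return SIB_READ_OPCODES[length]
    except KeyError:
        raise ValueError(f"no SIB read opcode listed for {length} bytes") from None


def sib_length(opcode: int) -> int:
    """Inverse lookup: how many bytes a given SIB read opcode returns."""
    for length, code in SIB_READ_OPCODES.items():
        if code == opcode:
            return length
    raise ValueError(f"0x{opcode:02X} is not one of the listed SIB read opcodes")


for length, code in SIB_READ_OPCODES.items():
    print(f"{length:2d} bytes -> 0x{code:02X}")
```
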

That will also tell you the silicon revision. The format is like "tinyAVR P:aD:b-3M2 (XX.YYYYY.0)", where a is the NVMCTRL version and b is the on-chip debugger version. There may be a null character in the data that comes off the chip to confuse you there, but do not be fooled; there is more to it: -3M2 (XX.YYYYY.0). The -3M2 has to do with the PDI oscillator, according to what leaks out of Microchip in their code. XX is the silicon die rev, given as two ASCII characters that represent a single hexadecimal byte. YYYYY appears to be yet another bloody die identifier. There are two large-scale clades here. The AVR Dx-series parts, with word write and all sorts of yummy features like that, have YYYYY = "KV00x", where x seems to have increased monotonically in order of product release, but more quickly than one would expect: some numbers or letters between the ones that are used are apparently skipped. The shitty tiny-derived parts with the crapola 16/20 MHz fuse-selectable oscillator, generally paired with much worse calibration facilities than the actual tinies have (EA certainly does take the worst of both worlds), have YYYYY = "59Bxy", where y is any hexadecimal digit and x is a 0 or 1. megaAVR 0-series parts have 59B2y, and EAs are 59Fyy.

But the cool thing is that as early as the SIB read, you have knowledge of which silicon rev you're uploading to.
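
As a rough illustration of the format described above, here is a sketch that pulls the fields out of a SIB and guesses the family from the die identifier. It follows only the layout and prefixes given in this comment. The sample bytes are constructed to fit that layout and are not a capture from real hardware, the position of the NUL is a guess, and `parse_sib` and `classify_die` are hypothetical names.

```python
import re

# Layout as described above: "<family> P:aD:b-3M2 (XX.YYYYY.0)"
SIB_PATTERN = re.compile(
    r"(?P<family>\w+)\s+P:(?P<nvm>\w)D:(?P<ocd>\w)-(?P<osc>\w+)\s*"
    r"\((?P<rev>[0-9A-Fa-f]{2})\.(?P<die>\w{5})\.(?P<tail>\w+)\)"
)


def classify_die(die: str) -> str:
    """Guess the family from the YYYYY field, using only the prefixes listed above."""
    if die.startswith("KV00"):
        return "AVR Dx"
    if die.startswith("59F"):
        return "AVR EA"
    if die.startswith("59B2"):
        return "megaAVR 0-series"
    if die.startswith("59B") and die[3:4] in ("0", "1"):
        return "tinyAVR"
    return "unknown"


def parse_sib(raw: bytes) -> dict:
    """Split a raw SIB into its fields; NULs are treated as spaces before matching."""
    text = raw.replace(b"\x00", b" ").decode("ascii", errors="replace").strip()
    match = SIB_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognised SIB: {text!r}")
    fields = match.groupdict()
    fields["rev"] = int(fields["rev"], 16)  # XX is one byte written as two hex digits
    fields["clade"] = classify_die(fields["die"])
    return fields


# Made-up sample that follows the described layout, with a NUL dropped in the middle.
sample = b"tinyAVR P:0D:0-3M2\x00(01.59B14.0)"
print(parse_sib(sample))
```

Replacing NULs before matching is a design choice made here so that the embedded null cannot truncate the string early, which is the display problem this thread keeps running into.
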

That all said, considering Microchip's sense of urgency about those die revs (less than a three-toed sloth on a Monday morning), this may not be an urgent thing to address. I believe that (aside from cradle revs, made while the part is barely shipping and with hardly any of the bad ones making it out) only 3 modern AVRs have gotten a die rev that fixed problems: the 1-series 16k got a rev B that fixed 3 problems, the 3217/3216 got a rev C that fixed a lot more problems, and the 3208/3209 is up to rev D, with a fairly nice batch of fixes. Everything else is milling around holding their broken peripherals, wondering when they will be repaired. Oh, and the 2k/4k 1-series with 20 or fewer pins are also up to rev C. Rev C was mostly the same as rev B (and rev B was a cradle rev, because the initial silicon wasn't just a basket case: it was a hand basket, headed to hell, and the basket had a hole in it and stuff was falling out).

There are not many errata for which the workaround is so costly that it's a problem to deploy everywhere; usually the workaround wastes about the same amount of time as testing to see if you're affected. (Though there are a few where, if your part is affected and you have certain tasks to do, you nod slowly and, with a somber face, place the circuit board in the trash can, and go sit in the corner with the bugs and the dust, waiting for a die rev.)

("Where's the dust? You don't see any bugs? Oh, yeah, there were so many people waiting for die revs that they chased off or stepped on all the bugs and scattered the dust. Even though we get a new shipment of bugs and dust brought in weekly... Oh no, I don't go for anything specific, I just order the Hazmart Biting and Stinging Insect Assortment. It's got everything from hornets to scorpions to tsetse flies (I think you can pay extra to get the disease-carrying insects pre-infected with deadly diseases like malaria, dengue fever, sleeping sickness and so on, as appropriate to the species), and the dust is just their patented hazardous dust (based on a traditional lead pigment and inorganic mercury compounds, plus a strategically selected mix of plastics bearing PCBs and brominated flame retardants, bound together by a mix of polycyclic aromatic hydrocarbons and other pollutants, guaranteed 90% by weight toxic or carcinogenic compounds, or 99% in the State of California). And still, by the end of every week the dust has been tracked out and the bugs, unlike the silicon bugs we're waiting on fixes for, have all been squished or killed by the toxic dust, and a crowd is gathering again in the die rev waiting corner with you!" "Hm? Which day do we have them delivered on? Hmm, [checks watch] funny you should ask..." [beep-beep noise from a truck in reverse, getting louder])

SpenceKonde commented 1 year ago

lol, it's not a bug, it's just a feature they didn't tell us about. It's obviously intentional. That's from Microchip. They don't use behavior that they consider incorrect. They mark it as errata and fix the hardware (granted, by then the sun may have exhausted most of its fuel and expanded to envelop and incinerate the earth, so we may have other concerns than errata).

I imagine it's a lot more useful if you have a database you can access that tells you which code corresponds to which die.

There's one tricky thing about the data out of the SIB. It's meant to be displayed as ASCII, but there's a NUL in the middle there that trips up a lot of methods of displaying it. Within the parentheses, the first two digits are the die revision in hexadecimal, formatted per REVID. Of the next five characters, only the rightmost one (Dx) or two (Ex/tiny/mega0) appear to vary, while the rest is fixed: all the Dx-series start with KV00, and the others with 59B or 59F. I have some AVR32DAs from before the bug was fixed and am getting some from after. I'm very, very interested in whether there will be a change in the SIB between my old busted ones and my new working ones. (The die rev WAS NOT CHANGED, nor was the bug ever publicly acknowledged, except informally by a senior engineer posting on AVRfreaks telling people to contact support for replacements.) But maybe the SIB knows something they don't talk about. Maybe that's what that last field is, set to increasing numbers to indicate the smallest of revisions? Because either that, or they set that during factory calibration (REALLY?), or they have two different die revs that have the same REVID, both shipped to customers and distinguishable only by lot code, but one is so bad they had to recall it. Maybe the SIB reveals a difference. I find it unfathomable that they wouldn't make an easy way to check if the parts had the bug fixed. Impacted chips are totally, can't-even-run-blink, hosed.
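
Two small helpers would cover both points above: showing a SIB without the NUL cutting the string short, and comparing the SIBs of two chips byte by byte. This is a sketch only. The two sample SIBs are invented placeholders that follow the layout discussed in this thread, not readings from real AVR32DA parts, and `printable_sib` and `diff_sib` are hypothetical names.

```python
from itertools import zip_longest


def printable_sib(raw: bytes) -> str:
    """Render a SIB with non-printable bytes (such as the NUL) shown as \\xNN escapes."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02x}" for b in raw)


def diff_sib(old: bytes, new: bytes) -> list:
    """Return (offset, old_byte, new_byte) for every position where the two SIBs differ.

    A byte missing from the shorter SIB is reported as None.
    """
    return [
        (offset, a, b)
        for offset, (a, b) in enumerate(zip_longest(old, new))
        if a != b
    ]


# Invented placeholder SIBs, differing only in the last character of the die identifier.
old_chip = b"AVR P:2D:1-3M2\x00(A3.KV001.0)"
new_chip = b"AVR P:2D:1-3M2\x00(A3.KV002.0)"

print(printable_sib(old_chip))
for offset, a, b in diff_sib(old_chip, new_chip):
    print(f"offset {offset}: {chr(a)!r} -> {chr(b)!r}")
```
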

But these parts, you know, the ones qualified for use in life safety critical applications? Well, the first AVR32DAs that reached customer hands, through extraordinarily poor QA, had 2-byte vectors instead of 4-byte vectors, meaning that only the first and last two 4k regions of memory could contain an interrupt... and that's if the compiler did the best it could. But it doesn't even know, so it generates 4-byte vectors. Reset will work. Every even vector will execute a different vector than intended (generally one that wasn't defined, hence jumping to BADISR, which jumps to reset, which will then, if it's on current DxCore, issue a software reset to correct this dirty reset). The odd vectors will depend on whether -mrelax is used. If it is, most of these will be harmless NOPs and they will run the wrong ISR as before. If instead they are JMPs, or had to be JMPs even with -mrelax, then the CPU will attempt to execute the address of the ISR as if it were an instruction, which could do quite literally half of all possible things, though only a small number are more than incrementally worse than just restarting there. THIS was "certified for life safety critical systems"?! I'd hate to see our non-critical systems. Oh, right, I disassembled some a few weeks ago, and they were pretty shit.

Lately I've gotten to participate in scrapping some discarded life sciences equipment. I dunno how much that cost new, but I almost felt unworthy to take it apart (but it was left at the dump after the company was bought out). Some of them used all English screws, and internal substructures were hogged out of aluminum or fiberglass blocks (like FR4, only >1/2 inch thick and with no circuit or soldermask or that crap). One such block was used to mount the motor that would bring a stage up into a location with incredibly sharp needles, where some samples in a tray did something involving some sort of light (we couldn't find the light source, only a very fancy mirror assembly).

Previous equipment has yielded several single photon detectors (!!). All of it BEAUTIFULLY built. The one with the photon detectors also had an Atmel AT94FPSLIC or whatever they called them, the wimpy AVR embedded in an FPGA. The ones that are like $10 per chip new today? Yeah, that thing, made in 2002 (i.e., when these chips were brand new and hot shit), had 3 of them... plus ATmegas and an Atmel CPLD. Looked in beautiful condition. That one was FULL of E.C.O.s (Engineering Change Orders: post-manufacturing, pre-sales retrofits, typically implemented by hand by people with Dremels to cut traces, plus hair-thin wire and UV-cure glue to hold it in place, to make repairs with). It's what you do when paying someone to hand-repair your batch of bad, fully loaded boards is cheaper than just trashing them (i.e., when the boards cost a fucking fortune and your product sells for a truly eye-watering sum).

During development, I've heard of another company which would get ultra-rush short-turn-time orders (where the customer would agree to accept untested boards and test them themselves in exchange for getting them faster) coming with a power rail shorted to ground or something. No problem: despite the expense back in those days, they dropped a few thousand on a thermal camera, dumped current into the short until it was visible on the thermal camera, drilled the short out, and patched up the mess. These were boards the size of whole panels, near the state-of-the-art limit on number of layers on a board, etc., and it was decades ago, so those boards were big money, and they were always being designed with the greatest sense of urgency. (At the time the company was growing at 30+% per year, and their biggest challenge was development time constraints, so that was a perfectly reasonable thing to do.) I don't know which products that was used on, but I know the cheapest devices they sold were around a mil for the "poverty model" and at least twice that with all the options. This was semiconductor test equipment: it's what would test either dies or assembled devices coming out of manufacturing and determine if they worked. It's a fairly small-volume business (also boom and bust, though more consistent now, I think), and like every other piece of equipment involved with semiconductor production, until you get down to the level of really simple devices like reeling machines and stuff, crazy expensive.
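
Coming back to the 2-byte versus 4-byte vector mix-up on the early AVR32DA described above, the even/odd behavior can be worked out with a little arithmetic. The sketch below is one reading of that description, not something taken from Microchip documentation: it assumes the affected hardware jumps to byte address 2 × n for vector n while the compiler lays out one 4-byte entry per vector. The names `landing` and `describe` are made up for illustration.

```python
VECTOR_BYTES_COMPILER = 4  # assumption: one JMP-sized (two-word) entry per vector in the table
VECTOR_BYTES_HARDWARE = 2  # assumption: stride used by the affected silicon, per the comment


def landing(vector: int) -> tuple:
    """Return (entry, word) in the compiler's table where the hardware lands for `vector`."""
    byte_address = vector * VECTOR_BYTES_HARDWARE
    entry, offset = divmod(byte_address, VECTOR_BYTES_COMPILER)
    return entry, offset // 2  # word 0 is the opcode word, word 1 is the second word


def describe(vector: int) -> str:
    """Summarise what the description above implies for one vector number."""
    entry, word = landing(vector)
    if word == 0 and entry == vector:
        return f"vector {vector}: lands on its own entry (only true for reset)"
    if word == 0:
        return f"vector {vector}: runs the entry for vector {entry} instead"
    return (f"vector {vector}: lands on the second word of entry {entry} "
            f"(a NOP after -mrelax, otherwise the JMP's address word)")


for n in range(6):
    print(describe(n))
```

Under these assumptions, vector 0 is the only one that lands where the compiler intended, which matches the "Reset will work" remark.
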

Did you know that those reels they use for ICs are great for wire, by the way? They're also CHEAP AS DIRT from places like Mid America Taping and Reeling. Made in USA (what?! You mean assembled in USA, right? No, the things I bought don't come assembled if there is any assembly involved... so there's only one manufacturing step, and that's injection molding, and that's done here). You wouldn't use them for, like, hookup wire; you'd want something of lower diameter but wider for that (you can find such spools on AliExpress, but US-made reels are cheaper in the States, amazingly, than I've been able to find imports for). For things like long network cables (less flexible, and thicker), they're great (you use diagonal cutters or similar to chew away some plastic so one end fits in the hub and keeps it from slipping). Also great for any sticky tape that has peel-off backing on all sticky sides, peel-and-stick velcro and EL stuff, and for cable braid, even some shrink tube. It works for narrow shrink tube and it works for large shrink tube; anything in the middle, though, doesn't do well. When it's really small they ship it as round cross-section stuff, but it's flexible enough to roll up. But thicker stuff, especially the good kind (glue lined), becomes too stiff for that and has to be stored as long awkward thin things. Then they stop making it round and flatten it (crease two opposite sides), and that rolls up REAL good!