golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.99k stars 17.67k forks source link

proposal: hash/crc32: Support both reverse and forward CRCs #60992

Open onitake opened 1 year ago

onitake commented 1 year ago

There are many widely used CRC algorithms, but the Go standard library implementation only supports CRC32s with reversed polynomials. There are various terms for these variants: In https://reveng.sourceforge.io/crc-catalogue/all.htm, they are called "reflected", but they could also be considered "little endian". Some algorithms use a "big endian" bit order, which is not compatible.

Unfortunately, it isn't trivial to use a forward polynomial in a reversed implementation or vice-versa. It seems that reversing the bit order in various places (including table calculation) would enable this (the opposite way is described here: https://stackoverflow.com/questions/53812834/computing-crc16-with-reflected-bit-reversed-input-in-c), but I couldn't get it to work with the CRC32 used in MPEG-TS.

Furthermore, the inversion of the CRC at the beginning and end of the CRC calculation in crc32.simpleUpdate does not account for algorithms using a different value than 0xffffffff for the initial and final XOR, but this is less of a concern. Different values can be applied before/after calling crc32.Update.

larschri commented 10 months ago

I believe "reversed polynomials" is a misconception - the polynomials themselves are not reversed. It is only their binary (uint32) representation that can take different forms. See https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Polynomial_representations.

I think the proposal here is (or should be) to add support for four commonly used parameters to the crc32 package. These parameters are commonly refered to as init, refin, refout and xorout, and they are used to specify the variants on https://reveng.sourceforge.io/crc-catalogue/all.htm.

crc32.Checksum assumes these settings:

init: 0xffffffff
refin: true
refout: true
xorout: 0xffffffff

Variants with other settings can be implemented on top of crc32.Update, but it is not trivial to do this correctly. Here's a gist that attempts to do this:

https://gist.github.com/larschri/2b793d583ba393492531ffeb51fe9e43

It would be nice if the parameters could be passed to the checksum implementation in the crc32 package. Maybe like this:

    checksum := crc32.Checksum([]byte("Hello world"),
            crc32.MakeTable(0xD5828281, crc32.WithRefin(false))
            crc32.WithInit(0xffffffff),
            crc32.WithRefout(false),
            crc32.WithXorout(0x00000000)))

See also the guide at https://archive.org/stream/PainlessCRC/crc_v3.txt. It would be nice if the documentation in the crc32 package contained more details. And it would be helpful to mention the existence of these parameters even if they wont be supported. I also think that the crc32q-example in the docs should be removed or replaced. It gives incorrect checksum, because it doesn't apply the appropriate settings.

onitake commented 10 months ago

I would very much prefer the generator object approach. Usually, you don't need many different algorithms in one specific application, so it would make sense to just instantiate the generator in one place and then use it everywhere.

But that is already what has/crc32 does, so I think would be enough to extend (and possibly rename) crc32.Table to encapsulate the other algorithm parameters.

Example:

mpeg2crc := crc32.MakeTable(0x04c11db7, 0xffffffff, false, false, 0x00000000)
checksum := crc32.Checksum([]byte("Hello world"), mpeg2crc)

It might even be possible to incorporate some of the parameters into the common part of the crc32 implementation, to take advantage of the optimized versions for cases that only differ in initial value and xorout. Or perhaps modify the optimized code to work with any polynomial and any of the four combinations of refin and refout.

larschri commented 10 months ago

My example code above was an attempt a failed attempt to shoehorn these parameters into the existing functions. Functional options is a common approach to extend existing functions without breaking backward compatibility.

I don't have any strong opinions about how this should be exposed, though. I can see the benefits of the generator object approach, but I don't think it can be added to crc32.Table and crc32.Checksum. New functions will require new names, but maybe that's better.

Edit: @magical is right - these parameters can't be supported by existing functions without breaking backward compatibility.

magical commented 10 months ago

@larschri Changing a function signature is generally not considered a backwards-compatible change -- even just adding a ... parameter -- because it would break code that assigns the function to a variable. e.g.

var myChecksumFunc func([]byte, *crc32.Table) uint32
myChecksumFunc = crc32.Checksum

Also @onitake, crc32.Table is defined as type Table [256]uint32 and that can't be changed either, so i don't think it's possible to add any information to it. You would have to add a new type.