MIT-LCP / wfdb-python

Native Python WFDB package
MIT License
730 stars 298 forks source link

Updated checksum Calculation function: calc_checksum #463

Closed deepanshi-s closed 2 months ago

deepanshi-s commented 11 months ago

The current checksum logic in the wfdb-python library does not take into consideration the sign of the sum of the values. The function computes the sum and performs the mod operation without considering the correct range of the output and overflow. The resulting checksum values in the header file do not match with the corresponding values from the header file generated with MATLAB.

With the implemented logic here, the sum is first calculated and based on the sign of the value correct modulation operation is performed. Line 917-922 is for sum < 0 and line 924-927 is for sum >=0. The logic also accounts for the boundary condition (Line 919). The logic is line with MATLAB implementation of checksum calculation (https://github.com/ikarosilva/wfdb-app-toolbox/blob/6e81e0d4e7e275418bc13def7c29d6a4464a519b/mcode/mat2wfdb.m#L453).

bemoody commented 11 months ago

Thanks for pointing this out. You're right that the checksums computed by wfdb-python differ from the checksums computed by libwfdb or by mat2wfdb. However, both the signed and unsigned checksum values have always been treated as "correct".

I think that when Chen was implementing checksums in wfdb-python, he felt that using the unsigned value was cleaner and more elegant. I tend to agree but don't think it matters much.

However, when using numpy, we don't need to go through the same contorted logic as used in Matlab. AFAIK, we could simply write:

cs = [int(c) for c in np.sum(self.d_signal, 0, dtype='uint16')]

to get the unsigned checksum (what wfdb-python does currently), or

cs = [int(c) for c in np.sum(self.d_signal, 0, dtype='int16')]

to get the signed checksum (what libwfdb or mat2wfdb does.)

Whatever we do, the behavior here needs to be consistent between the "expanded=True" and "expanded=False" cases.