artefactual-sdps / enduro

A tool to support ingest and automation in digital preservation workflows
https://enduro.readthedocs.io/
Apache License 2.0
4 stars 3 forks source link

Check integer values before conversion #1019

Closed djjuhasz closed 2 months ago

djjuhasz commented 2 months ago
codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 52.02703% with 71 lines in your changes missing coverage. Please review.

Project coverage is 53.14%. Comparing base (0ecec2a) to head (95e9383). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/package_/package_.go 2.85% 34 Missing :warning:
internal/package_/preservation_action.go 6.66% 14 Missing :warning:
internal/a3m/config.go 0.00% 9 Missing :warning:
internal/package_/goa.go 0.00% 7 Missing :warning:
internal/persistence/telemetry.go 0.00% 3 Missing :warning:
internal/package_/preservation_task.go 75.00% 1 Missing and 1 partial :warning:
internal/workflow/local_activities.go 33.33% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1019 +/- ## ========================================== + Coverage 53.11% 53.14% +0.02% ========================================== Files 100 100 Lines 5810 5807 -3 ========================================== Hits 3086 3086 - Misses 2471 2472 +1 + Partials 253 249 -4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

djjuhasz commented 2 months ago

I'm just reviewing this myself and I'm wondering if using uint for all primary key representations would be a better solution.

The MySQL schema we are using defines all of the primary keys as unsigned 32-bit integers with a max value of 2^32 -1 or 4 294 967 295, e.g.:

CREATE TABLE package (
  `id` INT UNSIGNED AUTO_INCREMENT NOT NULL,
   ...

See: https://dev.mysql.com/doc/refman/8.4/en/integer-types.html (Note: MySQL lists their integer widths in units of 8-bit bytes, so 4 bytes = 32 bits)

But entgo by default represents all primary keys as signed int values in Go, e.g.:

// Pkg is the model entity for the Pkg schema.
type Pkg struct {
    config `json:"-"`
    // ID of the ent.
    ID int `json:"id,omitempty"`

Go's int type might be 32 or 64 bits wide (https://go.dev/ref/spec#Numeric_types) so it's possible (though unlikely) that a large primary key value (max value: 2^32 -1 or 4 294 967 295) will overflow when converted to an 32-bit signed int (max value: 2^31 - 1 or 2 147 483 647). However most modern CPUs / OSs are 64-bit, in which case Go's int will likely be 64-bits wide (max value 2^63 - 1), which would mean that MySQL's 32-bit unsigned integer values could not overflow when converted to an int (2^32 -1 < 2^63 - 1).

Possible solutions to address primary key overflow:

  1. Current commit (int for primary key reps, except in the Goa layer where uint is used) but add extra boundary checking. Add boundary checking when getting a primary key (int) value from the database, e.g.: if id < 0 || id > math.MaxInt { return errors.New("integer overflow") }.
  2. Use uint for primary keys at all layers of the app: database (MySQL), ORM (entgo), persistence (datatype), Goa. This avoids type conversion for primary keys altogether, but requires changing the entgo schema to use uint for primary keys.
  3. Use the current commit unaltered and trust that overflows will not happen on 64-bit systems, and that even on 32-bit systems the conversions from uint to int and back will all come out in the wash: i.e. converting uint (MySQL) > int (entgo) > uint (Goa) will result in the expected value, even if it's bigger than the max 32-bit int value.

I created a small Go Playground test case for converting from uint -> int -> uint: https://go.dev/play/p/cug1svAcjvW

sevein commented 2 months ago

Thanks, David. My vote is to go with the third proposal and not support 32-bit systems. The industry has largely moved to 64-bit environments, e.g. macOS no longer supporting 32-bit apps, Windows defaults to 64-bits, Linux distros like Ubuntu or Arch no longer offering 32-bit versions by default.

I'm not afraid of missing out — even my Raspberry Pi supports 64-bit binaries. Some low-cost, embedded, or industrial hardware may be affected, but that's not our audience, as far as I know.

While we're not distributing artifacts for 32-bit systems, if we wanted to record this decision in the codebase we could include the following file that it would cause the application to panic at startup on 32-bit systems unless built with the go build -tags with_32bit option:

//go:build !with_32bit
package main

import "math/bits"

func init() {
    if bits.UintSize == 32 {
        panic("32 bits not supported")
    }
}
mcantelon commented 2 months ago

@sevein +1 to not supporting 32 bit! There's been an exodus from 32 bit in hardware over the last decade and a half or so so it's definitely safe not to support it.

@djjuhasz I'd say option 3 is fine for now and we could file an issue to add a check, like @sevein suggested, for 32 bit when time permits.

mcantelon commented 2 months ago

(And using uint for keys makes more sense, to me, than int given keys tend not to be negative numbers.)