Ben's style and structure review

benhoyt commented 1 day ago

Enhancement Proposal

Overall, this is nicely structured! Good package structure, excellent use of the standard library rather than tons of 3rd party dependencies, some good tests, and so on. If this is your first Go project, well done!

I've written up a bunch of comments below, labelling each "Minor", "Major", or "Catastrophic" (none yet -- might have some next week :-). The difference between Major and Minor is a bit arbitrary, but I've mostly used Major for "strong recommendation" and Minor for "style stuff". That said, many of them would be quick refactorings or quick fixes, so I'd still recommend doing that to tidy up the style a bit.

A couple of the more interesting things:

"only apply middleware to the handlers that need them", rather than applying them to all routes and then doing ad-hoc filtering and route matching in the middlewares.
I haven't reviewed db.go yet. But instead of doing your own database row scanning and parametrisation, consider using sqlc (I've used it in personal projects and really recommend it) or Canonical's own sqlair type mapper. I like sqlc because it's pre-generated from your db schema with go generate so you get nice type-safe structs. I can give you a demo of sqlc if you want.

I haven't finished the review. I have some security concerns about how you're generating passwords -- but that's enough for today! I'll review handlers_users.go and the database code next week.

Early notes from a quick scan of the database code: SELECT * is a bad idea in production code -- I'll explain why next week. I've also got some comments/concerns about the database schema.

Overall, it'd definitely be worthwhile reading a couple of Go style guides in detail, obviously the Canonical one (from Ed Jones), and also Google's Go style guide is good.

cmd/notary/main.go

Minor: Why do you want the logs going to stdout? It's more typical to have logs going to stderr.

    log.SetOutput(os.Stdout)

Minor: You might want to look into graceful shutdown, so that if SIGTERM is sent (or Ctrl-C is pressed), it allows a small amount of time for requests to finish before exiting. This is pretty easy to achieve in Go. There's an example of this pattern in the Server.Shutdown docs.

cmd/notary/main_test.go

Minor: Would be nice if TestMain didn't install stuff into your go home directory. When we've done this, we've just done a go build ./cmd/notary to build the binary in the current directory, and then use that.

func TestMain(m *testing.M) {
    cmd := exec.Command("go", "install", "./...")

Minor: It's better to follow the standard err naming (rather than writeCertErr or writeConfigErr) and check the error after each call, even if a tiny bit more verbose:

    writeCertErr := os.WriteFile(testfolder+"/cert_test.pem", []byte(validCert), 0o644)
    writeKeyErr := os.WriteFile(testfolder+"/key_test.pem", []byte(validPK), 0o644)
    if writeCertErr != nil || writeKeyErr != nil {
        log.Fatalf("couldn't create temp testing file")
    }

That said, it might be better to create a cmd/notary/testdata directory and have those cert and key files in there -- it makes the test file a bit cleaner, you won't need to write them out, and go test automatically changes the current directory to the package directory when running the test binary, so you can reference them with a relative path like testdata/cert.pem. You can probably get rid of all the temp directory and cleanup stuff in TestMain in that case too.

Major: For all of these errors, you should include the err value in the log message, so that you're not completely blind if something does go wrong. Similar throughout. For example, change the above to:

        log.Fatalf("couldn't create temp testing file: %v", err)

Minor: Nice use of table-driven tests. However, you should use t.Run inside the test cases loop so that you can observe and select specific sub-tests with -run=Foo/bar. I see you've done that in some other places, just not here.

Minor: I don't think you need to save/restore os.Args here, as you're running a new command, not doing anything with os.Args. You can remove this:

    oldArgs := os.Args
    defer func() { os.Args = oldArgs }()

Minor: In TestNotaryFail, you can probably use Cmd.CombinedOutput to avoid the Start/Wait/read-StdoutPipe and simplify the code.

internal/api/server.go

Major: It's highly unusual (and confusing, I think) to have a directory named something different from the Go package name (internal/api is the directory, but the package is server). This leads to you wanting to rename the import:

    server "github.com/canonical/notary/internal/api"

Just rename the directory to server and avoid the confusion.

Minor: In Go everything is prefixed with the package name, so you want to make use of that and avoid "stuttering" if you can. So if there's one main type, use server.New() rather than server.NewServer(), and db.New() rather than db.NewDatabase().

For server.NewNotaryRouter, that should probably be server.NewRouter (it's redundant to include the project name).

Minor: Environment seems like a confusing name (it's not environment variables). How about RouterConfig or HandlerConfig? (And rename NewRouter to NewHandler -- which is actually more what it is, as it does all the handling, not just routing.)

Minor: In NewServer, why does creating the key pair return an error (good) but connecting to the database does a log.Fatalf (not so good)?

Minor: In SendPebbleNotification, this should be just a wrapped error, no need for errors/New+errors.Join:

        return errors.Join(errors.New("couldn't execute a pebble notify: "), err)

Should be just fmt.Errorf with %w:

        return fmt.Errorf("couldn't execute pebble notify: %w", err)

I see you've done that elsewhere though.

internal/api/server_test.go

Minor: It's unusual to use TestMain in this way. Is it actually used for anything in this package? It looks like you're not actually using these files, and you could probably remove TestMain altogether (it's usually not needed at all).

If anything, it's much more common to have explicit setup helper functions, using t.Helper() and t.Cleanup(), and call them at the start of each test that needs them. For example:

func TestThing(t *testing.T) {
    path := setupStuff(t)
    // ...
}

func setupStuff(t *testing.T) string {
    t.Helper()
    t.Cleanup(func() {
        // do any cleanup
    })
    return "/some/path/or/whatever"
}

internal/api/response.go

Major: It's usually better for API clients that expect JSON (and I think all of these API endpoints return JSON) to get JSON in the error case as well. You could do this in your logErrorAndWriteResponse function:

func logErrorAndWriteResponse(msg string, status int, w http.ResponseWriter) {

The name of that is also a bit of a mouthful. Perhaps just writeError (logging is an internal detail). I'd also change the signature so you can use fmt.Sprintf and a format string, allowing you to include an error with %v easily if needed. It's most common for these functions to take w as the first arg. So maybe something like:

func writeError(w http.ResponseWriter, status int, format string, args ...any) {
    type errorResponse struct {
        Error string `json:"error"`
    }

    errorMessage := fmt.Sprintf(format, args...)
    log.Println(errorMessage)

    resp := errorResponse{Error: errorMessage}
    respBytes, err := json.Marshal(&resp)
    if err != nil {
        // log and write a generic error and return
    }
    w.WriteHeader(status)
    _, err = w.Write(respBytes)
    if err != nil {
        // log, can't do much else
    }
}

internal/api/middleware.go

Minor: C_STYLE_CONSTS isn't usually a thing in Go. USER_ACCOUNT can be UserAccount or perhaps UserPermission.

Minor: responseWriterCloner should probably have an Unwrap() http.ResponseWriter method. Read more about response controllers.

Minor: It's a bit odd that all the middleware funcs take a generic ctx *middlewareContext, instead of just the values they need (for example metrics middleware taking the metrics type).

Minor: responseWriterCloner is an odd name for this. What about statusRecorder?

Major: Normally you'd only apply middleware to the handlers that need them. It's messy when you apply them to everything, because then you have to have ad-hoc checks inside the middleware (which should be agnostic to what the route is) like this:

            if !strings.HasPrefix(r.URL.Path, "/_next") {
...
            if !strings.HasPrefix(r.URL.Path, "/api/v1/") {

If you apply middleware at the route level (or to a whole group of routes), you can avoid this ad-hoc matching code in the middleware itself. For example:

    apiMiddleware := createMiddlewareStack(
        authMiddleware(&ctx),
        metricsMiddleware(&ctx),
        loggingMiddleware(&ctx),
    )
    router.Handle("/api/v1/", http.StripPrefix("/api/v1", apiMiddleware(apiV1Router)))

    // For statics, only use metrics middleware
    router.Handle("/", metricsMiddleware(frontendHandler))

You should be able to do something similar with the authMiddleware, pass in an "allowFirstAccount" or something, and only wrap the routes you need, to avoid this ad-hoc filtering in the middleware itself:

            if r.Method == "POST" && strings.HasSuffix(r.URL.Path, "accounts") && !ctx.firstAccountIssued {

Similarly, this looks a bit problematic in authMiddleware -- because it means AllowRequest has a list of hard-coded routes (done with regex matching):

            if claims.Permissions == USER_ACCOUNT {
                requestAllowed, err := AllowRequest(claims, r.Method, r.URL.Path)

It would be much cleaner to have more specific auth middleware like adminOnly and adminOrUser, and wrap only the applicable routes in it, for example:

    apiV1Router.HandleFunc("GET /accounts", adminOnly(GetUserAccounts(env)))
    apiV1Router.HandleFunc("POST /accounts", adminOnly(PostUserAccount(env)))
    apiV1Router.HandleFunc("DELETE /accounts/{id}", adminOnly(DeleteUserAccount(env)))

    apiV1Router.HandleFunc("GET /accounts/{id}", adminOrUser(GetUserAccount(env)))
    apiV1Router.HandleFunc("POST /accounts/{id}/change_password", adminOrUser(ChangeUserAccountPassword(env)))

This would allow you to avoid the kind of re-routing code in AllowRequest entirely. You might need to use Request.PathValue("id") in adminOrUser to get the user ID from the matched URL.

Major: Unless I'm misunderstanding, I think this might be a bug (though it'd likely cause a panic in the caller rather than be a security bug). In getClaimsFromJWT, if err is nil but the token is invalid, this will return nil, nil and the caller will assume there's no error:

    if err != nil || !token.Valid {
        return nil, err
    }

It should probably be more like:

    if err != nil {
        return nil, err
    }
    if !token.Valid {
        return nil, errors.New("invalid token")
    }

internal/api/authorization_test.go

Minor: it'd be cleaner if this special case was a test struct field, passwordMatch string or something, instead of a special case that compares the description. (What if someone changes the description and forgets to update this?)

            if tC.desc == "Create no password user success" {

Or just pull this out as a separate test function. Not everything needs to be a table-driven test if it's unwieldy.

internal/api/handlers_certificate_requests.go

Major: Generally it's best not to include arbitrary err.Error() information in the API responses, in case something mildly sensitive is leaked. For example:

            logErrorAndWriteResponse(err.Error(), http.StatusInternalServerError, w)

For internal errors, I would log the full error, but just include something generic like "internal error" in the API response. Also make sure that you have good metrics/alerts on this in production, so you can easily find such errors (which generally "shouldn't happen").

I also find it helpful to make helpers for specific error handling that happens, for example, instead of the above, make a helper you just call as internalError(w, err) or badRequest(w, err).

Major: You have this "marshal and then write to response" pattern a lot:

        body, err := json.Marshal(certs)
        if err != nil {
            logErrorAndWriteResponse(err.Error(), http.StatusInternalServerError, w)
            return
        }
        if _, err := w.Write(body); err != nil {
            logErrorAndWriteResponse(err.Error(), http.StatusInternalServerError, w)
        }

It'd be good to extract this to a helper, so you can do this without so much boilerplate:

    err := writeJSON(w, certs)
    if err != nil {
        internalError(w, err)
        return
    }

Major: API design point. It's usually best to have a "result container" in your API responses, so you can add top-level metadata like "error" or "pagination-cursor" or whatever later. So writeJSON could wrap everything in a struct something like so:

type response struct {
    Result any    `json:"result,omitempty"`
}

The writeJSON function could take care of this wrapping for you, and you could have a writeJSONError too that used this shape:

type response struct {
    Error  string `json:"error"`
}

Major: Testing error strings is really not great, unless you absolutely have to:

        id, err := env.DB.CreateCSR(string(csr))
        if err != nil {
            if strings.Contains(err.Error(), "UNIQUE constraint failed") {
                logErrorAndWriteResponse("given csr already recorded", http.StatusBadRequest, w)
                return
            }
            if strings.Contains(err.Error(), "csr validation failed") {
                logErrorAndWriteResponse(err.Error(), http.StatusBadRequest, w)
                return
            }
            logErrorAndWriteResponse(err.Error(), http.StatusInternalServerError, w)
            return
        }

SQLite error messages are probably quite stable (but who knows), but a maintainer might modify the "csr validation failed" error to say "CSR validation failed", and then this check would silently fail.

The mattn/go-sqlite3 library actually has good error codes for things like this. You'd check with:

var sqliteErr sqlite3.Error
if errors.As(err, &sqliteErr) && sqliteErr.ExtendedCode == sqlite3.ErrConstraintUnique {
    ...
}

However, it's probably best to not let lower-level sqlite3 errors leak through the database package to the api layer. You probably want to do that check in the db package and convert it to a custom error type like db.ErrAlreadyExists, defined at the package level like so:

var ErrAlreadyExists = errors.New("already exists")

Oh, I see you already have an ErrIdNotFound -- yeah, use exactly that same pattern here -- you want to avoid error string matching like the plague.

For the "csr validation error", you can do the same thing:

    // in db.go:
    var ErrCSRValidationFailed = errors.New("csr validation failed")
    // ...
    if err := ValidateCertificateRequest(csr); err != nil {
        return 0, ErrCSRValidationFailed
    }

    // in handler:
    if errors.Is(err, db.ErrCSRValidationFailed) {
        logErrorAndWriteResponse(err.Error(), http.StatusBadRequest, w)
        return
    }

If you're finding all the database error checking is basically the same ... in one of my projects I have a helper handleDatabaseError that does all the different checks and returns the correct error (bad request, internal error, etc). Something like so:

// handleDatabaseError checks for specific database errors and writes the appropriate response,
// returning true if it is an error, otherwise false.
func handleDatabaseError(w http.ResponseWriter, err error) bool {
    if errors.Is(err, db.NotFound) {
        badRequest(w, err)
        return true
    }
    if err != nil {
        internalError(w, err)
        return true
    }
    return false
}

// use like so:
err := db.SomeDBOperation()
if handleDatabaseError(w, err) {
    return
}

Which avoids a lot of boilerplate after database queries.

Major: As a consumer of the API, it's weird that some things return JSON, but PostCertificateRequest returns a raw integer:

        if _, err := w.Write([]byte(strconv.FormatInt(id, 10))); err != nil {
            logErrorAndWriteResponse(err.Error(), http.StatusInternalServerError, w)
        }

In this case strconv.FormatInt(id, 10) happens to be in JSON number format, but that's kind of by coincidence.

I think all responses should be wrapped in JSON. It's usually best to have a standard "shape", for example {"result": ...} -- see my comments about writeJSON above, and then you could use just writeJSON(w, id).

Minor: I know it's only a pseudo-domain, but you should probably use a "domain" that we own in the Pebble notify key:

err := SendPebbleNotification("notary.com/certificate/update", insertIdStr)

Maybe:

err := SendPebbleNotification("canonical.com/notary/certificate/update", insertIdStr)

Minor: Speaking of pebble notify (it's cool that you're using it BTW). If conf.PebbleNotificationsEnabled, you should probably check at startup that exec.LookPath can find the pebble binary, so it doesn't fail much later at runtime and you potentially miss the failures. It seems to me if that's turned on it should be a hard failure.
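
The startup check could be as small as this sketch. The checkNotifyBinary name and its parameters are hypothetical; in the real code the flag would come from conf.PebbleNotificationsEnabled.

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkNotifyBinary returns an error if notifications are enabled but the
// named binary cannot be found on PATH.
func checkNotifyBinary(enabled bool, name string) error {
	if !enabled {
		return nil
	}
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("notifications enabled but cannot find %q: %w", name, err)
	}
	return nil
}

func main() {
	if err := checkNotifyBinary(true, "pebble"); err != nil {
		// In the server this would be returned from the constructor
		// so that startup fails hard.
		fmt.Println("startup check failed:", err)
		return
	}
	fmt.Println("pebble binary found")
}
```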

Major: It looks like it sends a "certificate/update" pebble notify even for the reject and delete cases. Shouldn't these be a different notice key?

internal/api/handlers_certificate_requests_test.go

Minor: For things like this:

expectedGetAllCertsResponseBody1 = fmt.Sprintf("[{\"id\":1,\"csr\":\"%s\",\"certificate\":\"\"}]", trimmed(AppleCSR))

I would use a little helper (you can run helpers at the package-level for variable initialisation) that actually used json.Marshal with a little struct type. It would make this safer/cleaner (what if some of the strings needs JSON-escaping?).
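
Something along these lines, as a sketch. The certRequest struct and mustJSON helper are made-up names, with fields mirroring the expected body above; in the real test you'd pass trimmed(AppleCSR) rather than my placeholder string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// certRequest is a hypothetical test-only type mirroring the response fields.
type certRequest struct {
	ID          int    `json:"id"`
	CSR         string `json:"csr"`
	Certificate string `json:"certificate"`
}

// mustJSON marshals v and panics on failure, so it can be used in
// package-level variable initialisation.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(fmt.Sprintf("cannot marshal expected value: %v", err))
	}
	return string(b)
}

// The newline in the placeholder CSR is escaped by json.Marshal, which
// fmt.Sprintf with %s would not do.
var expectedGetAllCerts = mustJSON([]certRequest{{ID: 1, CSR: "line1\nline2"}})

func main() {
	fmt.Println(expectedGetAllCerts)
}
```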

internal/api/handlers_health.go

Minor: In HealthCheck, you have to write the headers/status before the content, so swap these two lines:

        w.Write(response)            //nolint:errcheck
        w.WriteHeader(http.StatusOK) //nolint:errcheck

It also wouldn't hurt to log the errors and avoid the "nolint" comments.

internal/api/handlers_login.go

Minor: This literal is a bit hidden away in the code:

            ExpiresAt: time.Now().Add(time.Hour * 1).Unix(),

Might be nice to make a package-level named const.

internal/api/handlers_login_test.go

Minor: Once again, better to do this kind of thing with a struct field in the test case, rather than ad-hoc string matching on the test description as a special case:

            if tC.desc == "Login success" && res.StatusCode == http.StatusOK {

Or just pull this out as a separate test function. Not everything needs to be a table-driven test if it's unwieldy.

benhoyt commented 1 day ago

Okay, so I did the second half today after all. A few more "Major" points here, and some strong recommendations about the db package. Once again, I'm happy to chat -- might be good to have a voice call in any case to go over some of this. Just put something on my calendar at a reasonable time.

internal/api/handlers_users.go

Major: For discussion: are we sure we want to have the server generate a password? There's a fair number of tricks and traps doing this. I personally dislike (and NIST guidelines support me) when there are password composition rules like "must have an uppercase letter, blah blah blah". For people who use good password generators, it's a pain, and for people who don't, they're just going to add "1!" to the end of their password. NIST recommends not having "composition rules", and checking against password dictionaries instead. But in general it's not an easy problem.

In addition, it looks like you're using crypto/rand to generate the characters, but then shuffling it using math/rand.Shuffle. This is almost certainly bad for security, and I suspect (but don't know) undoes some of the "true randomness" from crypto/rand. If you're generating anyway, I strongly recommend avoiding math/rand and just calling getRandomChars(allCharsSet, 16) -- simpler, more entropy, and doesn't need math/rand.

But I'd recommend avoiding having the server generate passwords altogether. People that want a weak memorable password can still easily use "Password1" if they want, according to validatePassword. :-)
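
If generation does stay, a crypto/rand-only version could look something like this sketch. The randomString name and the character set are my own placeholders (I haven't looked at how the existing getRandomChars is implemented), and it assumes an ASCII character set.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
)

// passwordChars is a placeholder character set (ASCII only).
const passwordChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// randomString returns n characters drawn uniformly from charset using
// crypto/rand only. rand.Int avoids the modulo bias you'd get from
// taking a random byte modulo len(charset).
func randomString(charset string, n int) (string, error) {
	if charset == "" || n < 0 {
		return "", errors.New("charset must be non-empty and n non-negative")
	}
	limit := big.NewInt(int64(len(charset)))
	out := make([]byte, n)
	for i := range out {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", fmt.Errorf("cannot read random bytes: %w", err)
		}
		out[i] = charset[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	pw, err := randomString(passwordChars, 16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pw)
}
```

Because every position is drawn independently from the full set, there's nothing to shuffle afterwards.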

Minor: This is a bit yucky, in GetUserAccounts:

        for i := range users {
            users[i].Password = ""
        }

I'd recommend changing RetrieveAllUsers to not fetch the (hashed) password in the first place. It looks like none of the callers use it anyway. See also my recommendations about not using SELECT * in the db notes.

Similar in GetUserAccount, we fetch the (hashed) password field only to throw it away:

        userAccount.Password = ""

I recommend not fetching this at all, except in the one place (authorization) where you actually need it, and have a separate db method for that, for example db.RetrieveUserWithHashedPassword, so it's very explicit about when you need it.

Major: (possibly Catastrophic, I'm not sure). In this code block:

        if id == "me" {
            claims, headerErr := getClaimsFromAuthorizationHeader(r.Header.Get("Authorization"), env.JWTSecret)
            if headerErr != nil {
                logErrorAndWriteResponse(headerErr.Error(), http.StatusUnauthorized, w)
            }
            userAccount, err = env.DB.RetrieveUserByUsername(claims.Username)

You're falling through in case of headerErr != nil, meaning it will go ahead and RetrieveUserByUsername with whatever is in claims.Username even if claim validation fails. This is probably only going to lead to a nil pointer panic, but it definitely needs a return here.

Also, I'd just use the standard err naming instead of headerErr -- you're going to override it below anyway, and keeps things consistent.

Major: It's a bit odd, and perhaps not the greatest from a security perspective, that you fetch all users' full details only to check if there are any users or not:

        users, err := env.DB.RetrieveAllUsers()
        if err != nil { ... }
        permission := "0"
        if len(users) == 0 {
            permission = "1" // if this is the first user it will be admin
        }

I'd recommend having an explicit db.NumUsers() call or similar.

Major: Looking at the above, I'd strongly recommend having named constants for the permissions values. Otherwise you'll accidentally forget which magic constant is which in some new code and get them the wrong way around.

Somewhat related, this is more of a "role" (admin vs normal user) than "permissions" (can they access feature X or Y). Might be good to use that terminology.

Major: There's also a race condition here. I realise it'd be unlikely in practice, but if it happened, or if someone knew about the weakness, it'd be pretty bad, as they could create an admin user. Consider this: two users hit PostUserAccount at roughly the same time, both execute RetrieveAllUsers, and it returns 0 users for both (because it's before either one has executed CreateUser). Then both get permission 1 (admin): both the good user and the nefarious one who did it at the same time.

I would recommend finding a different way to allow setting up the first user. One way (probably not the best) would be to check db.NumUsers() again at the end and ensure it's 1. If it's 2, you know the race occurred and you can fail.

Another way would be a dedicated is_first column with a unique constraint, so it would only allow one initial admin user. Seems a bit heavy to have a whole column for that, but on the other hand, doing a constraint like that at the database level seems better.

Minor: In ChangeUserAccountPassword, should they have to provide the existing password to change it, as a safeguard?

internal/api/middleware.go

Major: I was looking at authMiddleware again and saw this:

if claims.Permissions == USER_ACCOUNT {
    ...
}
next.ServeHTTP(w, r)

But what if claims.Permissions is something other than USER_ACCOUNT? I realise there's only one other value now (ADMIN_ACCOUNT), but if GUEST_ACCOUNT was added later, it would also have full/admin access. Seems like a security accident waiting to happen.

I'd strongly recommend an exhaustive switch with a default case that gives a forbidden error.

Overall I think there should be a thorough security review from someone with a twisted mind to try to poke holes in the auth. I've learned by experience it's easy to get wrong.

Minor: Speaking of security, in AllowRequest there's this:

        if err != nil {
            return true, fmt.Errorf("error converting url id to string: %s", err)
        }

I think that should be return false, ...? Probably AllowRequest should just return a bool and false if there's any error, to simplify.

internal/config/config.go

Minor: Why does ConfigYAML.Pebblenotificationsenabled have funny casing, instead of matching Config.PebbleNotificationsEnabled? FWIW, I'd probably shorten this to PebbleNotifications or NotifyPebble (it's a bool, so "enabled" seems obvious).

Minor: It's weird that Validate error handling uses errors.Join when there's actually only one error, just to concatenate the strings. Actually, it looks like the string version of an errors.Joined error puts newlines between the errors, so it's not really what you want when formatted either. Just use:

return Config{}, fmt.Errorf("config file validation failed: %w", err)

Could make the format string a function-scoped const if you want.

Actually, I'd recommend not adding that context at all -- the context is usually added by the calling function (it's the same for all error paths). The context you want to add here is what operation is being performed, like fmt.Errorf("cannot read config file: %w", err) or fmt.Errorf("cannot create database file: %w", err).

Minor: Validate should return the zero value, Config{}, on any error, rather than the potentially half-filled config struct.
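
To illustrate both points (zero value on error, and context added by the caller), here's a small sketch. The Config fields and the parse function are invented for the example and don't match the real config package.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config is a cut-down, hypothetical config type.
type Config struct {
	Port   int
	DBPath string
}

// parse returns the zero Config on any error, never a half-filled one, and
// describes only what went wrong, leaving broader context to the caller.
func parse(raw map[string]string) (Config, error) {
	var cfg Config
	cfg.DBPath = raw["db_path"]
	if cfg.DBPath == "" {
		return Config{}, errors.New("db_path is required")
	}
	port, err := strconv.Atoi(raw["port"])
	if err != nil {
		return Config{}, fmt.Errorf("invalid port: %w", err)
	}
	cfg.Port = port
	return cfg, nil
}

func main() {
	cfg, err := parse(map[string]string{"db_path": "notary.db", "port": "eighty"})
	if err != nil {
		// The caller adds the context once, for all error paths.
		fmt.Println(fmt.Errorf("config file validation failed: %w", err))
	}
	fmt.Println(cfg == Config{})
}
```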

internal/config/config_test.go

Minor: Once again, I'd use test helpers instead of TestMain (which is very rare). However, in this case, instead of creating a temp directory, writing out a file, etc, I'd just use a testdata directory as mentioned elsewhere. And then reference testdata/valid.yaml as a relative path (go test changes to the package directory before running the test binary). That would avoid all the file writing/chdir/cleanup.

internal/metrics/metrics.go

Took a quick scan at this, and don't have any comments (but it's not really my area of expertise).

internal/metrics/metrics_test.go

Minor: In TestMetrics, you can use T.TempDir, which automatically creates a temporary directory and then point SQLite at filepath.Join(tempDir, "db.sqlite3") -- SQLite will automatically create the db file if it doesn't exist. T.TempDir cleans up at the end of the test, so no mucking about with defer os.Remove().

internal/db/db.go

I don't like ORMs, but I'd highly recommend using a slightly higher-level database library. As mentioned earlier, I really like sqlc, which takes your database schema and queries, and compiles them to the boilerplatey Go code you'd write by hand (it's still easy to read). The dev typically uses go generate and then commits these to the repo to make it easier for others on a git clone.

You can see what that looks like on the recent Canonical project commitment-tracker. Queries here and generated code here.

I've used it in a personal project as well, and it's really nice.

Part of the beauty of it is you can still type SELECT * in your original queries SQL, but it compiles down to explicit columns in the generated code (based on your schema). This solves the problems with SELECT * described below. It also avoids all the manual field scanning, like this:

    if err := row.Scan(&newUser.ID, &newUser.Username, &newUser.Password, &newUser.Permissions); err != nil {

You can easily add Go-based validation code in a non-generated .go file too, as needed.

Alternatively, you could use Canonical's recently-produced sqlair library. That takes a different approach using reflection rather than codegen, but avoids the SELECT * problem and avoids manual scanning of fields too.

Major: SELECT * is a problem in production. It usually works nicely for a while, until you're doing a JOIN and do a database migration and add a new column in one table that conflicts with a column name in the join -- then suddenly the query becomes an error and takes down your production servers.

It's also inefficient in that it usually fetches more data than you need (consider the Password case mentioned earlier), though that's probably not an issue here.