gofiber / fiber

āš”ļø Express inspired web framework written in Go
https://gofiber.io
MIT License
33.92k stars 1.67k forks source link

šŸž Param string data buffer is reused, causing unexpected side effects #426

Closed ViRb3 closed 4 years ago

ViRb3 commented 4 years ago

Fiber version/commit

v1.10.1

Issue description

The method fiber.Ctx.Params(key string) string returns strings that share the same data buffer. This means that once a new string is returned, all previously returned strings will be changed to the new string's content, no matter how much they have been copied and passed around, since their data buffer is the same. This behavior is clearly seen in PoC below. unsafe is used to parse the header of each string returned from the Params method and then its content is printed. Notice how the data pointer is identical for all strings, meaning that they all use the same data buffer.

The only workaround right now is to explicitly copy the data buffer into a new buffer and create a string from the new buffer:

newBuffer := make([]byte, len(result))
copy(newBuffer, result)
newResult := string(newBuffer)

This behavior is absolutely unexpected, as strings in Go are expected to be immutable. I have not tracked down exactly what causes this issue, but I suspect that it may be a side effect from an inherited optimization by fasthttp. If this is true, I suspect that there are other methods that behave the same way as well. This behavior should be changed, or a clear documentation notice should be placed that indicates this side effect.

PoC

package main

import (
    "fmt"
    "github.com/gofiber/fiber"
    "net/http"
    "reflect"
    "sync"
    "unsafe"
)

var results []string
var wg sync.WaitGroup

func main() {
    go server()
    client()
    wg.Wait()
}

func client() {
    wg.Add(3)
    http.Get("http://127.0.0.1:3000/aaa")
    http.Get("http://127.0.0.1:3000/bbbb")
    http.Get("http://127.0.0.1:3000/c")
}

func server() {
    app := fiber.New()
    app.Get("/:test", func(c *fiber.Ctx) {
        result := c.Params("test")
        hdr := (*reflect.StringHeader)(unsafe.Pointer(&result))
        results = append(results, result)
        fmt.Printf("Param string: %+v Value:%s\n", hdr, result)
        fmt.Printf("Results: %+v\n", results)
        wg.Done()
    })
    app.Listen(3000)
}

Output

Param string: &{Data:824633868529 Len:3} Value:aaa
Results: [aaa]
Param string: &{Data:824633868529 Len:4} Value:bbbb
Results: [bbb bbbb]
Param string: &{Data:824633868529 Len:1} Value:c
Results: [cbb cbbb c]
welcome[bot] commented 4 years ago

Thanks for opening your first issue here! šŸŽ‰ Be sure to follow the issue template! If you need help or want to chat with us, join us on Discord https://gofiber.io/discord

Fenny commented 4 years ago

We use the same method as Fasthttp to convert strings and bytes without memory allocation, but requires you to never change the original byte array, but that's a reasonable contract in many cases. To persist values after returning from the handler, you can use the build-in Immutable setting in Fiber.

Reference: https://github.com/gofiber/fiber/issues/185

app := fiber.New(&fiber.Settings{
    Immutable: true,
})

However, we could make an exception to allocate parameters when c.Params() is called and keep 0 allocations on matching / hot paths. @ReneWerner87 thoughts?

ViRb3 commented 4 years ago

Oh, I was not aware of the immutable setting. I wouldn't mind keeping the current behavior, but there should be clear documentation that states such side effects - perhaps a general warning in the start of the project, so people are aware that this is even a possible side effect, and maybe note down the scope of returned variables on each method's documentation, like fasthttp does it. It took me a full day to track down this issue just because I just didn't expect it to be coming from fiber.

Fenny commented 4 years ago

You are right regarding the documentation, we could add some sort of warning/disclaimer regarding this issue and explain how it would be solvable using the Immutable setting or allocate/copy specific variables by using the default string or []byte types.

If you have any suggestions on how we could phrase/summary this behavior, I will add it to all README's and API Docs right away.

Suggestion:

Returned value is valid until the handler returns, do not store references, make copies instead or use the Immutable setting to do this for you. Keep in mind that this may result in memory allocation+copy.

ViRb3 commented 4 years ago

My recommendation would be to make two changes.

First, add a dedicated section in the beginning of the docs that outlines this concept as a whole. I created a PR for that, mentioned above. Any feedback is very welcome.

Second, for each method that returns a non-immutable value, like the Params method discussed in this issue, add a warning to its docs, like the one you proposed. I allowed myself to re-phrase it to make it a bit shorter:

Returned value is only valid within the handler. Do not store any references. Make copies or use the Immutable setting instead.

Fenny commented 4 years ago

Awesome work @ViRb3, I merged your PR in the docs. I will add your short phrase to each non-immutable method this weekend.

Feel free to join us on Discord to have a chat!

gabzim commented 3 years ago

@Fenny when you say not to keep references to it, is it ok to pass it to the service layer if the handler will wait for the service to return before it returns? example:

app.Get("/users/:id", func(c *fiber.Ctx) {
        userID := c.Params("id")
        // is this ok?
        user, err := userService.FindByID(userID)
                // ... rest of the code
    })

Assuming above that userID is only used to make a query to the DB and not stored anywhere in state beyond that use.

ReneWerner87 commented 3 years ago

Yeah should be okay, the string is there until the response is written.

calbot commented 3 years ago

I just got hit by this issue. I would like to suggest, (if it works) to return a new type something like below unsafestring. This would make it explicit that the mutability MUST be considered when using the string. Go strings are always spoken of as immutable so it should probably be a new type. This way you could pass your unsafestring into functions and it would be clear that it requires special consideration. If you have fairy complicated program you could easily lose track of which strings are allowed to outlive the caller's scope.

Params, Query would return unsafestring instead of string

//Use the Copy method if you need it outside the scope
//of the fiber handler. It's ok to use UNSAFE to avoid a copy 
//if you know that that the result won't outlive the fiber handler

type unsafestring string

func (u unsafestring) Copy() string {
    return fiberUtils.ImmutableString(string(u))
}

func (u unsafestring) UNSAFE() string {
    return string(u)
}
DiggidyDave commented 5 months ago

Also just got hit by this. This is insidious. Not only was it causing values to change underneath us, it was also changing the values that we were logging to trace through the system and try to find the issue.

I appreciate what you're doing here, but this should be an opt-in setting IMHO. It would be much more discoverable as a performance feature that you can enable that can be called out in documentation for people looking to improve performance. I do see that it is in the docs now as suggested above, but I suspect very few people look up and read to the bottom the documentation for ctx.Get for example.

gaby commented 5 months ago

@DiggidyDave This is highly explained in the First Page of the documentation. See here https://docs.gofiber.io/#zero-allocation

It also has an example of how to turn off this feature for the full framework.

DiggidyDave commented 5 months ago

Sure, anyone who actually starts with the docs will probably see it. Others, like me, may start with the readme on github and will only consult docs to fill gaps in an otherwise very intuitive API. Also, someone may see it and brush it off as an edge case that doesn't apply to them at the time and skim past it, only to be bitten by it later. If you are banking on folks seeing the docs, why not have the riskiest part of that assumption be opt-in and super discoverable in the docs?

It was just a suggestion, no major complaints here. Love the project, just hit this and thought I'd mention it. :-)

gaby commented 5 months ago

@DiggidyDave Thanks for the suggestion, I will update the README to mention this