rsdoiel closed this issue 1 year ago.
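The problem is that Go's `encoding/json` package escapes HTML-sensitive characters by default. `json.Marshal` and `json.MarshalIndent` write `<`, `>` and `&` as the Unicode escape sequences `\u003c`, `\u003e` and `\u0026` (U+2028 and U+2029 are escaped too) so the output is safe to embed in HTML, rather than writing them as plain UTF-8 characters. Newlines are a separate case: JSON requires control characters inside strings to be escaped, so a newline is always written as `\n`. The solution is to create your own JSON encoder with `json.NewEncoder` and turn the HTML escaping off with `SetEscapeHTML(false)`.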
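A minimal, self-contained sketch of the difference between the default and the `SetEscapeHTML(false)` behavior. The helper names `encodeDefault` and `encodeNoHTMLEscape` are hypothetical, chosen only for illustration. Note that the newline stays escaped in both outputs, since JSON itself requires that.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeDefault uses json.Marshal, which escapes <, > and & (plus U+2028
// and U+2029) as \u003c, \u003e and \u0026 so the output is HTML-safe.
func encodeDefault(s string) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// encodeNoHTMLEscape uses an Encoder with SetEscapeHTML(false) so those
// characters are written literally. Encode appends a trailing newline,
// which is trimmed here to make the two outputs comparable.
func encodeNoHTMLEscape(s string) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(s); err != nil {
		return "", err
	}
	return string(bytes.TrimSuffix(buf.Bytes(), []byte("\n"))), nil
}

func main() {
	in := "<i>Tom & Jerry</i>\n"
	a, _ := encodeDefault(in)
	b, _ := encodeNoHTMLEscape(in)
	fmt.Println(a) // "\u003ci\u003eTom \u0026 Jerry\u003c/i\u003e\n"
	fmt.Println(b) // "<i>Tom & Jerry</i>\n"
}
```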
```go
import (
	"bytes"
	"encoding/json"
)

// MarshalObject provides a custom JSON encoder that works around
// json.Marshal and json.MarshalIndent escaping <, > and & as \u003c,
// \u003e and \u0026 (their default, HTML-safe behavior).
func MarshalObject(obj interface{}, prefix string, indent string) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	// Write <, > and & literally instead of as Unicode escapes.
	enc.SetEscapeHTML(false)
	enc.SetIndent(prefix, indent)
	// Unlike json.MarshalIndent, Encode appends a trailing newline.
	if err := enc.Encode(obj); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```
Replaces my use of json.MarshalIndent() and solves this weird problem.
Fixed in upcoming v1.0.6 release using Go 1.21.1.
The works API from CrossRef looks like it is outputting correct JSON, but when I use the crossrefapi CLI to get the same data, some of the strings contain Unicode escape sequences instead of the literal characters. They appear to be for `<`, `>` and possibly a few more. Need to investigate.
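One way to check whether the CLI output is actually wrong: `\u003c`, `\u003e` and `\u0026` are standard JSON escapes, so a compliant decoder should yield the same strings from either form. A sketch, assuming a record with a hypothetical `title` field (not the actual CrossRef works schema); `decodeTitle` is likewise a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeTitle unmarshals a JSON object and returns its "title" field.
// The field name is a hypothetical example, not the CrossRef schema.
func decodeTitle(data []byte) (string, error) {
	var rec struct {
		Title string `json:"title"`
	}
	if err := json.Unmarshal(data, &rec); err != nil {
		return "", err
	}
	return rec.Title, nil
}

func main() {
	// The same title, once with Unicode escapes and once written literally.
	escaped := []byte(`{"title": "\u003ci\u003eNature\u003c/i\u003e \u0026 more"}`)
	literal := []byte(`{"title": "<i>Nature</i> & more"}`)
	a, _ := decodeTitle(escaped)
	b, _ := decodeTitle(literal)
	fmt.Println(a == b) // true
	fmt.Println(a)      // <i>Nature</i> & more
}
```

If both forms decode identically, the escaped output is still valid JSON carrying the same data; the difference only matters to readers or tools that compare the raw text.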