caltechlibrary / crossrefapi

This is a Go package for working politely with the CrossRef API.
https://caltechlibrary.github.io/crossrefapi

Works returning oddly UTF-8 code point for some characters #1

Closed rsdoiel closed 1 year ago

rsdoiel commented 1 year ago

The works API from CrossRef looks like it is outputting correct JSON, but when I use the crossrefapi cli to get the same data some of the strings have UTF-8 encodings in them. They appear to be <, > and possibly a few more. Need to investigate.

rsdoiel commented 1 year ago

The problem is that somewhere along the evolution of the json package in Go someone decided it was sensible to escape HTML-sensitive characters as Unicode code point escapes (e.g. \u003c for <) rather than leave them as normal UTF-8 characters. This appears to be the case for &, < and > at least. The solution is to create your own JSON encoder using the NewEncoder func and turn the escaping off with SetEscapeHTML(false).
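To see the default behavior in isolation, here is a minimal sketch. The helper name defaultJSON and the sample title are made up for illustration and are not part of crossrefapi.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultJSON is a hypothetical helper for this demo: it encodes v with
// json.Marshal, which escapes <, > and & so the output is safe to embed
// in HTML.
func defaultJSON(v interface{}) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	// Made-up sample data, not a real CrossRef record.
	rec := map[string]string{"title": "<i>R&D</i> notes"}
	fmt.Println(defaultJSON(rec))
	// prints {"title":"\u003ci\u003eR\u0026D\u003c/i\u003e notes"}
}
```

The escaped output is still valid JSON and decodes back to the original characters; it is only the on-the-wire text that looks odd.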


```go
import (
        "bytes"
        "encoding/json"
)

// MarshalObject provides a custom JSON encoder to solve an issue with
// HTML-sensitive characters (<, >, &) getting converted to \uXXXX code
// point escapes by json.Marshal() (seen with go1.21).
// Note: Encode appends a trailing newline, unlike json.MarshalIndent().
func MarshalObject(obj interface{}, prefix string, indent string) ([]byte, error) {
        var buf bytes.Buffer
        enc := json.NewEncoder(&buf)
        enc.SetEscapeHTML(false) // leave <, > and & as plain characters
        enc.SetIndent(prefix, indent)
        if err := enc.Encode(obj); err != nil {
                return nil, err
        }
        return buf.Bytes(), nil
}
```

Replaces my use of json.MarshalIndent() and solves this weird problem.
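One difference to be aware of when swapping this in for json.MarshalIndent(): Encoder.Encode ends its output with a newline, while MarshalIndent does not. If byte-for-byte compatibility matters, something along these lines may help. This is only a sketch; marshalNoEscape is a hypothetical variant, not the function shipped in crossrefapi, and the sample record is made up.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalNoEscape is a hypothetical variant of MarshalObject that also
// trims the trailing newline added by Encode, so the result lines up with
// what json.MarshalIndent would return (minus the HTML escaping).
func marshalNoEscape(obj interface{}, prefix, indent string) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	enc.SetIndent(prefix, indent)
	if err := enc.Encode(obj); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	// Made-up sample data, not a real CrossRef record.
	rec := map[string]string{"title": "<i>R&D</i> notes"}

	std, err := json.MarshalIndent(rec, "", "  ")
	if err != nil {
		panic(err)
	}
	raw, err := marshalNoEscape(rec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Printf("MarshalIndent:\n%s\n", std)   // title shows \u003c, \u003e, \u0026
	fmt.Printf("marshalNoEscape:\n%s\n", raw) // title shows <, >, & unchanged
}
```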

Fixed in upcoming v1.0.6 release using Go 1.21.1.