bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License
2.07k stars, 80 forks

error persisting torrents: ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021) #127

davispuh closed 4 months ago

davispuh commented 4 months ago

Have you checked the roadmap on the website, and existing issues, before opening a duplicate issue? Yes

Describe the bug

Seeing this in logs

ERROR        gorm        gorm/logger.go:72        gorm trace        {"location": "github.com/bitmagnet-io/bitmagnet/internal/database/dao/torrents.gen.go:691", "error": "ERROR: invalid byte sequence for encoding \"UTF8\": 0x00 (SQLSTATE 22021)", "elapsed": 85.291657, "sql": "INSERT INTO \"torrents\" (\"info_hash\",\"name\",\"size\",\"private\",\"piece_length\",\"pieces\",\"created_at\",\"updated_at\",\"files_status\") VALUES [...] ON CONFLICT (\"info_hash\") DO UPDATE SET \"name\"=\"excluded\".\"name\",\"files_status\"=\"excluded\".\"files_status\",\"piece_length\"=\"excluded\".\"piece_length\",\"pieces\"=\"excluded\".\"pieces\"", "rows": 20}
github.com/bitmagnet-io/bitmagnet/internal/database/gorm.(*customLogger).Trace
        github.com/bitmagnet-io/bitmagnet/internal/database/gorm/logger.go:72
gorm.io/gorm.(*processor).Execute
        gorm.io/gorm@v1.25.5/callbacks.go:134
gorm.io/gorm.(*DB).CreateInBatches.func1
        gorm.io/gorm@v1.25.5/finisher_api.go:48
gorm.io/gorm.(*DB).Transaction
        gorm.io/gorm@v1.25.5/finisher_api.go:647
gorm.io/gorm.(*DB).CreateInBatches
        gorm.io/gorm@v1.25.5/finisher_api.go:60
gorm.io/gen.(*DO).CreateInBatches
        gorm.io/gen@v0.3.25/do.go:598
github.com/bitmagnet-io/bitmagnet/internal/database/dao.torrentDo.CreateInBatches
        github.com/bitmagnet-io/bitmagnet/internal/database/dao/torrents.gen.go:691
github.com/bitmagnet-io/bitmagnet/internal/dhtcrawler.(*crawler).runPersistTorrents
        github.com/bitmagnet-io/bitmagnet/internal/dhtcrawler/persist.go:43
ERROR        dht_crawler        dhtcrawler/persist.go:44        error persisting torrents: ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021)

To Reproduce

No idea

Expected behavior

No errors :)


sysr-q commented 4 months ago

This looks like a duplicate of #86. One of these Torrent model columns (InfoHash, Name, FilesStatus, PieceLength, or Pieces) contains a null byte (\x00), which PostgreSQL rejects in text values.

I guess dhtcrawler/persist.go just needs to filter those out before sending to PG, or not set a column if it's getting back only nulls from the DHT. 🤔 It might be tough to reproduce if it depends on a random "broken" torrent in the DHT with nulls in its fields.
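
A minimal sketch of the filtering idea, not bitmagnet's actual code: `torrentRow`, `stripNUL` and `sanitize` are hypothetical names, and the struct is a stand-in for the real Torrent model. It assumes the offending value is a text field such as the name, removes the 0x00 bytes, and skips rows whose name is empty afterwards.

```go
package main

import (
	"fmt"
	"strings"
)

// torrentRow is a hypothetical stand-in for the persisted model;
// the real Torrent model has more fields.
type torrentRow struct {
	Name string
}

// stripNUL removes every 0x00 byte from s.
func stripNUL(s string) string {
	return strings.ReplaceAll(s, "\x00", "")
}

// sanitize returns a copy of rows with NUL bytes removed from Name.
// Rows whose name consisted only of NUL bytes are dropped.
func sanitize(rows []torrentRow) []torrentRow {
	out := make([]torrentRow, 0, len(rows))
	for _, r := range rows {
		r.Name = stripNUL(r.Name)
		if r.Name == "" {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []torrentRow{
		{Name: "ubuntu\x00.iso"},
		{Name: "\x00\x00"},
		{Name: "debian.iso"},
	}
	for _, r := range sanitize(rows) {
		fmt.Printf("%q\n", r.Name)
	}
}
```

Whether to strip the bytes or skip the whole torrent is a design choice; stripping keeps the row, but the stored name no longer matches what came back from the DHT.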

mgdigital commented 4 months ago

There was already an attempt at fixing this that was at least partially successful (see https://github.com/bitmagnet-io/bitmagnet/blob/main/internal/protocol/metainfo/banning/utf8.go).

I was thinking a particular sequence of characters might be triggering an escaping bug in Gorm but I wouldn't rule out a bug here either - needs more digging!

mgdigital commented 4 months ago

I think this should be fixed with the latest release but let's keep an eye on it and re-open if it's spotted again: https://github.com/bitmagnet-io/bitmagnet/releases/tag/v0.7.1