apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

IPv6 URL Mis-handling #5223

Closed AstraLuma closed 2 months ago

AstraLuma commented 2 months ago

Description

It looks like there's a bug in URL manipulation with regard to IPv6 addresses.

A snippet from /_scheduler/jobs:

    {
      "database": "_replicator",
      "id": "3e9e019d74f393c8fb8319dd43abec45+continuous",
      "pid": null,
      "source": "http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/",
      "target": "http://localhost:5984/_users/",
      "user": null,
      "doc_id": "pup_fdaa:a:d05:a7b:12:eb5a:cc23:2__users",
      "info": null,
      "history": [
        {
          "timestamp": "2024-09-06T03:45:14Z",
          "type": "crashed",
          "reason": "{replication_auth_error,\n    {session_request_failed,\"http://fdaa:a:d05:a7b:12:eb5a:cc23:2/_session\",\n        \"admin\",\n        {url_parsing_failed,{error,invalid_uri}}}}"
        },

Note that the _session URL does not show the requisite [] around the IP address.

Steps to Reproduce

  1. Set up two CouchDB instances on an IPv6 network
  2. Configure them to replicate with each other, referring to the other by IP address

Expected Behaviour

Replication happens cleanly.

Your Environment

    {
      "couchdb": "Welcome",
      "version": "3.3.3",
      "git_sha": "40afbcfc7",
      "uuid": "2ec4265a1ffc0f5d75390407dd3cd392",
      "features": [
        "access-ready",
        "partitioned",
        "pluggable-storage-engines",
        "reshard",
        "scheduler"
      ],
      "vendor": {
        "name": "The Apache Software Foundation"
      }
    }

Additional Context

This was exposed while working on an automagic CouchDB on fly.io project.

AstraLuma commented 2 months ago

From the CouchDB log:

    [notice] 2024-09-06T03:54:09.540505Z nonode@nohost <0.393.0> -------- couch_replicator_scheduler: Job {"3e9e019d74f393c8fb8319dd43abec45","+continuous"} started as <0.12000.0>
    [error] 2024-09-06T03:54:11.461321Z nonode@nohost <0.12000.0> -------- couch_replicator_httpc: auth plugin initialization failed "http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/" {session_request_failed,"http://fdaa:a:d05:a7b:12:eb5a:cc23:2/_session","admin",{url_parsing_failed,{error,invalid_uri}}}
    [error] 2024-09-06T03:54:11.461920Z nonode@nohost <0.12000.0> -------- throw:{replication_auth_error,session_request_failed,"http://fdaa:a:d05:a7b:12:eb5a:cc23:2/_session","admin",{url_parsing_failed,{error,invalid_uri}}}}: Replication 3e9e019d74f393c8fb8319dd43abec45+continuous failed to start "http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/" -> "http://localhost:5984/_users/" doc <<"shards/00000000-7fffffff/_replicator.1725575687">>:<<"pup_fdaa:a:d05:a7b:12:eb5a:cc23:2__users">> stack:[{couch_replicator_httpc,setup,1,[{file,"src/couch_replicator_httpc.erl"},{line,62}]},{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,68}]}]

rnewson commented 2 months ago

What version of Erlang do you have there?

rnewson commented 2 months ago

On Erlang 25/26 this works:

    1> ibrowse_lib:parse_url("http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/").
    {url,"http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/",
         "fdaa:a:d05:a7b:12:eb5a:cc23:2",5984,undefined,undefined,
         "/_users/",http,ipv6_address}

AstraLuma commented 2 months ago

Whatever you're packing in your Debian packages. I'm having trouble tracking that down.

I'm using a fork of the semi-official container (https://github.com/teahouse-hosting/couchdb-docker/blob/trunk/3.3.3/Dockerfile) that's been updated to bookworm (submitted as https://github.com/apache/couchdb-docker/pull/259). It uses the Apache JFrog repos.

Also, I suspect the bug isn't in parsing http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/, but in how that's transformed into http://fdaa:a:d05:a7b:12:eb5a:cc23:2/_session as part of the replicator. (It loses both the [] and the :5984.)

AstraLuma commented 2 months ago

https://github.com/apache/couchdb/blob/c4aac977d2e6eafc34fcad72b624fc02cf07d49f/src/couch_replicator/src/couch_replicator_auth_session.erl#L267-L282

You're using string concatenation instead of URL unparsing, so when ibrowse_lib:parse_url() drops the [], it messes everything up.
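
To illustrate the difference, here is a minimal sketch in Python (not the Erlang used by the replicator). It assumes the session URL should keep the scheme, host, and port of the source URL and swap the path for /_session. The helper names `naive_session_url` and `session_url` are hypothetical and do not come from the CouchDB code. Python's `urllib.parse` happens to behave like `ibrowse_lib:parse_url()` here: the parsed host comes back without the brackets.

```python
from urllib.parse import urlsplit, urlunsplit


def naive_session_url(db_url: str) -> str:
    """Concatenate strings around the parsed host (drops [] and the port)."""
    parts = urlsplit(db_url)
    # parts.hostname is the bare address, without the surrounding brackets.
    return f"{parts.scheme}://{parts.hostname}/_session"


def session_url(db_url: str) -> str:
    """Rebuild the authority explicitly, re-adding brackets for IPv6 literals."""
    parts = urlsplit(db_url)
    host = parts.hostname or ""
    if ":" in host:
        # A colon inside the host can only mean an IPv6 literal; bracket it again.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Credentials, query, and fragment are intentionally left out in this sketch.
    return urlunsplit((parts.scheme, netloc, "/_session", "", ""))


source = "http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/"
print(naive_session_url(source))  # same shape as the URL in the crash reason
print(session_url(source))
```

The second helper treats the brackets as part of serializing the host rather than part of the host value itself, which is roughly what a URL "unparse" function would be expected to do.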

rnewson commented 2 months ago

Nice find, we'll get it fixed.

AstraLuma commented 2 months ago

Since I don't really know the Couch devs, I feel the need to point out that since the root of this bug is an API that's missing half its round-trip, this is probably a systemic bug in the Couch codebase--any part of the code that assembles or munges URLs is probably affected.
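
One way to look for such spots is a round-trip check: parse a URL, reassemble it from the parsed pieces, and confirm the result still parses to the same host and port. The sketch below is in Python rather than Erlang, and `rebuild` and `round_trips` are hypothetical names, not CouchDB functions. The `bracket_ipv6` flag only exists to contrast a reassembly step that restores the brackets with one that does not.

```python
from urllib.parse import urlsplit, urlunsplit


def rebuild(url: str, bracket_ipv6: bool) -> str:
    """Reassemble a URL from its parsed host, port, path, query and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if bracket_ipv6 and ":" in host:
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def round_trips(url: str, bracket_ipv6: bool) -> bool:
    """True if the rebuilt URL parses to the same host and port as the original."""
    try:
        before = urlsplit(url)
        after = urlsplit(rebuild(url, bracket_ipv6))
        return (before.hostname, before.port) == (after.hostname, after.port)
    except ValueError:
        # Raised, for example, when an unbracketed address leaves a
        # non-numeric "port" behind the first colon.
        return False


url = "http://[fdaa:a:d05:a7b:12:eb5a:cc23:2]:5984/_users/"
for flag in (True, False):
    print(flag, round_trips(url, bracket_ipv6=flag))
```

Running a check like this over a handful of representative URLs (hostname, IPv4, IPv6, with and without a port) is a cheap way to flag code paths that assemble URLs by hand.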

rnewson commented 2 months ago

I'm certainly alive to the notion that CouchDB has more bugs around this than the one you've highlighted. :)

rnewson commented 2 months ago

Extra context: the replicator application is the only thing that makes outbound HTTP connections, and it has only recently needed to construct a URL (to the _session endpoint, to acquire a cookie) from another URL supplied by the user (either the source or the target). So we are looking for other occurrences, but there likely aren't any.