cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.9k stars 3.78k forks source link

geoindex: investigate slow performance #65483

Open otan opened 3 years ago

otan commented 3 years ago

Environment:

The code is already running on 64 different goroutines and cockroack is maxing out a 8 cores 64GB memory single instance server

Those data are generated by ogr2ogr from an openstreetmap extract of admin boundaries .

It can be queried with: SELECT name, ogc_fid FROM france WHERE ST_Contains(wkb_geometry,ST_Transform(ST_GeomFromText('POINT(2.2 48.8)', 4326), 4326));

Here is how to reproduce:

mv france.sql cockroach-data/extern
IMPORT PGDUMP ('nodelocal://0/france.sql');

There is a france dump here (sorry, private stuff only!): https://drive.google.com/file/d/182WCLzcGzb62Xrn7FefOuUJYYYJaMtGd/view?usp=sharing


Community slack thread: https://cockroachdb.slack.com/archives/C013ADU1FPC/p1620744151005700


Fabrice AkHeN May 12th at 12:42 AM
Hello I'm comparing cockroachdb and postgis with ST_Contains, cockroach is comparing poorly on a 40k polygons dataset:
cockroachdb rate mean 14/s
postgis rate mean 823/s
Any advises?
41 replies

Fabrice AkHeN  8 days ago
explain: https://cockroachdb.github.io/distsqlplan/decode.html#eJyclN1u4kYUx-_7FEfnBpCmMP4IIXMFaZ2WlkAKSG26R[…]WHvqqkosDnZu-g2T9s9g-agxfm2e6XrwEAAP__smPCvw==

Rebecca Taft (Cockroach Labs)  8 days ago
Hi @Fabrice AkHeN, this explain output makes it look like this query is performing well -- the spans are selective (resulting in only 36 rows), and the total time appears to be ~14 ms. Is there another query that shows the performance problems you're seeing?

Fabrice AkHeN  8 days ago
Hi @Rebecca Taft (Cockroach Labs), I agree the explain seems fine, but it's only "slow" compared to Postgis, with the same dataset on the same server, using the same Go program for the bench (pgx driver 64 conns, 64 go routine with ST_Contains).

Fabrice AkHeN  8 days ago
cdb count 1620 rate mean 136/s rate1 127/s
pgis count 29499 rate mean 3281/s rate1 3168/s

Fabrice AkHeN  8 days ago
I'm sure it can be tuned but the difference is so high I'm wondering if there is no other issue (edited) 

Rebecca Taft (Cockroach Labs)  8 days ago
what are those numbers exactly in your benchmark? Rows? transactions?

Rebecca Taft (Cockroach Labs)  8 days ago
I might be mistaken, but if you're just running a single query in a loop, I think generally CockroachDB won't use all of those threads for a single query -- generally it's optimized for throughput when many queries are run simultaneously. A single query will be able to get more parallelism if you use multiple nodes. I believe Postgres will be able to take more advantage of those threads.

Rebecca Taft (Cockroach Labs)  8 days ago
My suspicion is that this doesn't have much to do with the performance of ST_Contains, but rather just the scan speed of CockroachDB. You could confirm by running your benchmark with a simple scan query instead of ST_Contains. If that's the case, I'd suggest asking the same question again in the main #cockroachdb chat -- you'll probably get more info about the source of the performance differences from others in that chat.

Fabrice AkHeN  8 days ago
Queries per second, SELECT ogc_fid FROM france WHERE ST_Contains(wkb_geometry,ST_Transform(ST_GeomFromText('POINT(2.2 48.8)', 4326), 4326));

Fabrice AkHeN  8 days ago
The code is already running on 64 different goroutines and cockroack is maxing out a 8 cores 64GB memory single instance server

Rebecca Taft (Cockroach Labs)  8 days ago
I see -- I assume you've tried removing the index and comparing the performance? As I said, it looks like using the index is the right plan given that it's only scanning 36 rows, but it's probably work checking just in case.

Rebecca Taft (Cockroach Labs)  8 days ago
Also as I mentioned before, I think this performance difference may not be primarily due to ST_Contains, since the plan you showed looks good. I'd suggest trying a simple query without any spatial predicates (e.g., just a predicate with < and > comparisons) and see if you see a similar performance difference. In that case, people in the #cockroachdb channel will likely have more insight. I think a single-node cockroach cluster generally cannot compete well with Postgres on scan speed.

Fabrice AkHeN  8 days ago
a simple cockroachdb select
count 76109 rate mean 28522/s

Fabrice AkHeN  8 days ago
it seems that this particular kind of queries are way slower on cockroach than postgis, I agree that a single node will be slower but not in this order of magnitude I guess

Fabrice AkHeN  8 days ago
I'll continue to do some testing thanks Rebecca

Rebecca Taft (Cockroach Labs)  8 days ago
 a simple cockroachdb select
count 76109 rate mean 28522/s
How does this compare to Postgres?

Fabrice AkHeN  8 days ago
count 1094480 rate mean 102802/s

Fabrice AkHeN  8 days ago
only 4x times slower

Rebecca Taft (Cockroach Labs)  8 days ago
I see -- well you might get some useful ideas from the #cockroachdb channel, but also just wondering for the ST_Contains query if you are varying the predicate at all? If all queries are touching the same ranges contention could also help explain it

Fabrice AkHeN  8 days ago
it's taken into account, the test is testing randomly the location over the 40k polygons (edited) 

Fabrice AkHeN  8 days ago
(the dataset is also inserted into a free cloud instance airspacecockroach into tanks table france if you want to look at it)

Rebecca Taft (Cockroach Labs)  8 days ago
Thanks! I don't think I can access the instance though, for security reasons. Were you running the performance test on the free CRDB instance?

Fabrice AkHeN  8 days ago
no on a local instance

Fabrice AkHeN  8 days ago
here is statement diag, not sure it can help ...
Zip 
stmt-bundle-657602500652171265.zip
35 kB Zip35 kB — Click to download

Rebecca Taft (Cockroach Labs)  8 days ago
thanks -- for the non-spatial query you ran with a simple predicate-- was it using a covering index (an index including all the columns needed by the query)? I'm guessing that if you run with a non-covering index the performance (as compared to Postgres) will be worse. The JoinReader, which is used for the spatial query and also used in other cases with a non-covering index, could be a performance bottleneck.

Fabrice AkHeN  8 days ago
For the non-spatial query, it was just getting an id, so extremely simple, I'm not really worried about the speed difference between postgis and single node cockroach (we saw 4x diff).
But just concerned about the within query where the difference is huge and almost unusable.
It's backed by s2, I have another bench using direct s2 cells which outperform postgis by far.
I was expecting cockroach to perform very well on those queries, that's why I'm suspecting a problem.

Fabrice AkHeN  8 days ago
let me know if you want me to open an issue.

Rebecca Taft (Cockroach Labs)  8 days ago
Opening an issue would be helpful, thanks!

Oliver Tan (Cockroach Labs)  8 days ago
how big are these polygons?

Oliver Tan (Cockroach Labs)  8 days ago
if they're adequately big (> 100 points say), we have an optimisation in v21.1 that assists with that (edited) 

Fabrice AkHeN  7 days ago
Hi Oliver, yes some are > 100 pts, not sure how you are building the s2 index but I assumed an inside cover would be quite fast,
can you please point me to this optimisation, I'm using v20.2.7 ?

Oliver Tan (Cockroach Labs)  6 days ago
v21.1 hasn't been release yet, but https://github.com/cockroachdb/cockroach/pull/62643 / the beta 3 would be where to look

GitHubGitHub
geo/geomfn: add point in polygon optimization to st_contains, st_within by andyyang890 · Pull Request #62643 · cockroachdb/cockroach
This patch improves the performance of st_contains and st_within for the common use case of testing whether a (multi)polygon contains a (multi)point. Release note: None (68 kB)
https://github.com/cockroachdb/cockroach/pull/62643

Oliver Tan (Cockroach Labs)  6 days ago
i'm also curious, does using polygon && point (bounding box comparison) make it faster?

Fabrice AkHeN  6 days ago
v21.1 can't import the data:
build:               CCL v21.1.0-rc.2 @ 2021/05/10 18:22:38 (go1.15.11)
pg:root@pouf:26257=> IMPORT PGDUMP ('nodelocal://0/france.sql');
error: pq: XXUUU: unsupported *tree.SetVar statement: SET standard_conforming_strings = off
removed this line in the import did it (edited) 

Fabrice AkHeN  6 days ago
v21.1: rate mean 161/s
postgis: rate mean 2823/s

Fabrice AkHeN  6 days ago
it's a bit better but the difference are really high

Fabrice AkHeN  6 days ago
Those data are generated by ogr2ogr from an openstreetmap extract of admin boundaries .
It can be queried with:
SELECT name, ogc_fid FROM france WHERE ST_Contains(wkb_geometry,ST_Transform(ST_GeomFromText('POINT(2.2 48.8)', 4326), 4326));
Here is how to reproduce:
cockroach start-single-node --insecure --http-addr=0.0.0.0:9091  --spatial-libs /home/akh/cockroach/lib
mv france.sql cockroach-data/extern
IMPORT PGDUMP ('nodelocal://0/france.sql');
File is uploading
Binary 
france.sql.xz
42 MB Binary42 MB — Click to download

Fabrice AkHeN  6 days ago
Load testing code:
package main
import (
    "context"
    "flag"
    "fmt"
    "log"
    "math/rand"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"
    "github.com/jackc/pgx/v4/pgxpool"
    "github.com/rcrowley/go-metrics"
)
// &pool_max_conns=64
var (
    dbURL  = flag.String("dbURL", "localhost", "database URL use with postgis index only")
    latMin = flag.Float64("latMin", 49.10, "Lat min")
    lngMin = flag.Float64("lngMin", -1.10, "Lng min")
    latMax = flag.Float64("latMax", 46.63, "Lat max")
    lngMax = flag.Float64("lngMax", 5.5, "Lng max")
)
func main() {
    flag.Parse()
    poolConfig, err := pgxpool.ParseConfig(*dbURL)
    if err != nil {
        log.Fatal(err)
    }
    pool, err := pgxpool.ConnectConfig(context.Background(), poolConfig)
    if err != nil {
        log.Fatal(err)
    }
    // catch termination
    interrupt := make(chan os.Signal, 1)
    signal.Notify(interrupt, syscall.SIGINT, syscall.SIGTERM)
    defer signal.Stop(interrupt)
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup
    const numJobs = 64
    tm := metrics.NewTimer()
    for w := 1; w <= numJobs; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                t := time.Now()
                lat := *latMin + rand.Float64()*(*latMax-*latMin) // nolint: gosec
                lng := *lngMin + rand.Float64()*(*lngMax-*lngMin) // nolint: gosec
                err := Stab(ctx, pool, lat, lng)
                if err != nil {
                    fmt.Println(err)
                    break
                }
                tm.UpdateSince(t)
            }
        }()
    }
    select {
    case <-interrupt:
        cancel()
        break
    case <-ctx.Done():
        break
    }
    wg.Wait()
    msg := fmt.Sprintf("count %d rate mean %.0f/s rate1 %.0f/s 99p %.0f\n",
        tm.Count(), tm.RateMean(), tm.Rate1(), tm.Percentile(99.0))
    fmt.Println(msg)
}
func Stab(ctx context.Context, pool *pgxpool.Pool, lat, lng float64) error {
    ctx, cancel := context.WithTimeout(ctx, 4*time.Second)
    defer cancel()
    q := fmt.Sprintf(`SELECT ogc_fid FROM france
            WHERE ST_Contains(wkb_geometry,
                ST_Transform(ST_GeomFromText('POINT(%f %f)', 4326), 4326)
            )`, lng, lat)
    rows, err := pool.Query(ctx, q)
    if err != nil {
        return err
    }
    for rows.Next() {
        var ogcFID int
        if err := rows.Scan(&ogcFID); err != nil {
            return err
        }
    }
    return nil
}

Fabrice AkHeN  6 days ago
cockroack max out 8 cores, latency jumps to 700ms
Screen Shot 2021-05-13 at 08.59.45.png 
Screen Shot 2021-05-13 at 08.59.45.png

Rebecca Taft (Cockroach Labs)  6 days ago
Hi @Fabrice AkHeN -- thanks for all this info, but I doubt we'll have time to debug this in the next few days, since it looks like this will require a fair amount of investigation. It would be great if you could open a GitHub issue with all of this information.

Fabrice AkHeN  6 days ago
will do, thanks.

Jira issue: CRDB-7627

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!