dmwm / dbs2go

DBS server written in Go
MIT License
5 stars 4 forks source link

Test racing conditions in bulkblocks API calls #30

Open vkuznet opened 2 years ago

vkuznet commented 2 years ago

We should come up with integration tests which will allow to test racing conditions of bulkblocks API. They happen when there are competing (concurrent) calls with bulkblocks API which provides almost identical data (the data only different at block/file level and all other parameters remain the same). In this scenario there are common data such as physics group, dataset access type, processed dataset where we either need to insert or obtain IDs, see https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks2.go#L506-L571

I identified that for small tables, like physics group or data tiers, the probability of racing conditions is kind of small, while for larger ones, like processed datasets which contains 149970 entries in ORACLE DB, there is a possibility of racing conditions which may happen if two competing HTTP requests tries to insert/check processed dataset ID.

We need to come up with integration test for this use-case.

vkuznet commented 2 years ago

Dennis, we should not experience lock of db if we properly use transactions. My suggestion is to investigate how sqlite should be initialized that transactions should be applied to tables rather DB itself.

I think we need to triple check that all statements, rows, are closed in bulkblocks API. It would be useful to identify on which part of bulkblocks API the lock occurred, i.e. does the lock appear on specific step, if so which step it is.

Finally, according to this github issue [1] we may try the following options:

db.SetMaxOpenConns(1)

and journal_mode=WAL which seems to address the issue with database lock.

[1] https://github.com/mattn/go-sqlite3/issues/569

d-ylee commented 2 years ago

@vkuznet I implemented the suggestions from the mattn/go-sqlite3 discussions and it seemed to resolve the database locking. I pushed my branch with the current work I have done to try to cause the racing conditions to happen in the bulkblocks API. In my logs, I see this:

[2022-10-05 14:25:16.405983633 +0000 UTC m=+3.899250753] HTTP/1.1 400 POST /dbs-one-writer/bulkblocks [data: 26088 in 1852 out] [remoteAddr: 127.0.0.1:59288] [X-Forwarded-For: ] [X-Forwarded-Host: ] [auth: no-TLS cipher-none "no-auth-cert" no-auth-login no-auth-method] [ref: "-" "Go-http-client/1.1"] [req: 1.1505ms proxy-resp: 0]
[2022-10-05 14:25:16.508029675 +0000 UTC m=+4.001296794] DBSError Code:101 Description:DBS DB error Function:dbs.bulkblocks.InsertBulkBlocksConcurrently Message:Data already exist in DBS Error: Block /unittest_web_primary_ds_name_8268_stepchain/acq_era_8268-v8268/GEN-SIM-RAW#8268 already exists Stacktrace: 
goroutine 617 [running]:
github.com/dmwm/dbs2go/dbs.Error({0xc20780?, 0x400055d520?}, 0x65, {0xb31e2c, 0x19}, {0xb424ee, 0x2b})
        /workspaces/dbs2go_worktrees/dbs2go_race/dbs/errors.go:172 +0x84
github.com/dmwm/dbs2go/dbs.(*API).InsertBulkBlocksConcurrently(0x40001ee000)
        /workspaces/dbs2go_worktrees/dbs2go_race/dbs/bulkblocks2.go:673 +0xccc
github.com/dmwm/dbs2go/web.DBSPostHandler({0xc24910, 0x400000e0d8}, 0x4000612300, {0xb251f6, 0xa})
        /workspaces/dbs2go_worktrees/dbs2go_race/web/handlers.go:542 +0x10ac
github.com/dmwm/dbs2go/web.BulkBlocksHandler({0xc24910?, 0x400000e0d8?}, 0x0?)
        /workspaces/dbs2go_worktrees/dbs2go_race/web/handlers.go:957 +0x54
net/http.HandlerFunc.ServeHTTP(0x10?, {0xc24910?, 0x400000e0d8?}, 0xab00fff340a04d4c?)
        /usr/local/go/src/net/http/server.go:2109 +0x38
github.com/dmwm/dbs2go/web.limitMiddleware.func1({0xc24910?, 0x400000e0d8?}, 0x400055cf50?)
        /workspaces/dbs2go_worktrees/dbs2go_race/web/middlewares.go:111 +0x40
net/http.HandlerFunc.ServeHTTP(0xa58d00?, {0xc24910?, 0x400000e0d8

This states that the block already exists in the database. Is this the type of racing conditions we are trying to resolve?

vkuznet commented 2 years ago

I'm glad that we resolve database lock issue and now can move forward. The error you got is not a racing condition. It clearly states that you try to insert a block which is already in DB.

The racing condition should happen as following:

The question is how to simulate it. I think you need to create multiple JSONs with different blocks but the same common data, like dataset configuration, and then inject them concurrently. The more HTTP requests you'll send concurrently the likely you will hit the racing condition.