thomasloubejac commented 5 years ago

During randomly generate session :

It seems that the database performance decrease over time.
And start converging around 100 txn/s

During tuning session (celery worker shows recommendation)

The database performance rarely goes over 100 txn/s

So I'm wondering if I did anything wrong ? I had 500 random configurations tested before going to tuning session.

Am I supposed to reload the database ? If so, how often ?

thomasloubejac commented 5 years ago

I let it run all night, it dropped to 65 txn/s over a 1000 tuning session loops.

nabti commented 5 years ago

Hi @thomasloubejac, Sorry this is not to answer your question I was just wondering if you can Explain how we can use it. step by step.

Thank you for your help

thomasloubejac commented 5 years ago

What I will give you here worked for me on postgres-9.6, you'll have to adapt on other dbms

1) clone this repo on your client and server machines

On your client Machine

2) clone oltpbenchmark repo on your client machine make sure you installed every packages needed apt-get install git openjdk-8-jdk-headless gradle fabric

3) Make sure that you set an account on your client's database for ottertune, set it with a password and grant it with the right to create databases Also make sure you can connect to that account directly from the user you will be running the program with. (I had to add -h localhost to every psql command in the ottertune/client/driver/fabfile.py file) In order to connect this is the command I have to use : psql -U user -d database -h localhost -W

4) edit configuration files

ottertune/client/driver/driver_config.json
ottertune/client/controller/config/{whatever suits your case} (this path has to appear in controller_config in driver_config.json) for example I have : "database_url" : "jdbc:postgresql://localhost:5432/dbtest" dbtest being the database I will run the benches on
oltpbench/config/{whatever benchmark you want to run} (this path has to appear in oltpbench_config in driver_config.json) with the right information (user and password always refer to the account you made on your dbms and never to your current user and passwd)

see a future post down below if you want further information on the configuration files.

5) Build the things /{path to oltpbench}/oltpbench/$ ant to build the benchmark (note that you don't have to rebuild it if you only modify the xml-configurations) /{path to ottertune}/ottertune/client/controller/$ gradle build to make sure the controller works ok.

6) at /{path to ottertune}/ottertune/client/driver/ run

fab create_database (you don't want to run benches on a non-existing database... I did it so don't bother trying
fab load_oltpbench (same story lol)

the last command should take no more than 3 minutes. the output should look like this :

ottertune_dev/ottertune/client/driver$ fab create_database && fab load_oltpbench
[localhost] Executing task 'create_database'
[localhost] local: PGPASSWORD=test123 createdb -e dbtest -U djamel -h localhost
SELECT pg_catalog.set_config('search_path', '', false)
CREATE DATABASE dbtest;

Done.
[localhost] Executing task 'load_oltpbench'
[localhost] local: ./oltpbenchmark -b tpcc -c /ottertune_dev/oltpbench/config/tpcc_config_postgres.xml --create=true --load=true
07:53:42,865 (DBWorkload.java:262) INFO  - ======================================================================

Benchmark:     TPCC {com.oltpbenchmark.benchmarks.tpcc.TPCCBenchmark}
Configuration: /ottertune_dev/oltpbench/config/tpcc_config_postgres.xml
Type:          POSTGRES
Driver:        org.postgresql.Driver
URL:           jdbc:postgresql://localhost:5432/dbtest
Isolation:     TRANSACTION_READ_COMMITTED
Scale Factor:  70.0

07:53:42,869 (DBWorkload.java:263) INFO  - ======================================================================
07:53:42,883 (DBWorkload.java:522) INFO  - Creating new TPCC database...
07:53:43,054 (DBWorkload.java:524) INFO  - Finished!
07:53:43,055 (DBWorkload.java:525) INFO  - ======================================================================
07:53:43,055 (DBWorkload.java:548) INFO  - Loading data into TPCC database...
08:24:26,300 (DBWorkload.java:550) INFO  - Finished!
08:24:26,300 (DBWorkload.java:551) INFO  - ======================================================================
08:24:26,300 (DBWorkload.java:591) INFO  - Skipping benchmark workload execution

Done.

At this point you need to have your server machine setup, make sure everything works with :

fab loop

7) Once you managed to make it this far without errors, you have to change the oltpbench configuration : Andy and Dana advise to use this

<scalefactor>200</scalefactor>

    <!-- The workload -->
    <terminals>50</terminals>
    <works>
        <work>
          <time>200</time>
          <rate>unlimited</rate>
          <weights>45,43,4,4,4</weights>
        </work>
     </works>

So put these values in your oltpbench/config/{yourconfiguration} . Then in the driver directory :

fab drop_database
fab create_database
fab load_oltpbench

If it doesn't give errors straight up, then it means it worked and you're ok to go. this might take a while (27min for a less beefy configuration in my case). Note that the beefier the config, the longer it takes. If it works for you, great. It didn't for me, so I had to modify the configuration : A too high Scalefactor prevented oltpbench from working for some reason, (limiting value was 95 in my case). drop it down and repeat until fab load_oltpbench doesn't give you errors.

then

fab loop

this is to make sure that you have enough time in your workload. If you don't, it will give you something like :

ottertune/client/driver$ fab loop
[localhost] Executing task 'loop'
[localhost] local: sync; sudo bash -c "echo 1 > /proc/sys/vm/drop_caches"
[sudo] password for thomas: 
[localhost] local: sudo service postgresql restart
[localhost] local: df -h /dev/sda1
2019-01-15 08:57:20,582 [INFO]  Current Disk Usage: 22%
[localhost] local: ./oltpbenchmark -b tpcc -c /ottertune_dev/oltpbench/config/tpcc_config_postgres.xml --execute=true -s 5 -o outputfile > /ottertune_dev/ottertune/client/driver/oltp.log 2>&1 &
[localhost] local: sudo gradle run -PappArgs="-c /ottertune_dev/ottertune/client/controller/config/sample_postgres_config.json -d output/" --no-daemon
:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:run
09:02:56,188 (Main.java:99) INFO  - Experiment time is set to: -1
09:02:56,210 (Main.java:105) INFO  - Experiment output directory is set to: output/
09:02:57,194 (Main.java:134) INFO  - First collection of metrics before experiment
> Building 75% > :run[localhost] local: sudo kill -2 29606
09:02:57,673 (Main.java:177) INFO  - Starting the experiment ...
09:02:57,692 (Main.java:191) INFO  - Done running the experiment
09:02:57,696 (Main.java:218) INFO  - Second collection of metrics after experiment
09:02:58,271 (Main.java:243) WARN  - Failed to upload results from the controller

BUILD SUCCESSFUL

Total time: 5 mins 37.18 secs
[localhost] local: python3 ../../server/website/script/upload/upload.py            ../controller/output/ 23Y4E4QZ610CFBS72M6F http://vslike:8000/new_result/
b'\n<!DOCTYPE html>\n<html lang="en">\n<head>\n  <meta http-equiv="content-type" content="text/html; charset=utf-8">\n  <meta name="robots" content="NONE,NOARCHIVE">\n  <title>ZeroDivisionError at /new_result/</title>\n
[...]

and a really long html file. Note that ZeroDivisionError at /new_result/ means that the bench didn't even have time to run. I suspect there is a problem in the timing of the actions in this function. Note that a task is killed before the experiment even started. if it worked, check ottertune/client/controller/output/summary.json, see observation_time. This is the effective duration of the Benchmark.

ottertune/client/driver$ cat ../controller/output/summary.json | jq
{
  "start_time": 1547542977673,
  "end_time": 1547542977692,
  "observation_time": 0,
  "database_type": "postgres",
  "database_version": "9.6",
  "workload_name": "workload_name"
}

repeat the steps until you find the maximum values that your configuration will support. Then you can choose the values you want. I cannot advise you what values to give it, I took it down to 70 because elseway my server won't run this fast enough. number of terminals doesn't influence if oltpbench works or not in my experience. Here is the configuration I use for instance :

<scalefactor>70</scalefactor>

    <!-- The workload -->
    <terminals>50</terminals>
    <works>
        <work>
          <time>320</time>
          <rate>unlimited</rate>
          <weights>45,43,4,4,4</weights>
        </work>
     </works>

On your server machine

1) edit ottertune/server/website/script/installation/bootstrap.sh there are paths that are not correct. either rectify the file or run the instructions yourself. Note that this is the file where you'll find the credentials to log in the website's database. this is useful if you have to make modifications and don't want to reset the website everytime you do.

2) edit your knobs in ottertune/server/website/website/fixtures/{knobs for your dbms} I found the most important ones were : global.bgwriter_delay global.bgwriter_lru_maxpages global.checkpoint_completion_target global.checkpoint_timeout global.deadlock_timeout global.default_statistics_target global.effective_cache_size global.effective_io_concurrency global.random_page_cost global.shared_buffers global.work_mem global.maintenance_work_mem

I removed every other knob because it either makes the process slower or you shouldn't touch them (according to my DBA). reduce the range research by changing default and maxval. I can not help you with that either, but see my fixture file in a future post below. note that you can find the units and vartypes in ottertune/server/website/website/types.py, this will help you know what the values are or should be.

Note that the tuning session will generate configurations within the range [default, maxval].

3) you're good to go : run the website and celery as explained (here)[https://github.com/cmu-db/ottertune/wiki/Getting-Started-ottertune] if you have to make modifications to knobs on the go, you can do it in the ottertune database. If you have to make modifications to knobs that reduce the range, you might be given errors saying that values are out of range. I didn't try it but you can try to make a new session and use the upload_batch feature on your client side to upload all the results you've had until here. don't forget to not keep the ones with out of range values.

Last

client side : fab run_loops:1000 (during a randomly generate sessions, advice from the team) server side : when you decide to change to tuning_session you have to run celerybeat. if you don't it won't work. there's a minimum amount of data to have before you can run a tuning session. side note : it is useless to run celerybeat during a randomly generate session.

It happened that I had to restart the website and celery worker before it accepted running the tuning session.

PS : I didn't even succeed in getting results yet, I'm still trying to figure things out.

nabti commented 5 years ago

Thank you very much @thomasloubejac thank you for your time explaining this step by step. I'll try to test this for my case it is for mysql on percona server.

Thank you again

thomasloubejac commented 5 years ago

ottertune/client/driver$ cat driver_config.json | jq { "database_type": "postgres", "database_name": "dbtest", "database_disk": "/dev/sda1", "database_conf": "/etc/postgresql/9.6/main/postgresql.conf", "username": "thomas", "password": "thomas", "oltpbench_home": "/ottertune_dev/oltpbench", "oltpbench_config": "/ottertune_dev/oltpbench/config/tpcc_config_postgres.xml", "oltpbench_workload": "tpcc", "oltpbench_log": "/ottertune_dev/ottertune/client/driver/oltp.log", "controller_config": "/ottertune_dev/ottertune/client/controller/config/sample_postgres_config.json", "save_path": "/ottertune_dev/results", //make sure to mkdir the directory "upload_url": "http://your_server_machine_IP_Address:8000", "upload_code": "23Y4E4QZ610CFBS72M6F" } ottertune/client/controller$ cat config/sample_postgres_config.json | jq { "database_type": "postgres", "database_url": "jdbc:postgresql://localhost:5432/dbtest", "username": "thomas", "password": "thomas", "upload_code": "DEPRECATED", "upload_url": "DEPRECATED", "workload_name": "workload_name" } oltpbench$ cat config/tpcc_config_postgres.xml <?xml version="1.0"?>

postgres org.postgresql.Driver jdbc:postgresql://localhost:5432/dbtest thomas thomas TRANSACTION_READ_COMMITTED 70 50 320 unlimited 45,43,4,4,4 NewOrder Payment OrderStatus Delivery StockLevel

thomasloubejac commented 5 years ago

ottertune/server/website/website/fixtures/$ cat postgres9-6_knobs.json

[
    {
        "fields": {
            "resource": "4",
            "vartype": "2",
            "category": "Resource Usage / Background Writer",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "200",
            "context": "sighup",
            "unit": "2",
            "summary": "Background writer sleep time between rounds.",
            "name": "global.bgwriter_delay",
            "scope": "global",
            "minval": "10",
            "maxval": "10000",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "2",
            "category": "Resource Usage / Background Writer",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "100",
            "context": "sighup",
            "unit": "3",
            "summary": "Background writer maximum number of LRU pages to flush per round.",
            "name": "global.bgwriter_lru_maxpages",
            "scope": "global",
            "minval": "0",
            "maxval": "1000",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "3",
            "category": "Write-Ahead Log / Checkpoints",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "0.5",
            "context": "sighup",
            "unit": "3",
            "summary": "Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.",
            "name": "global.checkpoint_completion_target",
            "scope": "global",
            "minval": "0.3",
            "maxval": "1.0",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "2",
            "category": "Write-Ahead Log / Checkpoints",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "300000",
            "context": "sighup",
            "unit": "2",
            "summary": "Sets the maximum time between automatic WAL checkpoints.",
            "name": "global.checkpoint_timeout",
            "scope": "global",
            "minval": "30000",
            "maxval": "86400000",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "2",
            "category": "Lock Management",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "1000",
            "context": "superuser",
            "unit": "2",
            "summary": "Sets the time to wait on a lock before checking for deadlock.",
            "name": "global.deadlock_timeout",
            "scope": "global",
            "minval": "1",
            "maxval": "2147483647",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "2",
            "category": "Query Tuning / Other Planner Options",
            "tunable": true,
            "description": "This applies to table columns that have not had a column-specific target set via ALTER TABLE SET STATISTICS.",
            "dbms": 1,
            "default": "100",
            "context": "user",
            "unit": "3",
            "summary": "Sets the default statistics target.",
            "name": "global.default_statistics_target",
            "scope": "global",
            "minval": "100",
            "maxval": "400",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "1",
            "vartype": "2",
            "category": "Query Tuning / Planner Cost Constants",
            "tunable": true,
            "description": "That is, the portion of the kernel's disk cache that will be used for PostgreSQL data files. This is measured in disk pages, which are normally 8 kB each.",
            "dbms": 1,
            "default": "4294967296",
            "context": "user",
            "unit": "1",
            "summary": "Sets the planner's assumption about the size of the disk cache.",
            "name": "global.effective_cache_size",
            "scope": "global",
            "minval": "8192",
            "maxval": "12884901888",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "2",
            "category": "Resource Usage / Asynchronous Behavior",
            "tunable": true,
            "description": "For RAID arrays, this should be approximately the number of drive spindles in the array.",
            "dbms": 1,
            "default": "1",
            "context": "user",
            "unit": "3",
            "summary": "Number of simultaneous requests that can be handled efficiently by the disk subsystem.",
            "name": "global.effective_io_concurrency",
            "scope": "global",
            "minval": "1",
            "maxval": "50",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "4",
            "vartype": "3",
            "category": "Query Tuning / Planner Cost Constants",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "4.0",
            "context": "user",
            "unit": "3",
            "summary": "Sets the planner's estimate of the cost of a nonsequentially fetched disk page.",
            "name": "global.random_page_cost",
            "scope": "global",
            "minval": "1.0",
            "maxval": "20",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "1",
            "vartype": "2",
            "category": "Resource Usage / Memory",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "8388608",
            "context": "postmaster",
            "unit": "1",
            "summary": "Sets the number of shared memory buffers used by the server.",
            "name": "global.shared_buffers",
            "scope": "global",
            "minval": "131072",
            "maxval": "10737418240",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "1",
            "vartype": "2",
            "category": "Write-Ahead Log / Settings",
            "tunable": true,
            "description": "",
            "dbms": 1,
            "default": "32768",
            "context": "postmaster",
            "unit": "1",
            "summary": "Sets the number of disk-page buffers in shared memory for WAL.",
            "name": "global.wal_buffers",
            "scope": "global",
            "minval": "32768",
            "maxval": "16777216",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "resource": "1",
            "vartype": "2",
            "category": "Resource Usage / Memory",
            "tunable": true,
            "description": "This much memory can be used by each internal sort operation and hash table before switching to temporary disk files.",
            "dbms": 1,
            "default": "4194304",
            "context": "user",
            "unit": "1",
            "summary": "Sets the maximum memory to be used for query workspaces.",
            "name": "global.work_mem",
            "scope": "global",
            "minval": "65536",
            "maxval": "17179869184",
            "enumvals": null
        },
        "model": "website.KnobCatalog"
    },
    {
        "fields": {
            "category": "Resource Usage / Memory",
            "maxval": "17179869184",
            "dbms": 1,
            "name": "global.maintenance_work_mem",
            "minval": "1048576",
            "default": "67108864",
            "tunable": true,
            "enumvals": null,
            "vartype": "2",
            "context": "user",
            "scope": "global",
            "summary": "Sets the maximum memory to be used for maintenance operations.",
            "unit": "1",
            "description": "This includes operations such as VACUUM and CREATE INDEX.",
            "resource": "1"
        },
        "model": "website.KnobCatalog"
    }
]

thomasloubejac commented 5 years ago

Ok then I hope this will help you!

bohanjason commented 5 years ago

Now we have an end-to-end example using OtterTune to tune Postgres 9.6 with a AWS instance. It may be a good example to get started.

nabti commented 5 years ago

@bohanjason Thank you very much

bohanjason commented 5 years ago

@nabti No problem , sorry for the late reply.

molujii commented 4 years ago

@thomasloubejac can you make the video for this it will be a great help because from last 1.5 months i am trying to run it but still not able to run it One more question can you run this on vmware linux?

molujii commented 4 years ago

@thomasloubejac @nabti were you able to run whole project end-to-end.

cmu-db / ottertune

Guide on how to use TPCC benchmark #190

On your client Machine

On your server machine

Last