Around 20 sec on our production server (Azure A2), approx. 10 sec on my local machine.
Thanks @ivarbergman. We should really try to improve those numbers.
Here's a script that could be used:
timeboot.bat
@echo off
set db=%1
IF NOT EXIST app.cs (
    staradmin +
)
staradmin kill all
staradmin start server
set start=%TIME%
star -d=%db% --logsteps app.cs
set stop=%TIME%
ECHO Frame: %start% - %stop%
Here's a sample run from me, on a locally built version:
Logs (since 12:33:46.0400767)
-----------------------------
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
143.9936:Package processed.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
143.9847:Processing completed.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
143.9597:Entry point executed.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
136.2446:CLR view meta-data were populated for the given types.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
86.4498:Query module schema information updated.
[12:34:08, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234085032114738359.pl
[12:34:08, Debug, Sql (foo123)]
Exported SQL schema info for foo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234085032114738359.pl in 0 ms.
[12:34:08, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234083831463698039.pl
[12:34:08, Debug, Sql (foo123)]
Exported SQL schema info for appfoo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234083831463698039.pl in 16 ms.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
25.2593:Columns of database schema checked and updated.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
25.2498:Type definitions registered.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
25.1412:Types of database schema checked and updated.
[12:34:08, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema160408123408362693170111.pl
[12:34:08, Debug, Sql (foo123)]
Exported SQL schema info for appfoo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema160408123408362693170111.pl in 0 ms.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
19.6484:Package started.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
18.9861:Package created.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
18.9331:Target assembly loaded.
[12:34:08, Debug, Starcounter.Host.AssemblyResolver (foo123)]
Loading assembly app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null from C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B\app.exe
[12:34:08, Debug, Starcounter.Host.AssemblyResolver (foo123)]
0.0:Loading assembly app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null from C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B\app.exe
[12:34:08, Debug, Starcounter.Host.AssemblyResolver (foo123)]
0.0:Resolving application: C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B\app.exe
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
13.3552:Schema verified and loaded.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
2.5785:Input verified and assembly resolver updated.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
0.296:Loader started.
[12:34:07, Debug, Starcounter.Server.Processes (scadminserver)]
Starting ""C:\Users\Per\Git\Starcounter\Develop\Level1\bin\Debug\scweaver.exe" --maxerrors=1 --ErrorParcelId=A4A7D6FA-EB34-442A-B579-DBB1DBB859E3 Weave "C:\Users\Per\AppData\Local\Temp\5eb1d56c-b830-446d-96d0-b31d3918216e\app.exe" --outdir="C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B"" at 12.34.07.265
[12:34:05, Debug, Starcounter.Host (foo123)]
4611.8811:Booting completed.
[12:34:05, Debug, Starcounter.Host (foo123)]
4611.6293:Start completed.
[12:34:05, Debug, Starcounter.Host (foo123)]
4611.6010:Base package loaded.
[12:34:05, Debug, Starcounter.Host.Loader (foo123)]
4564.6801:Package processed.
[12:34:05, Debug, Starcounter.Host.Loader (foo123)]
4564.5596:Processing completed.
[12:34:05, Debug, Starcounter.Host.Loader (foo123)]
4564.4908:Internal handlers were registered.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4115.1704:CLR view meta-data were populated for the given types.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4066.379:System type specifications initialized.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4053.2076:Query module schema information updated.
[12:34:03, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234031061266764082.pl
[12:34:03, Debug, Sql (foo123)]
Exported SQL schema info for foo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234031061266764082.pl in 0 ms.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4047.9946:Columns of database schema checked and updated.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4047.6953:Type definitions registered.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4047.2666:Types of database schema checked and updated.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4045.7563:Properties and columns were populated for the given meta-types.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4044.624:CLR view meta-data were deleted on host start.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4033.2472:Runtime meta-data tables were created and populated with initial data.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4031.5012:Package started.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4030.4530:Package created.
[12:34:03, Debug, Starcounter.Host.AssemblyResolver (foo123)]
0.0:Assembly resolver created in process 14272
[12:34:03, Debug, Starcounter.Host (foo123)]
4074.5988:Server command handlers registered.
[12:34:02, Debug, Starcounter.Host (foo123)]
3817.2832:App domain configured.
[12:34:02, Debug, Starcounter.Host (foo123)]
3817.2457:Network gateway connected.
[12:34:02, Debug, Starcounter.Host (foo123)]
3817.1450:Scheduler started.
[12:34:02, Debug, Starcounter.Host (foo123)]
3816.9241:Setup completed.
[12:34:02, Debug, Starcounter.Host (foo123)]
3816.8600:Query module initiated.
[12:34:01, Debug, Starcounter.Host (foo123)]
3811.1833:Database connected.
[12:34:01, Debug, Starcounter (foo123)]
Starting log writer at address 995. Next commit id is 496.
[12:34:01, Debug, Starcounter (foo123)]
Recovery completed. 12 transactions recovered.
[12:34:01, Debug, Starcounter (foo123)]
Starting recovery at address 719. Next commit id is 484.
[12:33:48, Debug, Starcounter.Host (foo123)]
374.5935:Database configured.
[12:33:48, Debug, Starcounter.Host (foo123)]
373.8425:Apps bootstraper initialized.
[12:33:47, Debug, Starcounter.Host (foo123)]
86.429:Scheduler configured.
[12:33:47, Debug, Starcounter.Host (foo123)]
65.5774:Host configured.
[12:33:47, Debug, Starcounter.Host (foo123)]
65.2494:BMX manager initialized.
[12:33:47, Debug, Starcounter.Host (foo123)]
63.6016:Logging configured.
[12:33:47, Debug, Starcounter.Server.Processes (scadminserver)]
Begin monitoring code host process sccode, PID 14272, running database foo123
[12:33:47, Warning, Starcounter (foo123)]
Image file writer unable to acquire the SE_MANAGE_VOLUME_NAME privilege.
Frame 12:33:45.84 - 12:34:10.00
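As an aside, the frame printed by the script can be turned into an elapsed time with a few lines. A sketch, assuming both %TIME% stamps fall within the same day:

```python
from datetime import datetime

def frame_seconds(start: str, stop: str) -> float:
    """Elapsed seconds between two cmd %TIME% values like '12:33:45.84'.

    Assumes both stamps fall on the same day (no midnight rollover)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)).total_seconds()

# The frame above: 12:33:45.84 - 12:34:10.00
print(frame_seconds("12:33:45.84", "12:34:10.00"))  # → 24.16
```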
This extract seems to be the most interesting:
[12:34:01, Debug, Starcounter.Host (foo123)]
3811.1833:Database connected.
[12:34:01, Debug, Starcounter (foo123)]
Starting log writer at address 995. Next commit id is 496.
[12:34:01, Debug, Starcounter (foo123)]
Recovery completed. 12 transactions recovered.
[12:34:01, Debug, Starcounter (foo123)]
Starting recovery at address 719. Next commit id is 484.
[12:33:48, Debug, Starcounter.Host (foo123)]
374.5935:Database configured.
I refined the script above further so it also produces a diagnostic time log, since traces routed to the log don't always seem to turn up.
timeboot.bat
@echo off
set db=%1
IF NOT EXIST app.cs (
    staradmin +
)
set SC_TIMESTAMPS_FILE_PATH=
IF EXIST timeboot.txt (
    del timeboot.txt
)
staradmin kill all
set SC_TIMESTAMPS_FILE_PATH=%cd%\timeboot.txt
staradmin start server
:: We are not interested in results from booting up
:: admin server. Delete what was emitted up until now.
del timeboot.txt
set start=%TIME%
star -d=%db% --logsteps app.cs
set stop=%TIME%
set SC_TIMESTAMPS_FILE_PATH=
ECHO Frame: %start% - %stop%
ECHO Diagnostics also in timeboot.txt
And an extract from a timeboot.txt, somewhat prettified and with a few comments:
// 01.36.30.709 Admin server receives the request to boot up the database
Starcounter.Server.Commands: Executing 'Starting database foo123.' (0e99e927-9e8b-4b9e-ad36-1a084e454942): 01.36.30.709
Starcounter.Server.Commands: Begin task 'Starting database process': 01.36.30.714
Starcounter.Server.Commands: End task (cancelled=False): 01.36.30.745
Starcounter.Server.Commands: Begin task 'Starting log writer process': 01.36.30.748
Starcounter.Server.Commands: End task (cancelled=False): 01.36.30.760
Starcounter.Server.Commands: Begin task 'Starting code host process': 01.36.30.763
Starcounter.Server.Commands: End task (cancelled=False): 01.36.30.778
Starcounter.Server.Commands: Begin task 'Assuring code host is ready for service': 01.36.30.782
Starcounter.Host: Started sccode Main(): 01.36.30.807
Starcounter.Host: Bootstrap Main() started.: 01.36.30.825
Starcounter.Host: Exception factory installed.: 01.36.30.840
Starcounter.Host: Command line parsed.: 01.36.30.865
Starcounter.Host: Configuration loaded.: 01.36.30.870
Starcounter.Host: Assured no other process with the same name.: 01.36.30.874
Starcounter.Host: Global memory allocated.: 01.36.30.878
Starcounter.Host: Kernel memory configured.: 01.36.30.883
Starcounter.Host: Logging configured.: 01.36.30.902
Starcounter.Host: BMX manager initialized.: 01.36.30.908
Starcounter.Host: Host configured.: 01.36.30.912
Starcounter.Host: Scheduler configured.: 01.36.30.995
Starcounter.Host: Apps bootstraper initialized.: 01.36.32.131
Starcounter.Host: Database configured.: 01.36.32.137
Starcounter.Host: Database connected.: 01.36.46.230
Starcounter.Host: Query module initiated.: 01.36.46.253
Starcounter.Host: Setup completed.: 01.36.46.257
Starcounter.Host: Scheduler started.: 01.36.46.261
Starcounter.Host: Network gateway connected.: 01.36.46.265
Starcounter.Host: App domain configured.: 01.36.46.268
Starcounter.Host: Server command handlers registered.: 01.36.47.291
Starcounter.Host.Loader: Package created.: 01.36.47.310
Starcounter.Host.Loader: Package started.: 01.36.47.317
Starcounter.Host.Loader: Runtime meta-data tables were created and populated with initial data.: 01.36.47.327
Starcounter.Host.Loader: CLR view meta-data were deleted on host start.: 01.36.47.380
Starcounter.Host.Loader: Properties and columns were populated for the given meta-types.: 01.36.47.390
Starcounter.Host.Loader: Types of database schema checked and updated.: 01.36.47.399
Starcounter.Host.Loader: Type definitions registered.: 01.36.47.404
Starcounter.Host.Loader: Columns of database schema checked and updated.: 01.36.47.409
Starcounter.Host.Loader: Query module schema information updated.: 01.36.47.431
Starcounter.Host.Loader: System type specifications initialized.: 01.36.47.484
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 01.36.47.660
Starcounter.Host.Loader: Internal handlers were registered.: 01.36.48.700
Starcounter.Host.Loader: Processing completed.: 01.36.48.703
Starcounter.Host.Loader: Package processed.: 01.36.48.706
Starcounter.Host: Base package loaded.: 01.36.48.709
Starcounter.Host: Start completed.: 01.36.48.712
Starcounter.Host: Booting completed.: 01.36.48.716
Starcounter.Server.Commands: End task (cancelled=False): 01.36.50.657
// 01.36.50.660 Admin server is done booting database processes (~20 sec)
Starcounter.Server.Commands: Executing completed (0e99e927-9e8b-4b9e-ad36-1a084e454942): 01.36.50.660
// 01.36.50.815 Admin server is asked to start application
Starcounter.Server.Commands: Executing 'Starting app in foo123' (9ab8c1f9-8a0a-4a6d-b7f6-6da0116a37d0): 01.36.50.815
Starcounter.Server.Commands: Begin task 'Preparing user executables and files': 01.36.50.822
Starcounter.Server.Commands: Weaving completed.: 01.36.51.736
Starcounter.Server.Commands: End task (cancelled=False): 01.36.51.739
Starcounter.Server.Commands: Begin task 'Loading executable in code host': 01.36.51.742
Starcounter.Host.Loader: Loader started.: 01.36.51.834
Starcounter.Host.Loader: Input verified and assembly resolver updated.: 01.36.51.848
Starcounter.Host.Loader: Schema verified and loaded.: 01.36.51.915
Starcounter.Host.Loader: Target assembly loaded.: 01.36.51.941
Starcounter.Host.Loader: Package created.: 01.36.51.944
Starcounter.Host.Loader: Package started.: 01.36.51.950
Starcounter.Host.Loader: Types of database schema checked and updated.: 01.36.51.975
Starcounter.Host.Loader: Type definitions registered.: 01.36.51.978
Starcounter.Host.Loader: Columns of database schema checked and updated.: 01.36.51.981
Starcounter.Host.Loader: Query module schema information updated.: 01.36.52.252
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 01.36.52.453
Starcounter.Host.Loader: Entry point executed.: 01.36.52.487
Starcounter.Host.Loader: Processing completed.: 01.36.52.491
Starcounter.Host.Loader: Package processed.: 01.36.52.495
Starcounter.Server.Commands: Code host Exec request processed.: 01.36.52.502
Starcounter.Server.Commands: Database app registered.: 01.36.52.506
Starcounter.Server.Commands: Database status updated.: 01.36.53.526
Starcounter.Server.Commands: End task (cancelled=False): 01.36.53.533
// 01.36.53.538 Admin server has weaved and loaded application (~3 seconds)
Starcounter.Server.Commands: Executing completed (9ab8c1f9-8a0a-4a6d-b7f6-6da0116a37d0): 01.36.53.538
Since the majority of the time is consumed when booting up an empty host, I'll focus on that initially, trying to find answers. Here are the spots I've filtered out from that sample, all consuming more than 1 second, sorted by their relative impact:
Booting code host (> 1 second gaps)
-----------------------------------
14 sec
Starcounter.Host: Database configured.: 01.36.32.137
Starcounter.Host: Database connected.: 01.36.46.230
2 sec
Starcounter.Host: Booting completed.: 01.36.48.716
Starcounter.Server.Commands: End task (cancelled=False): 01.36.50.657
1.5 sec
Starcounter.Host: Scheduler configured.: 01.36.30.995
Starcounter.Host: Apps bootstraper initialized.: 01.36.32.131
1 sec
Starcounter.Host: App domain configured.: 01.36.46.268
Starcounter.Host: Server command handlers registered.: 01.36.47.291
1 sec
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 01.36.47.660
Starcounter.Host.Loader: Internal handlers were registered.: 01.36.48.700
Note that absolute numbers shouldn't be trusted here, since I'm running a personal build that might contain some debug code. The focus is on relative numbers.
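Hunting these gaps by eye gets tedious, so here's a small sketch that pulls the > 1 second gaps out of a timeboot.txt automatically. It assumes the "Source: message: HH.MM.SS.mmm" trace format shown above:

```python
import re
from datetime import datetime

# Matches the trailing 'HH.MM.SS.mmm' stamp on each trace line.
STAMP = re.compile(r": (\d{2}\.\d{2}\.\d{2}\.\d{3})\s*$")

def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, line_before, line_after) for consecutive
    trace entries more than `threshold` seconds apart."""
    gaps, prev = [], None
    for line in lines:
        m = STAMP.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H.%M.%S.%f")
        if prev and (ts - prev[0]).total_seconds() > threshold:
            gaps.append(((ts - prev[0]).total_seconds(), prev[1], line.strip()))
        prev = (ts, line.strip())
    return sorted(gaps, reverse=True)  # biggest gap first
```

For example, `find_gaps(open("timeboot.txt"))` returns the gaps biggest-first. Entries from different processes interleave in the file, so treat the output as a starting point rather than exact attribution.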
Doing some closer measurements didn't reveal anything new in the first iteration. Just as everything above suggests, the host "connecting" to the database is what steals almost all of the cycles.
Function Name                        Total CPU (%)   Self CPU (%)   Total CPU (ms)   Self CPU (ms)   Module
Bootstrap.Control::ConnectDatabase   93.12 %         0.00 %         16565            0               Starcounter.Bootstrap.dll
And that method is nothing more than this:
private unsafe void ConnectDatabase(uint schedulerCount, ulong hmenv, ulong hlogs) {
    uint e;
    uint flags = 0;

    flags |= sccoredb.SCCOREDB_LOAD_DATABASE;
    flags |= sccoredb.SCCOREDB_USE_BUFFERED_IO;
    flags |= sccoredb.SCCOREDB_ENABLE_CHECK_FILE_ON_LOAD;
    //flags |= sccoredb.SCCOREDB_ENABLE_CHECK_FILE_ON_CHECKP;
    flags |= sccoredb.SCCOREDB_ENABLE_CHECK_FILE_ON_BACKUP;
    flags |= sccoredb.SCCOREDB_ENABLE_CHECK_MEMORY_ON_CHECKP;

    e = sccoredb.sccoredb_connect(flags, schedulerCount, hmenv, hlogs, sccorelib.fix_get_performance_counter_file_map());
    if (e != 0) throw ErrorCode.ToException(e);

    e = filter.init_filter_lib(hmenv, hlogs);
    if (e != 0) throw ErrorCode.ToException(e);
}
I've recompiled and experimented with the flags too, but the impact is close to zero. So I guess my next step must be to start profiling Bluestar.
I'm CC'ing @eriohl and @alemoi here to ensure they receive updates on this one (they've recently looked into starting a db standalone and no slowdown was noticeable, hence I believe they might add valuable insight to the current story).
[...] they've recently looked into starting a db standalone and no slowdown was noticeable
That's kind of weird. I'm doing this on a local build, one that brings down the Bluestar binaries from our build server, but maybe it's not a sharp version or something like that? I do remember that in the past there was a huge difference between the Debug and Release builds in boot time (the former doing a bunch of checks at the page level or something like that).
On the other hand, we've now seen reports of really slow boots from customers too, and they'll surely have release builds. Then again, they might not suffer from the exact same pain I'm suffering from.
I think I'll install a sharp version here and run it on that to make sure. Stay tuned. :smile:
I think it might be the code generation for indexes that is very slow. It started happening when I moved codegen to be in-memory using Clang. Might that be true, @eriohl?
I think I'll install a sharp version here and run it on that to make sure. Stay tuned.
After downloading and installing, numbers improved significantly in my environment, which indicates that doing a local build of Level1 brings down some non-optimized Bluestar binaries (which is probably the right thing to do).
That being said, I still clock around 10-15 seconds for booting up just an empty application. I'll go through my diagnostic logs and try to make a selection like the one above, but with this new data.
I think it might be the code generation for indexes that is very slow. It started happening when I moved codegen to be in-memory using Clang. Might that be true, @eriohl?
I think a lot of stuff is very slow, and this certainly hasn't helped any. But it seems not all that relevant, since there is no real alternative here. I also note that when we start the host without the managed host, it goes a lot faster, and it still does the code generation (for indexes at least). So perhaps not the biggest culprit.
So here's a new extract, where I've cut off some edges that seem not very interesting. In this sample, the bootstrapping takes ~11 seconds, and includes:
Compiling source on the fly (done by star.exe) (~1 second)
Starting scdata (-)
Starting scdblog (-)
Starting sccode (~8 seconds)
Weaving (~1 second)
Bootstrapping application in host (~1 second)
In the past, we've said we can't be satisfied with the bootstrapping sequence until the weaver consumes the majority of its time. I'm not sure that applies 100% in this sample, since the application is super limited, but even if it's only truth-ish, the ratios are pretty far off, as can be seen.
Here's the extract of the code host bootstrapper:
Starcounter.Server.Commands: Begin task 'Assuring code host is ready for service': 11.31.01.419
Starcounter.Host: Started sccode Main(): 11.31.01.444
Starcounter.Host: Bootstrap Main() started.: 11.31.01.461
Starcounter.Host: Exception factory installed.: 11.31.01.478
Starcounter.Host: Command line parsed.: 11.31.01.508
Starcounter.Host: Configuration loaded.: 11.31.01.513
Starcounter.Host: Assured no other process with the same name.: 11.31.01.518
Starcounter.Host: Global memory allocated.: 11.31.01.521
Starcounter.Host: Kernel memory configured.: 11.31.01.526
Starcounter.Host: Logging configured.: 11.31.01.547
Starcounter.Host: BMX manager initialized.: 11.31.01.553
Starcounter.Host: Host configured.: 11.31.01.557
Starcounter.Host: Scheduler configured.: 11.31.01.570
Starcounter.Host: Apps bootstraper initialized.: 11.31.02.824
Starcounter.Host: Database configured.: 11.31.02.837
Starcounter.Host: Database connected.: 11.31.05.809
Starcounter.Host: Query module initiated.: 11.31.05.831
Starcounter.Host: Setup completed.: 11.31.05.834
Starcounter.Host: Scheduler started.: 11.31.05.838
Starcounter.Host: Network gateway connected.: 11.31.05.842
Starcounter.Host: App domain configured.: 11.31.05.844
Starcounter.Host: Server command handlers registered.: 11.31.06.890
Starcounter.Host.Loader: Package created.: 11.31.06.911
Starcounter.Host.Loader: Package started.: 11.31.06.918
Starcounter.Host.Loader: Runtime meta-data tables were created and populated with initial data.: 11.31.06.928
Starcounter.Host.Loader: CLR view meta-data were deleted on host start.: 11.31.06.936
Starcounter.Host.Loader: Properties and columns were populated for the given meta-types.: 11.31.06.945
Starcounter.Host.Loader: Types of database schema checked and updated.: 11.31.06.954
Starcounter.Host.Loader: Type definitions registered.: 11.31.06.960
Starcounter.Host.Loader: Columns of database schema checked and updated.: 11.31.06.963
Starcounter.Host.Loader: Query module schema information updated.: 11.31.06.988
Starcounter.Host.Loader: System type specifications initialized.: 11.31.07.041
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 11.31.07.226
Starcounter.Host.Loader: Internal handlers were registered.: 11.31.08.319
Starcounter.Host.Loader: Processing completed.: 11.31.08.322
Starcounter.Host.Loader: Package processed.: 11.31.08.326
Starcounter.Host: Base package loaded.: 11.31.08.330
Starcounter.Host: Start completed.: 11.31.08.334
Starcounter.Host: Booting completed.: 11.31.08.339
Starcounter.Server.Commands: End task (cancelled=False): 11.31.09.771
So the biggest gaps appear to be:
Starcounter.Host: Database configured.: 11.31.02.837
Starcounter.Host: Database connected.: 11.31.05.809
(3 sec) and
Starcounter.Host: App domain configured.: 11.31.05.844
Starcounter.Host: Server command handlers registered.: 11.31.06.890
(1 sec); the rest is quite evenly distributed into small portions of time (mostly 4-14 ms).
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 11.31.07.226
Starcounter.Host.Loader: Internal handlers were registered.: 11.31.08.319
This gap seems to relate to registration of handlers.
I'll review what happens when the "apps bootstrapper" initializes, to see if it also includes that. Then we might get 3 for the price of 1 (if they are "fixable").
I am adding more log entries to get more precision.
OK cool, I was about to do that, but I'll happily let you. :smile:
I'll review what happens when the "apps bootstrapper" initializes, to see if it also includes that.
Yeah, it seems quite a lot of handler registration is going on there too, so it's likely the underlying barrier. But I guess that involves some kind of code generation, so maybe there's not much we can do about it.
Yeah, it seems quite a lot of handler registration is going on there too, so it's likely the underlying barrier. But I guess that involves some kind of code generation, so maybe there's not much we can do about it.
I think it might be the handler registration with the gateway that takes time. I will look into it and see if I can introduce some optimizations.
I added a new measurement to get the average simplest-handler registration time, which is around 0.7 ms(!) on my machine (after some optimizations though :smiley_cat:), so it should not be the problem (I added a test, so now we measure this: SimplestHandlerRegistrationMs). However, I'm digging deeper, since I think something is still wrong with handler registration, or some magic there!
Just finished investigating. Looked everywhere. You would not believe what it was :smiley_cat:
It was "localhost" name resolution inside the code host, used to communicate REST with the gateway. Using the IP "127.0.0.1" solves all the above problems except this one (which we should dig into deeper):
I made it so that "localhost" is translated to "127.0.0.1" when using the Http class, so one can continue using the Http class as before, no modification needed. However, if someone would like to use an arbitrary .NET API for RESTing: now you know that "localhost" DNS resolution adds around 1 second.
@per-samuelsson can you please pull and confirm that it's also fixed on your machine? Thank you!
@alemoi wow, what amazes me is the way you discovered it in real life :8ball: Frankly speaking, I think we already discussed this phenomenon a couple of years ago in a completely different context, and it still strikes back. There are numerous posts on the Internet on how to "fix" it on Windows, e.g. http://cubicspot.blogspot.se/2010/07/fixing-slow-apache-on-localhost-under.html. We should recommend such fixes to our clients, or even include an optional hosts-file patch in the installer.
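For anyone who wants to check their own machine, here's a small sketch that times name resolution directly (port 8181 is just the gateway port from the samples above; any port works). On an affected Windows box, the "localhost" lookup can come back orders of magnitude slower than the literal IP, which skips the resolver entirely:

```python
import socket
import time

def resolve_ms(host: str) -> float:
    """Time one getaddrinfo() lookup for `host`, in milliseconds."""
    t0 = time.perf_counter()
    socket.getaddrinfo(host, 8181, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - t0) * 1000

# A literal IP bypasses the resolver entirely; "localhost" may not.
print(f"127.0.0.1: {resolve_ms('127.0.0.1'):6.2f} ms")
print(f"localhost: {resolve_ms('localhost'):6.2f} ms")
```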
I sure will. Great work finding that, and fixing it.
I can confirm @alemoi's fixes sliced off basically all time from those 3 spots I reported earlier. Well done, @alemoi! I will post some numbers later.
Right now, I'm working on finding whatever new spots are interesting in managed code, so we can try to shave those off before we go after the real beast, connecting the database, where we'll need unmanaged profiling to get anywhere.
I have at least one new spot that I'm currently analysing that seems interesting, and when I have more info on it, I'll post an update here with more general information about where we stand.
OK, so first a sample of those three spots @alemoi addressed:
Starcounter.Host: Scheduler configured.: 10.07.56.408
Starcounter.Host: Apps bootstraper initialized.: 10.07.56.552
Starcounter.Host: App domain configured.: 10.08.00.394
Starcounter.Host: Server command handlers registered.: 10.08.00.411
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 10.08.00.829
Starcounter.Host.Loader: Internal handlers were registered.: 10.08.00.850
With that, they are now well under the radar, and are clearly acceptable.
I've made numerous samples now, with a few tweaks too, but things are more or less at status quo, showing an overall time of 8.5-10 seconds, normally around 9-ish. That, again, includes:
Compiling source on the fly (done by star.exe) (~1 second)
Starting scdata (-)
Starting scdblog (-)
Starting sccode (~5-6 seconds)
Weaving (~1 second)
Bootstrapping application in host (~1 second)
Within booting up the code host, we now have these weak spots:
Starcounter.Host: Database configured.: 10.07.56.559
Starcounter.Host: Database connected.: 10.08.00.347
Starcounter.Host.Loader: Query module schema information updated.: 10.08.00.513
Starcounter.Host.Loader: System type specifications initialized.: 10.08.00.578
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 10.08.00.829
All other micro-tasks within the code host bootstrap now show numbers I don't consider interesting to pursue further. Possibly, that last 0.3-second triplet is not very interesting either; I just wanted to show the one that was "second worst".
One new interesting thing I found is that it's always ~1 second between the last entry written by the code host bootstrapping sequence and when the admin server ends that task (the task is "waiting for code host to be ready"). After some initial diagnosing, I've isolated it down to one thing added rather recently: an HTTP call from within the admin server that fires whenever its internal state changes. I remember asking @andwah about it earlier, and having some discussion with @andwah and @alemoi about whether it could be done differently. I think what it does is push out a "state update" to clients (like the web admin). If I understand it correctly, that call is synchronous, and hence the admin server can't report back to star.exe that bootstrapping of the database is done before that call completes:
I think that's our best shot at shaving off another second right now, since it seems like the lowest-hanging fruit. After that, only database connecting remains until we've reached the goal of the weaver being the bottleneck.
@per-samuelsson already an achievement, thanks for the update! Yes, I'm very curious :) What do you think would be the best approach to drill down into the 5 secs of starting sccode?
What do you think would be the best approach to drill down into the 5 secs of starting sccode?
Well, it's hard to say, because now we're getting closer to having only that "Database connected" thing left, and it takes like > 75% of the overall time. I'm not up to date on what's done in there (I know recovery runs, but I've seen that it doesn't make a big impact). And because of that, I'm having an even harder time understanding what can be optimized in there and what can not.
My idea is that once I get a better idea of this more low-hanging thingy I described above, it's time to take on that database connect.
I've now ruled out that it has anything to do with "plumbing", or the call context itself. I've tried replacing it with an explicit, precreated Node instance, changed to "127.0.0.1" when creating it, to no avail. It's still very close to ~1 second every time, so it seems to be something in the implementation.
I've now ruled out that it has anything to do with "plumbing", or the call context itself.
Hmm, seems I have to take a step back from that already. Now I have a sample where it's ~1 second from the time the PUT is called until the handler is reached. It seems to be some kind of cold-start thing though, since consecutive calls complete in just milliseconds.
Now I have a sample where it's ~1 second from the time the PUT is called until the handler is reached. It seems to be some kind of cold-start thing though, since consecutive calls complete in just milliseconds.
More info:
If I precreate the node (in some other thread, probably) and keep a reference to it, I can get all calls down to milliseconds, except the first one, which always clocks in at ~1 second.
// Precreate
node = new Node("127.0.0.1", 8181);
// Use later in another thread
Trace("Updated", watch);
node.PUT("/__internal_api/databases/" + database.Name, string.Empty, null);
Trace("After PUT", watch);
Trace("Updated2", watch);
node.PUT("/__internal_api/databases/" + database.Name, string.Empty, null);
Trace("After PUT2", watch);
Starcounter.Server: Entering update: 04.07.46.289
Starcounter.Server: Updated: 04.07.46.295
ServerManager: Enter PUT: 04.07.47.219 // First line in the actual handler, a second later
ServerManager: Got database: 04.07.47.221
ServerManager: Invalidated model: 04.07.47.224
Starcounter.Server: After PUT: 04.07.47.227
Starcounter.Server: Updated2: 04.07.47.230
ServerManager: Enter PUT: 04.07.47.233 // First line in the actual handler, now just 3 ms
ServerManager: Got database: 04.07.47.235
ServerManager: Invalidated model: 04.07.47.239
Starcounter.Server: After PUT2: 04.07.47.242
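The cold-start pattern above is easy to reproduce in isolation. A sketch (the server here is a throwaway stand-in, not the admin server, and the path is just borrowed from the trace) showing the first request paying for the TCP connect while the second reuses the socket:

```python
import http.client
import http.server
import threading
import time

# Throwaway local server answering PUTs, so the timing is self-contained.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so the socket is reused
    def do_PUT(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()
    def log_message(self, *args):  # keep the output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])

def timed_put(path: str) -> float:
    """One PUT round-trip, in milliseconds."""
    t0 = time.perf_counter()
    conn.request("PUT", path, body="")
    conn.getresponse().read()
    return (time.perf_counter() - t0) * 1000

first = timed_put("/__internal_api/databases/foo123")   # pays for connect
second = timed_put("/__internal_api/databases/foo123")  # reuses the socket
print(f"first: {first:.1f} ms, second: {second:.1f} ms")
```

Against localhost the connect itself is cheap, so if the first call still costs ~1 second, the time is going somewhere else (name resolution, handler codegen, or some other lazy initialization).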
By installing an unmanaged profiler and doing mixed-mode debugging, I've finally been able to produce some results (let's just hope they are accurate).
As stated earlier, more than 80% of the current bootstrapping cycle happens when booting up an empty code host, specifically when invoking Bluestar's sccoredb.sccoredb_connect. I've profiled that a couple of times now, and the results I get are pretty consistent. Here are two extracts:
The setup is a locally built Bluestar debug configuration, so absolute timings should be considered in that context, but maybe the relative numbers can give us a hint on where to look further.
I'll try to build a local release version of the kernel too, to see what that gives, and see if I can dig deeper into the tree to be even more specific. I'll get back on that.
Big thanks to @chrhol, who helped me set up a build environment for Bluestar, bearing with me while I whined about everything being so difficult and complicated. And @dan31, who suggested the VS Enterprise profiler, which worked out great.
I had a feeling the above might be false diagnostics, and I think it might be. So if anyone is reading this, give me a moment to test with a sharp Bluestar version first.
Testing on level0 also gives me the impression that the time is spent inside db connect.
Thanks for sharing that. I also still think it's there, just not spent the way my spreadsheets showed, since it seems unlikely that __get_image_header_read_ptr could take up so much of that time. It just doesn't make sense.
[...] since it seems unlikely that __get_image_header_read_ptr could take up so much of that time. It just doesn't make sense.
Seems the feeling I had was on to something. Profiling a locally built release version of Bluestar revealed something completely different:
And that's more in line with what I suspected from the beginning: some kind of compiling / code generation must be in play here. Which is kind of sad, because that's where it will hurt us the most, I guess, since we're not in control of that code.
Why did LLVM not even surface in debug version profiling?
What's up with that 1 second it takes every time for the first Self/Node/Http call (described earlier)? I'll try to profile that too, while I'm at it.
About LLVM, I have no clue what options we have to do anything. In my latest tests (using the time-booting script above), the isolated time to run through code-host-only bootstrapping (i.e. from the beginning of Main until initialization is done) is ~3.5 seconds in an optimized build. Of that, the database connect above is more than 90%, so I can't find other optimizations in that sequence right now.
Why did LLVM not even surface in debug version profiling?
Maybe because it's an external call to a DLL?
What's up with that 1 second it takes every time for the first Self/Node/Http call (described earlier)? I'll try to profile that too, while I'm at it.
Hmm, yes, indeed interesting. I can check this once I have time.
Well, why would it show up in release in that case? It's just different builds.
Hmm, yes, indeed interesting. I can check this once I have time.
It's super frustrating, actually, because I can consistently demonstrate that 1-second penalty by emitting before/after calls to the diagnostics text file. But as soon as I attach a debugger/profiler, the call takes "no time". :game_die: :anguished:
It's super frustrating, actually, because I can consistently demonstrate that 1-second penalty by emitting before/after calls to the diagnostics text file. But as soon as I attach a debugger/profiler, the call takes "no time". :game_die: :anguished:
Maybe the delay takes place in the gateway? And the reason for this is again LLVM codegen there? @per-samuelsson, you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
@per-samuelsson, you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
Really? OK, sure. I'm starting to get the feeling there is little I can do here once these analyses are completed anyway, so if you think I should pause for now, I will. Let me know if I can help you investigate further in any way!
Also, it would be nice to track down the catastrophic absolute numbers reported by @ivarbergman earlier in this thread and by @atifwaqar in #3494. They just seem really, really off. Maybe I can do something to help those guys profile better...
@per-samuelsson, you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
@alemoi, if what you guess turns out to be the case, then we are pushed into a big need for more controlled compilation logic.
Here is a simple version of it, namely:
1) When developers are in a dev environment, the LLVM compiler is set to fast compilation without strong optimisation. The result is a fast dev cycle: devs get their apps running in the blink of an eye.
2) When developers put an app in a prod environment, they flip a switch in the DB config file to enable full-blown, strongly optimised compilation of their code. Not suitable for the dev cycle on a dev machine, but needed for load testing / prod use.
Actually, points 1 and 2 are no more than this:
In that sense, we are nothing special. Developers don't wait for a release version to be built every time they make a change, since that is expensive. That said, I'm not in favour of such manual switching; to my mind it's too 2000s and cumbersome (especially for our target audience, who want super-fast and super-simple development), and there is a better approach.
Let me give you an analogy. In the JVM, for instance, when you compile and run your code the first time, it isn't yet strongly optimised (hence it starts relatively fast). However, it gets better optimised over the course of execution: the VM finds hot spots on the execution paths and rebuilds them while they aren't in use. That is why its VM is called HotSpot. Something similar happens in modern SQL processing engines: they do not attempt to globally optimise a query when the user first runs it (to avoid making them wait far too long); instead, they optimise the execution plan during the actual execution of the query.
In our case, the state of affairs is even simpler. We could build compiled representations (of whatever: queries, URIs and so on) without strong optimisations in the first place. Then, once the app has started, we can start recompiling the representations with the strongest possible optimisations. Think of it as a two-level cache. While the strongly optimised functions aren't yet in the cache, the weakly optimised ones are invoked. When the strongly optimised representations become available (presumably after the seconds they now take to compile), the references in the cache are (thread-safely) replaced with them. Hence there is no overhead in switching to them: you simply had one pointer, and now you have another.
Thus, while we now (ostensibly; it needs to be verified) make the user wait seconds for the longer compilation every time he starts an app, after the suggested addition this time moves to a (slightly longer) warm-up phase, not noticeable to a typical user in 99.9% of cases.
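The two-level cache described above can be sketched in a handful of lines. `fast_compile` and `optimized_compile` are hypothetical stand-ins for the quick and the strongly optimised LLVM pipelines; the point is the thread-safe reference swap once the optimised build completes:

```python
import threading

class TwoLevelJitCache:
    """Serve a quickly compiled function first; upgrade it to a
    strongly optimised one in the background and swap the reference."""

    def __init__(self, fast_compile, optimized_compile):
        self._fast = fast_compile
        self._opt = optimized_compile
        self._funcs = {}          # key -> best currently available callable

    def get(self, key, source):
        fn = self._funcs.get(key)
        if fn is None:
            # First request: compile quickly, publish, then start the
            # slow optimised build on a background thread.
            fn = self._funcs[key] = self._fast(source)
            threading.Thread(
                target=self._upgrade, args=(key, source), daemon=True
            ).start()
        return fn

    def _upgrade(self, key, source):
        # Replacing a dict entry is atomic in CPython; callers in flight
        # keep the weakly optimised function, later callers get the new one.
        self._funcs[key] = self._opt(source)

# Demo with trivial stand-in "compilers":
cache = TwoLevelJitCache(
    fast_compile=lambda src: (lambda x: eval(src)),   # slow-ish interpreter
    optimized_compile=lambda x_ignored: (lambda x: x * 2),  # "optimised" form
)
f = cache.get("double", "x + x")
print(f(21))   # 42, served by the weakly optimised version first
```

Callers never block on the expensive compile; they only ever observe a pointer that points at the best version available so far.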
Also, it would be nice to track down the catastrophic absolute numbers reported by @ivarbergman earlier in this thread and by @atifwaqar in #3494. They just seem really, really off. Maybe I can do something to help those guys profile better...
@per-samuelsson That is a good point. @atifwaqar, maybe you can spend some time getting the startup issue reproduced for us? What do you think?
@atifwaqar It would be best if you could prepack your whole setup (use 7z Ultra compression; include the whole Starcounter folder from Documents, all the apps that you start, and the batch file you use to start them) so that @per-samuelsson (or anyone, like @alemoi) can reproduce the run in which you observe the slow start. Then we can use that laboratory as a base to measure the impact of improvements.
@dan31 Sure, but the Starcounter folder from Documents is 30+ GB. I am copying it to our SC Basement server so your team can access it easily. I will update you as soon as it's copied.
We could build compiled representations (of whatever: queries, URIs and so on) without strong optimisations in the first place.
I really like the ideas you present in that post. It gives us the best of both worlds: fast startup times, yet fast execution of critical code. Let's just hope it holds up, and isn't super big or challenging in terms of implementation time.
Big thanks to @chrhol, who helped me set up a build environment for Bluestar, bearing with me while I whined about everything being so difficult and complicated. And @dan31, who suggested the VS Enterprise profiler, which worked out great.
There is an interesting alternative I tried out today: https://github.com/GPUOpen-Tools/CodeXL/releases (the ex-commercial AMD profiler). It's free now and integrates nicely into VS. Yes, it is optimised for AMD hardware, but it has well-done sampling analysis (independent of CPU). Apart from all the standard sampling data for native code, it also gives you an assembly decomposition for each code line, which is sometimes the most valuable part of the analysis, as you can drill down deep:
It gives us the best of both worlds: fast startup times, yet fast execution of critical code. Let's just hope it holds up, and isn't super big or challenging in terms of implementation time.
Important to note: if @alemoi confirms what we think is the case, then this work is an absolute necessity for any future versions (not just a patch for existing stuff), since the dominance of llvm time will still prevail unless it is solved with smart caching.
Just my 2 cents. If the below is already done, please disregard.
Making LLVM faster?
Before implementing the cache, I suggest we try the following:
When JITting, turn off all optimizations.
Never use linking. This is probably how it works already (it's how the URI matcher is currently implemented).
If you need to call Bluestar functions, do it by writing the function pointers from the host (again, _no_ linking using a linker). Always use Clang/LLVM in-memory.
Never load header files (or touch the file system). The C source should be in a single Clang input buffer and should only contain the function declarations and code actually used. This is probably how it works already (it's how the URI matcher is currently implemented).
@Starcounter-Jack very good cents! I'm not currently sure about all of them (we probably missed something obvious while moving llvm from disk-generated to in-memory), but @alemoi will treat your notes as top priority when he comes back on Tuesday. Yes, currently (for some time now) stronger optimisations are enabled, to produce faster functions. On Friday @alemoi tested with the optimisation flag set to false, resulting in 3.5x faster generation. We don't yet know the balance of generation time vs. run time. I believe during next week we'll have all the info at hand and most likely the issue resolved.
@Starcounter-Jack No file system is involved; llvm is completely in-memory for all its users. Most likely the reason is the optimisations, and maybe some details of the full JIT chain.
I briefly ran a .NET profiler in vivo on SWO. Their suite takes 30 sec to get past "Starting a database...". Which, as we've noticed already, isn't about the database; the proper message there should be "Starting Starcounter...", in my opinion. First, I noticed that half the CPU time over these 30 secs went to scnetworkgateway. The next time I restarted the suite, I promptly attached the profiler to it. Here are the results:
Going deeper, it is clearly this:
My next candidate was sccode.exe, which also showed a peak in CPU usage approximately in the middle of the process, competing with the gateway for CPU resources. Here are the results of profiling its startup sequence:
No need to continue investigating, I believe.
I think the severity of this unfortunate imprudence in llvm module usage is now obvious, so I'm assigning the issue to @alemoi and raising its priority to high. I realise it isn't that simple to fix, since the reason for it is clearly that you wanted llvm to generate more performant results, and that turned into an issue users don't buy into. I believe the best quick fix, which will spare our current users such big delays in the dev cycle, is to introduce a database setting in the config file with the default "no optimisation", as @Starcounter-Jack suggests. When putting their stuff into production, they can switch on "stronger optimisations". The proper longer-term fix is a smart cache that serves unoptimised versions first and then quietly builds optimised versions, transparently to the user. We need to make our users comfortable with the technology, and forcing them to wait minutes every time for a runtime boost isn't an option (I don't know how much you'd gain, but most of them aren't doing high-frequency trading or building another Facebook).
@per-samuelsson thank you for your thorough examination; I believe the root of the problem is now clear enough. You could repeat the same procedure with @atifwaqar's data file to see what takes so long there (perhaps exactly the same thing).
I believe the best quick fix, which will spare our current users such big delays in the dev cycle, is to introduce a database setting in the config file with the default "no optimisation", as @Starcounter-Jack suggests. When putting their stuff into production, they can switch on "stronger optimisations".
How bad is Starcounter's performance with optimisation off?
This quick fix might have a drawback for new users who download Starcounter and test performance. I doubt they will do proper configuration; they will probably run Starcounter as-is. Is that something we should worry about?
Maybe the switch should be on by default, and then we teach our customers to switch it off in the development environment? Note that it is easy to forget to fix the configuration in the production environment.
How bad is Starcounter's performance with optimisation off?
Thank you @k-rus, that is a very important point to track. I call on us to be very precise with such changes (and @alemoi in particular). The change itself is just a flag, but we need to clearly get two sets of numbers as a summary table:
1) What is the decrease in startup time? In which examples and tests? (Empty db? TPC-C and which settings? RetailDemo? Customer case?)
2) What is the increase in runtime time? In which examples and tests? (Empty db? TPC-C and which settings? RetailDemo? Customer case?)
This quick fix might have a drawback for new users who download Starcounter and test performance. I doubt they will do proper configuration; they will probably run Starcounter as-is. Is that something we should worry about?
I would put it like this: substantially fewer than the people downloading Starcounter and waiting 6-10 secs at first app start, not to mention current customers struggling with minutes of startup. I also have a gut feeling that the decrease in perf won't be multiples, but probably tens of percent. For our published benchmarks, we turn on all perf tunings. Once the smart cache is in place, this is no longer a concern.
Naturally, when newbies do their ad-hoc microbenchmarks, they go through these steps:
1) Ease of setup and first coding experience (put a tick here: "Hello world" and transacted apps are very straightforward today, in well-recognised environments like Visual Studio);
2) Ease of the dev cycle (coding-running-stopping-changing-...): needs improvement in startup time (this issue) and stopping/restarting time (addressed in the Core .NET track). People don't like to wait; if they wait and the platform looks as if it hangs big time, they will be inclined to leave it. There are more, smaller points to address, e.g. precreating the first database at installation asynchronously, rather than when the user starts his first app.
3) Perf of writes. They populate the database with loads of data, like inserting millions of arbitrarily generated rows of people and houses. People don't like to wait; if they need to wait minutes to load their test data, they will be inclined to leave. Today you load 1,000,000 people with FirstName LastName in around 1.5 secs on my laptop on battery; surely this can be improved, but it's decently okay already (we as a company take great care over write performance).
4) Perf of reads. When the data is there, they want to crunch it. We all know what we want to solve there and how; I'm not diving into detail. Once NewSQL is in place and the redundant .NET onion layers around getting data to the code host are removed, reads will improve. Once we add OLAP support, reads will improve dramatically.
5) Perf of mixed loads. Only quite savvy users and real customers reach this stage. We're talking about concurrent access to the database, and the operational-analytics case where you do OLTP writes and analyse your current data in an OLAP fashion simultaneously. This becomes important when Starcounter gets really big public traction and people start to genuinely stress-test it with demanding workloads.
Effectively, the point we address here is no. 2 in the list. If no. 2 isn't addressed, the number of newbies trying out Starcounter who reach nos. 3 and 4 will greatly decrease.
Maybe the switch should be on by default, and then we teach our customers to switch it off in the development environment? Note that it is easy to forget to fix the configuration in the production environment.
For the reasons I gave above, I think the most reasonable approach is to make the flag off by default, work on the smart cache asap, and then deprecate the flag. Our goal for the Core .NET version is a sub-second dev cycle for both start and stop, with the same scllvm.dll involved in the loop. Hence, if things are kept as-is with the flag on, it will never be possible to achieve a sub-second experience.
However, it might turn out that the root cause isn't optimisation itself but, for instance, heavy heap thrashing or deallocation time per jitted function when invoking llvm.dll. We don't know yet; @alemoi to the rescue on this issue.
Actually, that only covers performance-related evaluation. The other big part of evaluation is the app dev framework: view-models with thin clients and everything related to app development. When people try and test these (maybe 70-80% of initial evaluations, or even more, and this is the path we use in the starting tutorials), compiling a highly performant version is quite useless. They might combine it with data generation and fetching (like trying to create an analytical application with a GUI), but this still comes before the points on writes and reads: they first create an app with no data, but with some GUI and event responses like button clicks and the like.
@k-rus @alemoi Another point: looking at the cap of 20% CPU load in a profiling session for both sccode and scnetworkgateway, I conclude that llvm gen runs on a single thread. Running it on 4 threads or so might improve things 1.5-2x. However, that would still be 10 secs instead of 20.
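Assuming the individual codegen jobs are independent (one per handler or query), spreading them over a small worker pool can be sketched like this. `compile_one` is a hypothetical stand-in for a single scllvm invocation; the speedup only materialises if the native call actually runs in parallel (in .NET/C++ it would, and here `time.sleep` releases the GIL to simulate that):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compile_one(unit):
    # Stand-in for one scllvm codegen call. A real native call would
    # release the GIL, so threads genuinely overlap; sleep mimics that.
    time.sleep(0.05)
    return f"{unit}: compiled"

units = [f"handler_{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compile_one, units))
parallel = time.perf_counter() - start

print(len(results))        # 8
print(parallel < 8 * 0.05) # True: ~2 batches of 0.05 s, not 0.4 s serially
```

With 4 workers and 8 units of 0.05 s each, wall-clock time drops to roughly two batches instead of eight sequential calls, matching the 1.5-2x estimate above (the gain caps out once jobs contend for shared compiler state).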
I used the same great timeboot script from @per-samuelsson, which measures warm start.
So, as we can see, for the URI matcher generator the difference is about 3x, which is good. However, the database configuration still takes 1.5 seconds, instead of 2.2. Still a lot for just a database configuration step. I guess we again have to dig deeper into the database-configuration native code.
@dan31, regarding your research: most probably it's CPU time your tool measured, not actual wall-clock time. LLVM consumes CPU readily because it's linear algorithmic code without synchronization and so on; that's why it shows up as eating a lot of CPU.
I will now run TPCC without optimizations to see the difference in performance degradation.
@alemoi, one question: which app start do you actually measure? In my case it's a set of real apps, with lots of real URIs and queries. Regarding the sampling data, I checked visually that the waiting actually correlated with the CPU work shown on the diagram. Namely, those 20 seconds were not just anything: the CPU work capped at 25%, concentrated in scllvm. I might be wrong, though I currently don't see any other plausible source of slowness: I see the CPU working while the database is starting. Ha, actually, @alemoi, if you just send me a pdb file, I would gladly dig deeper into this. I can still tease these production machines. Can you do that?
I need to build a new version of scllvm; pulling the sources now. I had to clean the hard drive as there wasn't enough space. I'll try to produce a new scllvm.dll/pdb tonight.
So it seems to be an issue with LLVM. My guess is there is simply a difference in opt/no-opt generation time for different kinds of code: the URI matcher vs. SQL code. That's why we don't see 3x less time in the SQL case.
@alemoi, that is also an interesting thing to try out. But really, point me to the pdbs; I'm eager to push this further on these machines.
I have restored my llvm build environment, with their latest version, 3.8.0. Of course, they have changed some interfaces and libraries :) I adapted the code and tested that it basically works. Uploaded the new scllvm.dll and pdb to \\scbuildserver\ftp\temp\am\scllvm\
Hi! I just want to check with you whether there has been any progress (or new findings) on this issue? We are still suffering from slow database & application startup: about 30 sec for each app.
Some of our servers have 10 databases with 15 apps each, which means that restarting the services is a 1+ hour process.
We use Azure A2 instances and Starcounter version Starcounter-DevelopNightlyBuilds-2.2.1.3137 (Sep 13, 2016 2:43:12 AM).
For information: in develop there is now a new, working scllvm caching scheme which should improve startup time. Hopefully it will soon be introduced in pnext as well.
@ivarbergman, could you please provide your typical start/restart script for your set of apps? This is currently not related to compile times (those are actually shaved off by the new llvm subsystem). Using your case, we need to investigate further.
Hi! We've had input from @dan31, outside of this issue, about how to restart apps (i.e. stop the sc host before we restart apps). Thanks!
But we still have a rather slow startup even when we do this.
Locally (on my dev machine) I noticed that the Temp folder is very large: Temp/weaver is 8 GB and Temp/sqlschemas is 5 GB. When I created a new database (reusing the sci files), the Temp folders were much smaller and startup times were much better.
Question: do you think a large Temp folder can result in slower startup? And can the Temp folder be emptied safely?
But we still have a rather slow startup even when we do this.
Sorry to hear that. Let's see if we can have some brainstorming session and try understand why.
Question: do you think a large Temp folder can result in slower startup?
I don't think it should, no.
And can the Temp folder be emptied safely?
In theory, it should be safe. But I have a vague memory of once dropping some folders in there and it messing something up. There are issues in our backlog describing things we should do to keep them better under control. Let me get back to you on that.
FYI: we are still running SC 2.2 but will soon switch to 2.3. So maybe we shouldn't do too much investigating until we've made the upgrade.
I see. Thanks for sharing.
It should be pretty easy to test at least whether it's safe to delete content under Temp. Create a new database and start the desired apps in it. Then stop the database, delete everything under Temp (but don't delete the Temp\[Database] folder itself) and restart the apps. If that works, it's safe.
Something like this:
:: 1. Create database
staradmin -d=foo new db
:: 2. Start app suite
:: For each app you want to test:
star -d=foo app.exe
:: Then, when all are started:
staradmin stop db
:: Delete everything under Temp\Foo,
:: then rerun step 2. If the apps start
:: correctly, it's a go.
@per-samuelsson commented on Fri Apr 08 2016
We are spending some time trying to improve the cycle, and I'm finding that starting a database, even in the simplest scenario, is really slow.
That last step takes approx. 15 seconds (!!) on my machine.
And it does not improve if I do it a second time:
I'll try dusting off the old "log steps" to see if I can find where all that time is spent.
If anyone can post their experience doing the same thing (I basically just counted seconds after hitting staradmin ....), that's appreciated.

@alemoi commented on Fri Apr 08 2016
Great! I created an issue some time ago about super slowness on the sctestsrv03 machine. Around 30 seconds there.
@per-samuelsson commented on Fri Apr 08 2016
Thanks, @alemoi. That's #3442; let's try to address these two as one. My initial tracing seems to align quite well with that issue's.
@ivarbergman commented on Fri Apr 08 2016
Around 20 sec on our production server (Azure A2); approx. 10 sec on my local machine.
@per-samuelsson commented on Fri Apr 08 2016
Thanks @ivarbergman. We should really try to improve those numbers.
Here's a script that could be used:
timeboot.bat
Here's a sample run from me, on a locally built version:
This extract seems to be the most interesting:
@per-samuelsson commented on Fri Apr 08 2016
I refined the script above further so that we also produce a diagnostic time log, since traces routed to the log don't always seem to show up.
timeboot.bat
And an extract from a timeboot.txt, somewhat prettified and with a few comments:

@per-samuelsson commented on Fri Apr 08 2016
Since the majority of the time is consumed when booting up an empty host, I'll focus on that initially, trying to find answers. Here are the spots I've filtered out from that sample, all consuming more than 1 second, sorted by their relative impact:
Note that absolute numbers shouldn't be trusted here, since I'm running a personal build that might contain some debug code. The focus is on relative numbers.
@per-samuelsson commented on Fri Apr 15 2016
Doing some closer measurements didn't reveal anything new in the first iteration. Just as everything above suggests, the host "connecting" to the database is what steals almost all of the cycles.
And that method is nothing more than this:
I've recompiled, experimenting with the flags too, but the impact is close to zero. So I guess my next step must be to start profiling Bluestar.
@dan31 commented on Fri Apr 15 2016
I'm CCing @eriohl and @alemoi here to ensure they receive updates on this one (they recently looked into starting a db standalone and no slowdown was noticeable, so I believe they may add valuable insight to the current story).
@per-samuelsson commented on Fri Apr 15 2016
That's kind of weird. I'm doing this on a local build, one that pulls down the Bluestar binaries from our build server, but maybe it's not a sharp version or something like that? I do remember that in the past there was a huge difference between the Debug and Release builds at boot (the former doing a bunch of checks at the page level or something like that).
On the other hand, we've now seen reports of really slow boots from customers too, and they'll surely have release builds. But then again, they might not suffer from exactly the same pain I'm suffering from.
I think I'll install a sharp version here and run it on that to make sure. Stay tuned. :smile:
@alemoi commented on Fri Apr 15 2016
I think it might be the code generation for indexes that is very slow. It started happening when I moved codegen in-memory using Clang. Might that be true, @eriohl?
@per-samuelsson commented on Fri Apr 15 2016
After downloading and installing, the numbers improved significantly in my environment, which indicates that doing a local build of Level1 brings down some non-optimized Bluestar binaries; that's probably the right thing for it to do.
That being said, I still time around 10-15 seconds to boot up just an empty application. I'll go through my diagnostic logs and try to make a selection like the one above, but with this new data.
@eriohl commented on Fri Apr 15 2016
I think a lot of stuff is very slow, and this certainly hasn't helped. But it seems not all that relevant, since there is no real alternative here. And I note that when we start the host without the managed host, it goes a lot faster, and it also does the code generation (for indexes, at least). So perhaps it's not the biggest culprit.
@per-samuelsson commented on Fri Apr 15 2016
So here's a new extract, where I've cut off some edges that don't seem very interesting. In this sample, the bootstrapping takes ~11 seconds and includes:

- star.exe (~1 second)
- scdata (-)
- scdblog (-)
- sccode (~8 seconds)

In the past, we've said that we can't be satisfied with the bootstrapping sequence until the weaver consumes the majority of its time, and I'm not sure that applies 100% to this sample, since the application is super limited. But even if that's only truth-ish, the ratios are pretty far off, as can be seen.
Here's the extract of the code host bootstrapper:
@alemoi commented on Fri Apr 15 2016
That might help: https://github.com/Starcounter/RebelsLounge/blob/master/buildsystem.md#building-level0-as-part-of-level1
@per-samuelsson commented on Fri Apr 15 2016
Ah, I knew about that, of course, but I had missed this:
Thanks for pointing that out! And like I said, I think the current feature is correctly designed, so no need to change anything.
@dan31 commented on Fri Apr 15 2016
So the biggest breaches appear to be:

Starcounter.Host: Database configured.: 11.31.02.837
Starcounter.Host: Database connected.: 11.31.05.809

(3 sec) and

Starcounter.Host: App domain configured.: 11.31.05.844
Starcounter.Host: Server command handlers registered.: 11.31.06.890

(1 sec); the rest is quite evenly distributed into small portions of time (mostly 4-14 msecs).

@alemoi commented on Fri Apr 15 2016
What about this one?
@dan31 commented on Fri Apr 15 2016
@alemoi yes, also quite a big one, totalling 5 secs across 3 breaches.
@alemoi commented on Fri Apr 15 2016
And this one, which is on me, I guess:
@per-samuelsson commented on Fri Apr 15 2016
Yeah, exactly, those are my conclusions as well.
One thing worth noticing about those is that this
and this
seem to relate to registration of handlers.
I'll review what happens when the "apps bootstrapper" initializes, to see if that's also included. Then we might get 3 for the price of 1 (if they are "fixable").
@alemoi commented on Fri Apr 15 2016
I am adding more log entries to have more precision.
@per-samuelsson commented on Fri Apr 15 2016
OK cool, I was about to do that, but I'll happily let you. :smile:
Yeah, it seems quite a lot of handler registration is going on there too, so it's likely the underlying barrier. But I guess that involves some kind of code generation, so maybe there's not much we can do about it.
@alemoi commented on Fri Apr 15 2016
I just pushed 2 more, you can now add yours :)
@alemoi commented on Fri Apr 15 2016
I think it might be the handler registration with the gateway that takes time. So I will have to look into whether I can introduce some optimizations.
@alemoi commented on Fri Apr 15 2016
I added a new measurement to get the average simplest-handler registration time, which is around 0.7 ms(!) (after some optimizations, though :smiley_cat:) on my machine, so it should not be the problem (I added a test, so now we measure this: SimplestHandlerRegistrationMs). However, I'm digging deeper, since I think something is still wrong with handler registration, or there's some magic there!

@alemoi commented on Sat Apr 16 2016
Just finished investigating. Looked everywhere. You would not believe what it was :smiley_cat: It was "localhost" name resolution inside the code host, used for REST communication with the gateway. Using the IP "127.0.0.1" solves all the above problems except this one (which we should dig into more deeply):
I made "localhost" translate to "127.0.0.1" when using the Http class, so one can still continue using the Http class as before; no modifications needed. However, if someone wants to use an arbitrary .NET API for RESTing: now you know that "localhost" DNS resolution adds around 1 second.
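The effect is easy to measure outside Starcounter: time name resolution of "localhost" against the literal "127.0.0.1", which is parsed as an address and never hits the resolver. A small Python probe (the ~1 second penalty shows up mainly on affected Windows setups; elsewhere both calls are fast):

```python
import socket
import time

def time_resolve(host):
    """Return the wall-clock time of one address lookup for `host`."""
    start = time.perf_counter()
    socket.getaddrinfo(host, 80)
    return time.perf_counter() - start

t_name = time_resolve("localhost")   # may go through the DNS/hosts resolver
t_ip = time_resolve("127.0.0.1")     # numeric literal: no lookup needed
print(t_ip <= 1.0)                   # the literal is parsed, never resolved
```

On the affected machines, `t_name` is where the ~1 second goes; `t_ip` stays negligible, which is why substituting the literal fixed the first-call delay.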
@per-samuelsson can you please pull and confirm that it's also fixed on your machine? Thank you!
@dan31 commented on Sat Apr 16 2016
@alemoi wow, what amazes me is the way you discovered it in real life :8ball: Frankly speaking, I think we've already discussed this phenomenon a couple of years ago in a completely different context, and it still strikes back. There are numerous posts on the Internet on how to "fix" it on Windows, e.g. http://cubicspot.blogspot.se/2010/07/fixing-slow-apache-on-localhost-under.html. Thus we should recommend such fixes to our clients, or even include an optional `hosts` patch in the installer.
@per-samuelsson commented on Mon Apr 18 2016
I sure will. Great work finding that, and fixing it.
@k-rus commented on Mon Apr 18 2016
I wonder: is it once per application startup, or once for each handler?
@per-samuelsson commented on Mon Apr 18 2016
I can confirm @alemoi's fixes sliced off basically all the time from those 3 spots I reported earlier. Well done, @alemoi! I will post some numbers later.
Right now, I'm working on finding whatever new spots are interesting in managed code, so that we can try to shave those off before we go to the real beast, connecting the database, where we'll need to do unmanaged profiling to get anywhere.
I have at least one new spot that I'm currently analysing that seems interesting; when I have more info on it, I'll update here with more general information about where we are right now.
@dan31 commented on Mon Apr 18 2016
@per-samuelsson, how much is now shaved off in total? Is it like 2x faster? (probably -3 secs)
@per-samuelsson commented on Mon Apr 18 2016
I asked for a little more time to properly process the raw data and come with an update, but your curiosity just can't wait, can it? :smile:
Give me 10 minutes, and I'll post some new reflections.
@per-samuelsson commented on Mon Apr 18 2016
OK, so first a sample of those three spots @alemoi addressed:
With that, they are now well under the radar, and are clearly acceptable.
I've made numerous samples now, doing a few tweaks too, but things are now more or less status quo, showing an overall time of 8,5 - 10 seconds, normally around 9-ish. That, again, includes:
- `star.exe` (~1 second)
- `scdata` (-)
- `scdblog` (-)
- `sccode` (~5-6 seconds)

Within booting up the code host, we now have these weak spots:
All other micro-tasks within the code host bootstrap are now showing numbers where I don't consider them interesting to pursue further. Possibly, that last 0,3 second triplet is not very interesting either; I just wanted to show the one that was "second worst".
One new interesting thing I found was that it's always ~1 second between the last entry written by the code host bootstrapping sequence and when the admin server ends that task (the task being "waiting for code host to be ready"). After some initial diagnosing, I've now isolated it down to one thing added rather recently. It's an HTTP call from within the admin server that fires whenever its internal state changes. I remember asking @andwah about it earlier, and that I had some discussion with @andwah and @alemoi about whether it could be done differently. I think what it does is push out some kind of "state update" to clients (like the web admin). If I understand it correctly, that call is synchronous, and hence the admin server can't report back to `star.exe` that bootstrapping of the database is done before that call completes.

I think that's the best shot we've got to shave off another second right now, since it seems like the most low-hanging fruit. After that we only have database connecting left until we've reached the goal that the weaver should be the bottleneck.
@dan31 commented on Mon Apr 18 2016
@per-samuelsson already an achievement, thanks for the update! Yes, I'm very curious :) What do you think would be the best approach to drill down into `sccode` starting in 5 secs?
@per-samuelsson commented on Mon Apr 18 2016
Well, it's hard to say, because now we are getting closer to having only that "Database connected" thing left, and it takes like > 75% of the overall time. I'm not up to date on what is done in there (I know recovery runs, but I've seen that it doesn't make a big impact). Because of that, I'm having an even harder time understanding which things in there can be optimized and which cannot.

My idea is that once I get a better handle on this more low-hanging thingy I described above, it's time to take on that database connect.
@per-samuelsson commented on Mon Apr 18 2016
An update on this call:
I've now ruled out that it has anything to do with "plumbing", or the call context itself. I've tested replacing it with an explicit, precreated `Node` instance, and changed to "127.0.0.1" when creating that, to no avail. It is still very close to ~1 second every time, so it seems to be something in the implementation.
@per-samuelsson commented on Mon Apr 18 2016
Hmm, seems I have to take a step back from that already. Now I have a sample where it's ~1 second from the time the PUT is called until the handler is reached. It seems to be some kind of cold-start thingy though, since consecutive calls complete in just milliseconds.
@per-samuelsson commented on Mon Apr 18 2016
More info:
If I precreate the node (in some other thread, probably) and keep a reference to it, I can get all calls down to milliseconds, except the first one, which always clocks in at ~1 second.
@per-samuelsson commented on Wed May 04 2016
Took a new iteration on this today, taking off from previous findings, and this time with the aim to catch what's up with:
By installing an unmanaged profiler and doing mixed-mode debugging, I've finally been able to produce some results (let's just hope they are accurate).
As stated earlier, like > 80% of the current bootstrapping cycle happens when booting up an empty code host, specifically when invoking the Bluestar `sccoredb.sccoredb_connect`. I've profiled that a couple of times now, and the results I get are pretty consistent. Here are two extracts:

The setup is a locally built Bluestar debug configuration, so absolute timings should be considered in that context, but maybe the relative numbers can give us a hint on where to look further.
I'll try to build a local release version of the kernel too, to see what that gives. And see if I can dig deeper into the tree to be even more specific. I'll get back on that.
Big thanks to @chrhol, who helped me set up a build environment for Bluestar, bearing with me while I whined about everything being so difficult and complicated. And @dan31, who suggested the VS Enterprise profiler, which worked out great.
@per-samuelsson commented on Wed May 04 2016
I had a feeling the above might be false diagnostics, and I think they might be. So if someone's reading this, give me a moment to test with a sharp Bluestar version first.
@alexanderstc commented on Wed May 04 2016
Testing on level0 also gives me the impression that the time is spent inside db connect. My 2 rusty öre.
@per-samuelsson commented on Wed May 04 2016
Thanks for sharing that. I also still think it's there, just not spent the way my spreadsheets showed, since it seems unlikely that `__get_image_header_read_ptr` could take up so much of that time. It just doesn't make sense.
@alemoi commented on Wed May 04 2016
I think it's code generation and linking using LLVM that is taking the time. Just a pure guess.
@per-samuelsson commented on Wed May 04 2016
Seems the feeling I had was on to something. Doing profiling on a locally built release version of Bluestar revealed something completely different:
And that's more in line with what I suspected from the beginning: some kind of compiling / code generation must be in play here. Which is kind of sad, b/c that's where it will hurt us the most, I guess, not being in control over that code.
@per-samuelsson commented on Wed May 04 2016
We think alike. :smile:
@per-samuelsson commented on Wed May 04 2016
A few things now are occupying me:
About LLVM, I have no clue what options we have to do anything. In my latest tests (using the above boot-timing script), the isolated time to run through code-host-only bootstrapping (i.e. from the beginning of `Main` until we are done processing initialization) is ~3,5 seconds in an optimized build. Out of that, the above database connect is like > 90%, so I just can't find other optimizations in that sequence right now.
@alemoi commented on Wed May 04 2016
Maybe because it's an external call to a DLL?
Hmm, yes, indeed interesting. I can check this once have time.
@per-samuelsson commented on Wed May 04 2016
Well, why would it show up in release in that case? They're just different builds.
It's super frustrating, actually, because I can consistently show that 1-second penalty by emitting before/after calls to the diagnostics text file. But as soon as I attach a debugger/profiler, the call takes "no time". :game_die: :anguished:
@alemoi commented on Wed May 04 2016
Maybe the delay takes place in the gateway? And the reason for this is again LLVM codegen there? @per-samuelsson you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
@per-samuelsson commented on Wed May 04 2016
Really? OK, sure. I'm starting to get a feeling there is little I can do here once these analyses are complete anyway, so if you think I should pause for now, I'll do so. Let me know if I can help you investigate further in any way!
Also, it would be nice to find out about the catastrophic absolute numbers reported by @ivarbergman earlier in this thread and by @atifwaqar in #3494. They just seem really, really off. Maybe I can do something there to help those guys profile better...
@dan31 commented on Thu May 05 2016
@alemoi, if what you guess turns out to be the case, then we are bounced into a big need for more controlled compilation logic.
Here is a simple version of it:
1) When developers are in a dev env, the LLVM compiler is set to fast compilation without strong optimisation. The result is a fast dev cycle; devs get their apps to run in the blink of an eye.
2) When developers put the app in a prod env, they turn a switch in the DB conf file to enable full-blown, strongly optimised compilation of their code. Not suitable for the dev cycle on a dev machine, but needed for load testing / prod use.
Actually, points 1 and 2 are no more than this:
In that sense, we are not special. Developers don't wait for a release version to be built every time they make a change, since it is expensive. Well, I'm not in favour of such manual switching; in my mind this is too 2000s-ish and cumbersome (especially for our target audience, who want to do super-fast and super-simple development), and there is a better approach.
Let me give you an analogy. In the JVM, when you compile and run your code the first time, it isn't yet strongly optimised (hence it starts relatively fast). However, it gets better optimised over the course of execution: the VM finds hot spots on the execution paths and rebuilds them at a moment when they aren't in use. That is why its VM is called HotSpot. Something similar happens in modern SQL processing engines. They do not attempt to globally optimise a query when the user runs it the first time (to avoid making them wait way too long); instead they optimise the execution plan during the actual execution of the query.
In our case, the state of affairs is even simpler. We might build compiled representations (of whatever: queries, URIs and so on) without strong optimisations in the first place. Then, once the app has started, we can start recompiling the representations with the strongest possible optimisations. Think of it as a two-level cache. While strongly optimised functions aren't yet in the cache, the weakly optimised ones are invoked. On availability of the strongly optimised representations (which will, ostensibly, actually take the seconds they take to compile now), the references in the cache are (thread-safely) replaced with those. Hence you have no overhead from switching to them (you simply had one pointer, now you have another).
Thus, while now we (ostensibly; it needs to be verified) force-feed the user with seconds of longer compilation any time they start an app, after the suggested addition this time moves to a (slightly longer) warm-up phase, not noticeable by an applied user in 99.9% of cases.
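The two-level cache described above can be sketched roughly like this (illustrative Python, not the actual scllvm design; the two builder callbacks are hypothetical stand-ins for unoptimised and optimised LLVM compilation):

```python
import threading

class TieredCache:
    """Serve a quickly built function immediately; swap in an optimised
    one when a background build finishes (pointer replacement only)."""

    def __init__(self, quick_build, optimised_build):
        self._quick_build = quick_build          # fast, weakly optimised
        self._optimised_build = optimised_build  # slow, strongly optimised
        self._entries = {}   # key -> callable; dict writes act as the atomic swap

    def get(self, key):
        fn = self._entries.get(key)
        if fn is None:
            fn = self._quick_build(key)          # cheap version, available at once
            self._entries[key] = fn
            # Rebuild with full optimisation off the hot path:
            threading.Thread(target=self._upgrade, args=(key,), daemon=True).start()
        return fn

    def _upgrade(self, key):
        optimised = self._optimised_build(key)   # takes the "seconds" mentioned above
        self._entries[key] = optimised           # thread-safe pointer swap

# Hypothetical builders; both produce the same behaviour, only build cost differs.
cache = TieredCache(lambda k: (lambda x: x + 1),
                    lambda k: (lambda x: x + 1))
add_one = cache.get("uri-matcher")
print(add_one(41))  # usable immediately, before the optimised build completes
```

Callers never notice the swap: they always look the function up by key, and later lookups simply see the newer pointer.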
@per-samuelsson That is a good point. @atifwaqar, maybe you can spend some time to get us the startup issue reproduced? What do you think?
@atifwaqar commented on Fri May 06 2016
@dan31 Sure, I can help. Maybe one way is to share the DB from my slow server? Or whatever you guys suggest.
@dan31 commented on Fri May 06 2016
@atifwaqar Would be best if you could prepack your whole setup (use 7z Ultra compression; include the whole `Starcounter` folder from `Documents`, all apps that you start, and the batch file you use to start them) so that @per-samuelsson (or anyone, like @alemoi) can reproduce the run where you observe the slow start. Henceforth we can use this laboratory as a base to measure the impact of improvements.
@atifwaqar commented on Fri May 06 2016
@dan31 Sure, the `Starcounter` folder from `Documents` is 30+ GB. I am copying it to our SC Basement server, so your team can easily access it. I will update you as soon as it's copied.
@per-samuelsson commented on Sat May 07 2016
@dan31
I really like the ideas you present in that post. It gives us the best of all worlds: fast startup times, yet fast execution of critical code. Let's just hope it holds up, and that it's not super-big or challenging in terms of implementation time.
@dan31 commented on Sat May 07 2016
There is an interesting alternative, tried it out today: https://github.com/GPUOpen-Tools/CodeXL/releases (the ex-commercial AMD Profiler). It comes for free now and integrates nicely into VS. Yes, it is optimised for AMD hardware, but it has nicely done sampling analysis (independent of CPU). Apart from all the standard sampling data for native code, it also gives you assembly decomposition for each code line, which is sometimes the most valuable part of the analysis, as you can drill down deep:
@dan31 commented on Sat May 07 2016
Important to note: if @alemoi confirms what we think is the case, then this work is an absolute need for any future versions (not just a patch for existing stuff), since the dominance of llvm time will still prevail unless it is solved in a smart caching manner.
@Starcounter-Jack commented on Sat May 07 2016
Just my 2 cents. If the below is already done, please disregard.
Making the LLVM faster?
Before implementing the cache, I suggest we try the following:
@dan31 commented on Sat May 07 2016
@Starcounter-Jack very good cents! Not currently sure about all of them right now (probably missed something obvious while making llvm in-memory instead of disk-generated), but @alemoi will consider your notes top prio when he comes back on Tuesday. Yes, currently (since some time back) stronger optimisations are on, to produce faster functions. On Friday @alemoi tested with the optim flag set to false, resulting in 3.5x faster generation. We don't yet know the balance of gen time vs run time. I believe during next week we'll have all info at hand and most likely the issue resolved.
@dan31 commented on Sat May 07 2016
@Starcounter-Jack No file system is involved, llvm is completely in-memory for all its users. Most likely the reason is optimisations, and maybe some details on the full jit chain.
@dan31 commented on Thu May 12 2016
I've briefly run a .NET profiler in vivo on SWO. Their suite takes 30 sec to get to "Starting a database...". Which, as we've noticed already, isn't about the database. The proper message there should be "Starting Starcounter...", not "Starting a database...", in my opinion. First I noticed that half of the CPU time over these 30 secs was taken by `scnetworkgateway`. The next time I restarted the suite, I promptly attached the profiler to it. Here are the results:

Going deeper, it is clearly this:
My next candidate was `sccode.exe`, which also showed a CPU usage peak approximately in the middle of the process, competing with the gateway for CPU resources. Here are the results of profiling its startup sequence:

Needless to continue investigating, I believe.
I think the severity of this unfortunate imprudence in llvm module usage is now obvious. Hence I safely assign the issue to @alemoi and raise its priority to high. I realise it isn't that simple to fix, since the reason for it is clearly that you wanted to make llvm generate more performant results, which produced an issue users don't buy into. I believe the best quick fix, which will spare our current users such big delays in the dev cycle, is to introduce a database setting in the conf file defaulting to "no optimisation", as @Starcounter-Jack suggests. Putting their stuff in production, they can switch to "stronger optimisations". The proper longer-term fix is to build a smart cache which serves unoptimised versions in the first place and then quietly builds optimised versions transparently to the user. We need to make our users comfortable with the technology, and forcing them to wait minutes every time just to get a runtime boost isn't an option (I don't know how much you'd gain, but most of them aren't doing high-frequency trading or building another Facebook).
@per-samuelsson thank you for your diligent examination, I believe the root of the problem is now clear enough. @per-samuelsson you could repeat the same procedure with @atifwaqar's data file to observe what takes that long (perhaps exactly the same thing).
@k-rus commented on Thu May 12 2016
Great investigation!
How bad is the performance of Starcounter if the optimisation is off? This quick fix might have a drawback for new users, who download Starcounter and test performance. I doubt they will do proper configuration; they will probably run Starcounter as it is. Is that something we should worry about?
Maybe the switch should be on by default, and then we teach our customers to switch it off in the development environment? Note that it is easy to forget to fix the configuration in a production environment.
@dan31 commented on Thu May 12 2016
Thank you @k-rus, it is a very important point to track. I call on us to be very precise with such changes (and @alemoi in particular). The change itself is just a flag, but we need to clearly get two sets of numbers here, as a table summary: 1) What is the decrease in startup time? In which examples and tests? (Empty db? TPC-C, and with which settings? RetailDemo? A customer case?) 2) What is the increase in run time? In which examples and tests? (Empty db? TPC-C, and with which settings? RetailDemo? A customer case?)
@dan31 commented on Thu May 12 2016
I would say like this: substantially less than people downloading Starcounter and waiting 6-10 secs at a first app start, not to mention current customers struggling with minutes of startup. I also have a gut feeling that the decrease in perf won't be severalfold, but probably tens of percent. For our published benchmarks, we turn on all perf tunings. Once the smart cache is in place, this isn't a concern any more.
Naturally, when newbies do their ad-hoc microbenchmarks, they look at these steps:
1) Ease of setup and first coding experience (put a tick here, "Hello world" and transacted apps are very straightforward today, in well-recognised environments like Visual Studio);
2) Ease of dev cycle (coding-running-stopping-changing-...): needs improvement for startup time (current issue) and stopping/restarting time (addressed in the Core .NET track). People don't like to wait; if they wait and the platform looks as if it hangs big time, they will be inclined to leave the platform. There are more, smaller points to address: e.g. precreate the first database at installation asynchronously, not when the user starts their first app.
3) Perf of writes. They populate the database with loads of data, like inserting millions of arbitrarily generated rows such as people and houses. People don't like to wait; if they need to wait minutes to load their test data, they will be inclined to leave the platform. Today you load 1.000.000 people of FirstName LastName in around 1.5 secs on my laptop on battery; surely this can be improved, but it is decently okay already (we as a company take great care about write performance levels).
4) Perf of reads. When the data is there, they want to crunch it. We all know what we want to solve there and how; I'm not diving into detail. Once NewSQL is in place and the redundant .net onion flakes around getting data to the codehost are removed, reads will improve. Once we add OLAP support, reads will improve amazingly.
5) Perf of mixed loads. Only quite savvy users and real customers approach this stage. We talk about concurrent access to the database and operational analytics example, when you want to do writes OLTP and analyse your current data in the OLAP way simultaneously. This is important when Starcounter gets really big public traction and people start to really stress-test it with amusing workloads.
Effectively, the point we address here is N.2 in the list. If N.2 isn't addressed, the number of newbies trying out Starcounter who reach N.3 and N.4 will greatly decrease.
@bigwad commented on Thu May 12 2016
@alemoi, could you please add appropriate scllvm.pdb to source control?
@dan31 commented on Thu May 12 2016
@bigwad +1
@dan31 commented on Thu May 12 2016
For the reasons I gave above, I think the most reasonable way to approach it is to make the flag off by default, do the work on the smart cache asap, and then deprecate the flag. Our goal for the Core .NET version is to achieve a sub-second dev cycle for both start and stop, while the same scllvm.dll is involved in the loop. Hence, if it is kept as-is with the flag on, it will never be possible to achieve a sub-second experience.
However, it might turn out that the root reason isn't optimisation itself, but, for instance, heavy heap thrashing or dealloc times per function jitted when invoking llvm.dll. We don't know yet; @alemoi is to the rescue on this issue.
@k-rus commented on Thu May 12 2016
It is great to see this list being written down.
@dan31 commented on Thu May 12 2016
Actually, this is only when there is a performance-related evaluation. The other big part of evaluation is the app dev framework: viewmodels with thin clients and everything related to app development. When they try and test these (maybe it will be 70-80% of initial evaluations or even more, and this is the version we use in the starting tutorials), compiling a high-performance version is quite useless. However, they might combine it with data generation and fetching (like trying to create an analytical application with a GUI), but this still comes before the points on writes and reads: they first create an app with no data, but with some GUI and event responses like button clicks and the like.
@dan31 commented on Thu May 12 2016
@k-rus @alemoi Another point: looking at the cap of 20% CPU load in a profiling session for both `sccode` and `scnetworkgateway`, I conclude that the llvm gen runs on a single thread. Maybe running it on 4 threads or so might improve it 1.5-2 times. However, that would still be 10 secs instead of 20 secs.
@alemoi commented on Thu May 12 2016
Sure. @eriohl, since you added the last scllvm.dll, do you still have the corresponding .pdb by chance?
@alemoi commented on Wed May 18 2016
So, after I added flags to disable optimizations for LLVM here are the results we have:
With optimizations on:
With optimizations off:
I used the same great `timeboot` script from @per-samuelsson, which measures warm start.

So, as we can see, when it comes to the URI matcher generator the difference is about 3 times, which is good. However, the database configuration step still takes 1.5 seconds (instead of 2.2)! Still a lot for just a database configuration step. I guess we again have to dig deeper into the database configuration native code.
@dan31 regarding your research, most probably it's the CPU time that your tool measured, not the actual calendar time. LLVM eats CPU well because it's linear algorithmic code without synchronization and so on; that's why it's shown to eat a lot of CPU.
I will now run TPCC without optimizations to see the difference in performance degradation.
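The CPU-time versus calendar-time distinction raised above is easy to demonstrate (Python sketch; the sleep stands in for time spent waiting rather than computing):

```python
import time

wall_start = time.perf_counter()   # calendar (wall-clock) time
cpu_start = time.process_time()    # CPU time of this process only

time.sleep(0.2)                    # waiting: wall clock advances, CPU time doesn't
sum(i * i for i in range(10**6))   # computing: both clocks advance

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall: {wall:.2f}s, cpu: {cpu:.2f}s")  # wall includes the sleep; cpu does not
```

A sampling profiler reporting CPU time would attribute the busy loop but miss the sleep, which is why CPU-heavy components like scllvm dominate such reports even when wall-clock waiting also exists.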
@dan31 commented on Wed May 18 2016
@alemoi, one question: which app start do you actually measure? In my case it is a set of real apps, with lots of real URIs and queries. Regarding the sampling data, I verified visually that the waiting was actually correlated to the CPU work indicated on the diagram. Namely, these 20 seconds were not just anything: the CPU work was capped at 25%, with the work concentrated in scllvm. I might be wrong, though I currently don't see any other particular source of slowness: I see the CPU working while the database is starting. Ha, actually, @alemoi, if you just send me a pdb file, I would gladly dig deeper into that matter. I can still tease these production machines. Can you do that?
@alemoi commented on Wed May 18 2016
For example, the TPCC long test takes 1:47 vs 1:59.
I will turn off usage of llvm completely in the codehost (falling back to the previous system with DLLs) and see how it goes perf-wise.
@dan31 commented on Wed May 18 2016
@alemoi that is also an interesting thing to try out. But really, point me to the pdbs; I'm eager to push it further on these machines.
@alemoi commented on Wed May 18 2016
I need to build a new version of scllvm; pulling the sources now. I had to clean the hard drive because of insufficient space. Will try to produce a new scllvm.dll/pdb tonight.
@alemoi commented on Wed May 18 2016
Here is what happens when I fall back into previous codegen DLLs usage:
so it seems to be an issue with LLVM. My guess is there is simply a difference in opt/nopt generation time for different types of code: the URI matcher vs SQL code. That's why we don't see 3 times less time in the SQL case.
@alemoi commented on Thu May 19 2016
I have recovered my llvm build environment, with their latest version 3.8.0. Of course, they have changed some interfaces and libraries :) I adapted the code and tested that it basically works. Uploaded the new scllvm.dll and pdb to
\\scbuildserver\ftp\temp\am\scllvm\
cc @bigwad if you are interested.
@per-samuelsson commented on Thu May 19 2016
Great work @alemoi. Looks like we are slowly (!) getting closer to improve on this. :+1:
@alemoi commented on Thu May 19 2016
I wonder if it's possible to move this issue to a private repo. It's getting very internal here :)
@ivarbergman commented on Wed Sep 28 2016
Hi! I just want to check with you whether there is any progress (or new findings) on this issue. We are still suffering from slow database & application startup: about 30 sec for each app.
Some of our servers have 10 databases with 15 apps each, which means that restarting the services is a 1+ hour process.
We use Azure A2 instances and are running Starcounter version Starcounter-DevelopNightlyBuilds-2.2.1.3137 (Sep 13, 2016 2:43:12 AM).
Thanks!
@alemoi commented on Mon Oct 17 2016
To inform: in `develop` there is now a new working scllvm caching scheme which should improve startup time. Hopefully it will soon be introduced in pnext as well.
@dan31 commented on Mon Nov 14 2016
@ivarbergman could you please provide your typical start/restart script for your set of apps? This is currently not related to compile times (those are actually shaved off with the new llvm subsystem). Using your case, we need to investigate it further.
@ivarbergman commented on Tue Jan 10 2017
Hi! We have had input from @dan31, outside of this issue, about how to restart apps (i.e. stop the sc host before we restart the apps). Thanks!
But we still have a rather slow startup even when we do this.
Locally (on my dev machine) I noticed that the Temp folders are very large: `Temp/weaver` is 8 GB and `Temp/sqlschemas` is 5 GB. When I created a new database (reusing the sci files), the Temp folders were much smaller and startup times were much better.

Question: Do you think that a large Temp folder can result in slower startup? And can the Temp folder be emptied safely?
@per-samuelsson commented on Tue Jan 10 2017
Sorry to hear that. Let's see if we can have some brainstorming session and try understand why.
I don't think it should, no.
In theory, it should be safe. But I have a vague memory of once dropping some folders in there and it messing something up. There are issues in our backlog that describe what we should do to keep them better under control. Let me get back on that.
@ivarbergman commented on Wed Jan 11 2017
FYI: we are still running SC 2.2 but will soon switch to 2.3. So maybe we shouldn't do too much investigation until we have made the upgrade.
@per-samuelsson commented on Wed Jan 11 2017
It was #1690 I was thinking of. Maybe we need to bump that.
@per-samuelsson commented on Wed Jan 11 2017
I see. Thanks for sharing.
Should be pretty easy to test whether it's safe to delete content under Temp, at least: create a new database and start the desired apps in it. Then stop the database, delete everything under Temp (don't delete the `Temp\[Database]` folder itself) and restart the apps. If that works, it's safe.

Something like this:
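A sketch of the deletion step in Python (for illustration only; the Temp layout and the database folder name are assumptions based on this thread, not Starcounter tooling):

```python
import shutil
from pathlib import Path

def clean_temp(temp_dir: Path, db_folder: str) -> None:
    """Empty everything under Temp, keeping the Temp\\[Database] folder
    itself (its contents are removed, but the folder remains)."""
    for child in temp_dir.iterdir():
        if child.name == db_folder:
            # Keep the database folder; remove only its contents.
            for sub in child.iterdir():
                if sub.is_dir():
                    shutil.rmtree(sub)
                else:
                    sub.unlink()
        elif child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()

# Hypothetical paths; adjust to your Documents\Starcounter layout:
# clean_temp(Path(r"C:\Users\me\Documents\Starcounter\Personal\Temp"), "foo123")
```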
@miyconst commented on Wed Jan 11 2017
@ivarbergman please try the StarDump.Migrator.2.2 app for this purpose.