Around 20 sec on our production server (Azure A2), approx. 10 sec on my local machine.
Thanks @ivarbergman. We should really try to improve those numbers.
Here's a script that could be used:
timeboot.bat
@echo off
set db=%1
IF NOT EXIST app.cs (
    staradmin +
)
staradmin kill all
staradmin start server
set start=%TIME%
star -d=%db% --logsteps app.cs
set stop=%TIME%
ECHO Frame: %start% - %stop%
Here's a sample run from me, on a locally built version:
Logs (since 12:33:46.0400767)
-----------------------------
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
143.9936:Package processed.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
143.9847:Processing completed.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
143.9597:Entry point executed.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
136.2446:CLR view meta-data were populated for the given types.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
86.4498:Query module schema information updated.
[12:34:08, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234085032114738359.pl
[12:34:08, Debug, Sql (foo123)]
Exported SQL schema info for foo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234085032114738359.pl in 0 ms.
[12:34:08, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234083831463698039.pl
[12:34:08, Debug, Sql (foo123)]
Exported SQL schema info for appfoo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234083831463698039.pl in 16 ms.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
25.2593:Columns of database schema checked and updated.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
25.2498:Type definitions registered.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
25.1412:Types of database schema checked and updated.
[12:34:08, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema160408123408362693170111.pl
[12:34:08, Debug, Sql (foo123)]
Exported SQL schema info for appfoo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema160408123408362693170111.pl in 0 ms.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
19.6484:Package started.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
18.9861:Package created.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
18.9331:Target assembly loaded.
[12:34:08, Debug, Starcounter.Host.AssemblyResolver (foo123)]
Loading assembly app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null from C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B\app.exe
[12:34:08, Debug, Starcounter.Host.AssemblyResolver (foo123)]
0.0:Loading assembly app, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null from C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B\app.exe
[12:34:08, Debug, Starcounter.Host.AssemblyResolver (foo123)]
0.0:Resolving application: C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B\app.exe
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
13.3552:Schema verified and loaded.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
2.5785:Input verified and assembly resolver updated.
[12:34:08, Debug, Starcounter.Host.Loader (foo123)]
0.296:Loader started.
[12:34:07, Debug, Starcounter.Server.Processes (scadminserver)]
Starting ""C:\Users\Per\Git\Starcounter\Develop\Level1\bin\Debug\scweaver.exe" --maxerrors=1 --ErrorParcelId=A4A7D6FA-EB34-442A-B579-DBB1DBB859E3 Weave "C:\Users\Per\AppData\Local\Temp\5eb1d56c-b830-446d-96d0-b31d3918216e\app.exe" --outdir="C:\Users\Per\Documents\Starcounter\Personal\Temp\foo123\foo123-20160408T110756165\weaver\409047624C1F61434671885EEC91BC148836103B"" at 12.34.07.265
[12:34:05, Debug, Starcounter.Host (foo123)]
4611.8811:Booting completed.
[12:34:05, Debug, Starcounter.Host (foo123)]
4611.6293:Start completed.
[12:34:05, Debug, Starcounter.Host (foo123)]
4611.6010:Base package loaded.
[12:34:05, Debug, Starcounter.Host.Loader (foo123)]
4564.6801:Package processed.
[12:34:05, Debug, Starcounter.Host.Loader (foo123)]
4564.5596:Processing completed.
[12:34:05, Debug, Starcounter.Host.Loader (foo123)]
4564.4908:Internal handlers were registered.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4115.1704:CLR view meta-data were populated for the given types.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4066.379:System type specifications initialized.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4053.2076:Query module schema information updated.
[12:34:03, Debug, Sql (foo123)]
Exported schema info: C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234031061266764082.pl
[12:34:03, Debug, Sql (foo123)]
Exported SQL schema info for foo123 to C:/Users/Per/Documents/Starcounter/Personal/Temp/foo123/foo123-20160408T110756165/sqlschemas/schema1604081234031061266764082.pl in 0 ms.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4047.9946:Columns of database schema checked and updated.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4047.6953:Type definitions registered.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4047.2666:Types of database schema checked and updated.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4045.7563:Properties and columns were populated for the given meta-types.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4044.624:CLR view meta-data were deleted on host start.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4033.2472:Runtime meta-data tables were created and populated with initial data.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4031.5012:Package started.
[12:34:03, Debug, Starcounter.Host.Loader (foo123)]
4030.4530:Package created.
[12:34:03, Debug, Starcounter.Host.AssemblyResolver (foo123)]
0.0:Assembly resolver created in process 14272
[12:34:03, Debug, Starcounter.Host (foo123)]
4074.5988:Server command handlers registered.
[12:34:02, Debug, Starcounter.Host (foo123)]
3817.2832:App domain configured.
[12:34:02, Debug, Starcounter.Host (foo123)]
3817.2457:Network gateway connected.
[12:34:02, Debug, Starcounter.Host (foo123)]
3817.1450:Scheduler started.
[12:34:02, Debug, Starcounter.Host (foo123)]
3816.9241:Setup completed.
[12:34:02, Debug, Starcounter.Host (foo123)]
3816.8600:Query module initiated.
[12:34:01, Debug, Starcounter.Host (foo123)]
3811.1833:Database connected.
[12:34:01, Debug, Starcounter (foo123)]
Starting log writer at address 995. Next commit id is 496.
[12:34:01, Debug, Starcounter (foo123)]
Recovery completed. 12 transactions recovered.
[12:34:01, Debug, Starcounter (foo123)]
Starting recovery at address 719. Next commit id is 484.
[12:33:48, Debug, Starcounter.Host (foo123)]
374.5935:Database configured.
[12:33:48, Debug, Starcounter.Host (foo123)]
373.8425:Apps bootstraper initialized.
[12:33:47, Debug, Starcounter.Host (foo123)]
86.429:Scheduler configured.
[12:33:47, Debug, Starcounter.Host (foo123)]
65.5774:Host configured.
[12:33:47, Debug, Starcounter.Host (foo123)]
65.2494:BMX manager initialized.
[12:33:47, Debug, Starcounter.Host (foo123)]
63.6016:Logging configured.
[12:33:47, Debug, Starcounter.Server.Processes (scadminserver)]
Begin monitoring code host process sccode, PID 14272, running database foo123
[12:33:47, Warning, Starcounter (foo123)]
Image file writer unable to acquire the SE_MANAGE_VOLUME_NAME privilege.
Frame 12:33:45.84 - 12:34:10.00
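As an aside, the frame printed by the script can be turned into an elapsed time with a few lines. A sketch, assuming both %TIME% stamps fall within the same day:

```python
from datetime import datetime

def frame_seconds(start: str, stop: str) -> float:
    """Elapsed seconds between two cmd %TIME% values like '12:33:45.84'.

    Assumes both stamps fall on the same day (no midnight rollover)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)).total_seconds()

# The frame above: 12:33:45.84 - 12:34:10.00
print(frame_seconds("12:33:45.84", "12:34:10.00"))  # → 24.16
```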
This extract seems to be the most interesting:
[12:34:01, Debug, Starcounter.Host (foo123)]
3811.1833:Database connected.
[12:34:01, Debug, Starcounter (foo123)]
Starting log writer at address 995. Next commit id is 496.
[12:34:01, Debug, Starcounter (foo123)]
Recovery completed. 12 transactions recovered.
[12:34:01, Debug, Starcounter (foo123)]
Starting recovery at address 719. Next commit id is 484.
[12:33:48, Debug, Starcounter.Host (foo123)]
374.5935:Database configured.
I refined the script above further so it also produces a diagnostic time log, since traces routed to the log don't always seem to turn up.
timeboot.bat
@echo off
set db=%1
IF NOT EXIST app.cs (
    staradmin +
)
set SC_TIMESTAMPS_FILE_PATH=
IF EXIST timeboot.txt (
    del timeboot.txt
)
staradmin kill all
set SC_TIMESTAMPS_FILE_PATH=%cd%\timeboot.txt
staradmin start server
:: We are not interested in results from booting up
:: admin server. Delete what was emitted up until now.
del timeboot.txt
set start=%TIME%
star -d=%db% --logsteps app.cs
set stop=%TIME%
set SC_TIMESTAMPS_FILE_PATH=
ECHO Frame: %start% - %stop%
ECHO Diagnostics also in timeboot.txt
And an extract from a timeboot.txt, somewhat prettified and with a few comments:
// 01.36.30.709 Admin server receives the request to boot up the database
Starcounter.Server.Commands: Executing 'Starting database foo123.' (0e99e927-9e8b-4b9e-ad36-1a084e454942): 01.36.30.709
Starcounter.Server.Commands: Begin task 'Starting database process': 01.36.30.714
Starcounter.Server.Commands: End task (cancelled=False): 01.36.30.745
Starcounter.Server.Commands: Begin task 'Starting log writer process': 01.36.30.748
Starcounter.Server.Commands: End task (cancelled=False): 01.36.30.760
Starcounter.Server.Commands: Begin task 'Starting code host process': 01.36.30.763
Starcounter.Server.Commands: End task (cancelled=False): 01.36.30.778
Starcounter.Server.Commands: Begin task 'Assuring code host is ready for service': 01.36.30.782
Starcounter.Host: Started sccode Main(): 01.36.30.807
Starcounter.Host: Bootstrap Main() started.: 01.36.30.825
Starcounter.Host: Exception factory installed.: 01.36.30.840
Starcounter.Host: Command line parsed.: 01.36.30.865
Starcounter.Host: Configuration loaded.: 01.36.30.870
Starcounter.Host: Assured no other process with the same name.: 01.36.30.874
Starcounter.Host: Global memory allocated.: 01.36.30.878
Starcounter.Host: Kernel memory configured.: 01.36.30.883
Starcounter.Host: Logging configured.: 01.36.30.902
Starcounter.Host: BMX manager initialized.: 01.36.30.908
Starcounter.Host: Host configured.: 01.36.30.912
Starcounter.Host: Scheduler configured.: 01.36.30.995
Starcounter.Host: Apps bootstraper initialized.: 01.36.32.131
Starcounter.Host: Database configured.: 01.36.32.137
Starcounter.Host: Database connected.: 01.36.46.230
Starcounter.Host: Query module initiated.: 01.36.46.253
Starcounter.Host: Setup completed.: 01.36.46.257
Starcounter.Host: Scheduler started.: 01.36.46.261
Starcounter.Host: Network gateway connected.: 01.36.46.265
Starcounter.Host: App domain configured.: 01.36.46.268
Starcounter.Host: Server command handlers registered.: 01.36.47.291
Starcounter.Host.Loader: Package created.: 01.36.47.310
Starcounter.Host.Loader: Package started.: 01.36.47.317
Starcounter.Host.Loader: Runtime meta-data tables were created and populated with initial data.: 01.36.47.327
Starcounter.Host.Loader: CLR view meta-data were deleted on host start.: 01.36.47.380
Starcounter.Host.Loader: Properties and columns were populated for the given meta-types.: 01.36.47.390
Starcounter.Host.Loader: Types of database schema checked and updated.: 01.36.47.399
Starcounter.Host.Loader: Type definitions registered.: 01.36.47.404
Starcounter.Host.Loader: Columns of database schema checked and updated.: 01.36.47.409
Starcounter.Host.Loader: Query module schema information updated.: 01.36.47.431
Starcounter.Host.Loader: System type specifications initialized.: 01.36.47.484
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 01.36.47.660
Starcounter.Host.Loader: Internal handlers were registered.: 01.36.48.700
Starcounter.Host.Loader: Processing completed.: 01.36.48.703
Starcounter.Host.Loader: Package processed.: 01.36.48.706
Starcounter.Host: Base package loaded.: 01.36.48.709
Starcounter.Host: Start completed.: 01.36.48.712
Starcounter.Host: Booting completed.: 01.36.48.716
Starcounter.Server.Commands: End task (cancelled=False): 01.36.50.657
// 01.36.50.660 Admin server is done booting database processes (~20 sec)
Starcounter.Server.Commands: Executing completed (0e99e927-9e8b-4b9e-ad36-1a084e454942): 01.36.50.660
// 01.36.50.815 Admin server is asked to start application
Starcounter.Server.Commands: Executing 'Starting app in foo123' (9ab8c1f9-8a0a-4a6d-b7f6-6da0116a37d0): 01.36.50.815
Starcounter.Server.Commands: Begin task 'Preparing user executables and files': 01.36.50.822
Starcounter.Server.Commands: Weaving completed.: 01.36.51.736
Starcounter.Server.Commands: End task (cancelled=False): 01.36.51.739
Starcounter.Server.Commands: Begin task 'Loading executable in code host': 01.36.51.742
Starcounter.Host.Loader: Loader started.: 01.36.51.834
Starcounter.Host.Loader: Input verified and assembly resolver updated.: 01.36.51.848
Starcounter.Host.Loader: Schema verified and loaded.: 01.36.51.915
Starcounter.Host.Loader: Target assembly loaded.: 01.36.51.941
Starcounter.Host.Loader: Package created.: 01.36.51.944
Starcounter.Host.Loader: Package started.: 01.36.51.950
Starcounter.Host.Loader: Types of database schema checked and updated.: 01.36.51.975
Starcounter.Host.Loader: Type definitions registered.: 01.36.51.978
Starcounter.Host.Loader: Columns of database schema checked and updated.: 01.36.51.981
Starcounter.Host.Loader: Query module schema information updated.: 01.36.52.252
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 01.36.52.453
Starcounter.Host.Loader: Entry point executed.: 01.36.52.487
Starcounter.Host.Loader: Processing completed.: 01.36.52.491
Starcounter.Host.Loader: Package processed.: 01.36.52.495
Starcounter.Server.Commands: Code host Exec request processed.: 01.36.52.502
Starcounter.Server.Commands: Database app registered.: 01.36.52.506
Starcounter.Server.Commands: Database status updated.: 01.36.53.526
Starcounter.Server.Commands: End task (cancelled=False): 01.36.53.533
// 01.36.53.538 Admin server has weaved and loaded application (~3 seconds)
Starcounter.Server.Commands: Executing completed (9ab8c1f9-8a0a-4a6d-b7f6-6da0116a37d0): 01.36.53.538
Since the majority of the time is consumed when booting up an empty host, I'll focus on that initially, trying to find answers. Here are the spots I've filtered out from that sample, all consuming more than 1 second, sorted by their relative impact:
Booting code host (> 1 second gaps)
-----------------------------------
14 sec
Starcounter.Host: Database configured.: 01.36.32.137
Starcounter.Host: Database connected.: 01.36.46.230
2 sec
Starcounter.Host: Booting completed.: 01.36.48.716
Starcounter.Server.Commands: End task (cancelled=False): 01.36.50.657
1.5 sec
Starcounter.Host: Scheduler configured.: 01.36.30.995
Starcounter.Host: Apps bootstraper initialized.: 01.36.32.131
1 sec
Starcounter.Host: App domain configured.: 01.36.46.268
Starcounter.Host: Server command handlers registered.: 01.36.47.291
1 sec
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 01.36.47.660
Starcounter.Host.Loader: Internal handlers were registered.: 01.36.48.700
Note that absolute numbers shouldn't be trusted here, since I'm running a personal build that might contain some debug code. The focus is on relative numbers.
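Hunting these gaps by eye gets tedious, so here's a small sketch that pulls the > 1 second gaps out of a timeboot.txt automatically. It assumes the "Source: message: HH.MM.SS.mmm" trace format shown above:

```python
import re
from datetime import datetime

# Matches the trailing 'HH.MM.SS.mmm' stamp on each trace line.
STAMP = re.compile(r": (\d{2}\.\d{2}\.\d{2}\.\d{3})\s*$")

def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, line_before, line_after) for consecutive
    trace entries more than `threshold` seconds apart."""
    gaps, prev = [], None
    for line in lines:
        m = STAMP.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H.%M.%S.%f")
        if prev and (ts - prev[0]).total_seconds() > threshold:
            gaps.append(((ts - prev[0]).total_seconds(), prev[1], line.strip()))
        prev = (ts, line.strip())
    return sorted(gaps, reverse=True)  # biggest gap first
```

For example, `find_gaps(open("timeboot.txt"))` returns the gaps biggest-first. Entries from different processes interleave in the file, so treat the output as a starting point rather than exact attribution.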
Doing some closer measurements didn't reveal anything new in the first iteration. Just as everything above suggests, the host "connecting" to the database is what steals almost all of the cycles.
Function Name                        Total CPU (%)   Self CPU (%)   Total CPU (ms)   Self CPU (ms)   Module
Bootstrap.Control::ConnectDatabase   93.12 %         0.00 %         16565            0               Starcounter.Bootstrap.dll
And that method is nothing more than this:
private unsafe void ConnectDatabase(uint schedulerCount, ulong hmenv, ulong hlogs) {
    uint e;
    uint flags = 0;

    flags |= sccoredb.SCCOREDB_LOAD_DATABASE;
    flags |= sccoredb.SCCOREDB_USE_BUFFERED_IO;
    flags |= sccoredb.SCCOREDB_ENABLE_CHECK_FILE_ON_LOAD;
    //flags |= sccoredb.SCCOREDB_ENABLE_CHECK_FILE_ON_CHECKP;
    flags |= sccoredb.SCCOREDB_ENABLE_CHECK_FILE_ON_BACKUP;
    flags |= sccoredb.SCCOREDB_ENABLE_CHECK_MEMORY_ON_CHECKP;

    e = sccoredb.sccoredb_connect(flags, schedulerCount, hmenv, hlogs, sccorelib.fix_get_performance_counter_file_map());
    if (e != 0) throw ErrorCode.ToException(e);

    e = filter.init_filter_lib(hmenv, hlogs);
    if (e != 0) throw ErrorCode.ToException(e);
}
I've recompiled and experimented with the flags too, but the impact is close to zero. So I guess my next step must be to start profiling Bluestar.
I'm CC'ing @eriohl and @alemoi here to ensure they receive updates on this one (they've recently looked into starting a db standalone and no slowdown was noticeable, hence I believe they might add valuable insight to the current story).
[...] they've recently looked into starting a db standalone and no slowdown was noticeable
That's kind of weird. I'm doing this on a local build, one that brings down the Bluestar binaries from our build server, but maybe it's not a sharp version or something like that? I do remember that in the past there was a huge difference between the Debug and Release builds in boot time (the former doing a bunch of checks at the page level or something like that).
On the other hand, we've now seen reports of really slow boots from customers too, and they'll surely have release builds. Then again, they might not suffer from the exact same pain I'm suffering from.
I think I'll install a sharp version here and run it on that to make sure. Stay tuned. :smile:
I think it might be the code generation for indexes that is very slow. It started happening when I moved codegen to be in-memory using Clang. Might that be true, @eriohl?
I think I'll install a sharp version here and run it on that to make sure. Stay tuned.
After downloading and installing, numbers improved significantly in my environment, which indicates that doing a local build of Level1 brings down some non-optimized Bluestar binaries (which is probably the right thing to do).
That being said, I still clock around 10-15 seconds for booting up just an empty application. I'll go through my diagnostic logs and try to make a selection like the one above, but with this new data.
I think it might be the code generation for indexes that is very slow. It started happening when I moved codegen to be in-memory using Clang. Might that be true, @eriohl?
I think a lot of stuff is very slow, and this certainly hasn't helped any. But it seems not all that relevant, since there is no real alternative here. I also note that when we start the host without the managed host, it goes a lot faster, and it still does the code generation (for indexes at least). So perhaps not the biggest culprit.
So here's a new extract, where I've cut off some edges that seem not very interesting. In this sample, the bootstrapping takes ~11 seconds, and includes:
Compiling source on the fly (done by star.exe) (~1 second)
Starting scdata (-)
Starting scdblog (-)
Starting sccode (~8 seconds)
Weaving (~1 second)
Bootstrapping application in host (~1 second)
In the past, we've said we can't be satisfied with the bootstrapping sequence until the weaver consumes the majority of its time. I'm not sure that applies 100% in this sample, since the application is super limited, but even if it's only truth-ish, the ratios are pretty far off, as can be seen.
Here's the extract of the code host bootstrapper:
Starcounter.Server.Commands: Begin task 'Assuring code host is ready for service': 11.31.01.419
Starcounter.Host: Started sccode Main(): 11.31.01.444
Starcounter.Host: Bootstrap Main() started.: 11.31.01.461
Starcounter.Host: Exception factory installed.: 11.31.01.478
Starcounter.Host: Command line parsed.: 11.31.01.508
Starcounter.Host: Configuration loaded.: 11.31.01.513
Starcounter.Host: Assured no other process with the same name.: 11.31.01.518
Starcounter.Host: Global memory allocated.: 11.31.01.521
Starcounter.Host: Kernel memory configured.: 11.31.01.526
Starcounter.Host: Logging configured.: 11.31.01.547
Starcounter.Host: BMX manager initialized.: 11.31.01.553
Starcounter.Host: Host configured.: 11.31.01.557
Starcounter.Host: Scheduler configured.: 11.31.01.570
Starcounter.Host: Apps bootstraper initialized.: 11.31.02.824
Starcounter.Host: Database configured.: 11.31.02.837
Starcounter.Host: Database connected.: 11.31.05.809
Starcounter.Host: Query module initiated.: 11.31.05.831
Starcounter.Host: Setup completed.: 11.31.05.834
Starcounter.Host: Scheduler started.: 11.31.05.838
Starcounter.Host: Network gateway connected.: 11.31.05.842
Starcounter.Host: App domain configured.: 11.31.05.844
Starcounter.Host: Server command handlers registered.: 11.31.06.890
Starcounter.Host.Loader: Package created.: 11.31.06.911
Starcounter.Host.Loader: Package started.: 11.31.06.918
Starcounter.Host.Loader: Runtime meta-data tables were created and populated with initial data.: 11.31.06.928
Starcounter.Host.Loader: CLR view meta-data were deleted on host start.: 11.31.06.936
Starcounter.Host.Loader: Properties and columns were populated for the given meta-types.: 11.31.06.945
Starcounter.Host.Loader: Types of database schema checked and updated.: 11.31.06.954
Starcounter.Host.Loader: Type definitions registered.: 11.31.06.960
Starcounter.Host.Loader: Columns of database schema checked and updated.: 11.31.06.963
Starcounter.Host.Loader: Query module schema information updated.: 11.31.06.988
Starcounter.Host.Loader: System type specifications initialized.: 11.31.07.041
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 11.31.07.226
Starcounter.Host.Loader: Internal handlers were registered.: 11.31.08.319
Starcounter.Host.Loader: Processing completed.: 11.31.08.322
Starcounter.Host.Loader: Package processed.: 11.31.08.326
Starcounter.Host: Base package loaded.: 11.31.08.330
Starcounter.Host: Start completed.: 11.31.08.334
Starcounter.Host: Booting completed.: 11.31.08.339
Starcounter.Server.Commands: End task (cancelled=False): 11.31.09.771
So the biggest gaps appear to be:
Starcounter.Host: Database configured.: 11.31.02.837
Starcounter.Host: Database connected.: 11.31.05.809
(3 sec) and
Starcounter.Host: App domain configured.: 11.31.05.844
Starcounter.Host: Server command handlers registered.: 11.31.06.890
(1 sec); the rest is quite evenly distributed into small portions of time (mostly 4-14 ms).
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 11.31.07.226
Starcounter.Host.Loader: Internal handlers were registered.: 11.31.08.319
This gap seems to relate to registration of handlers.
I'll review what happens when the "apps bootstrapper" initializes, to see if it also includes that. Then we might get 3 for the price of 1 (if they are "fixable").
I am adding more log entries to get more precision.
OK cool, I was about to do that, but I'll happily let you. :smile:
I'll review what happens when the "apps bootstrapper" initializes, to see if it also includes that.
Yeah, it seems quite a lot of handler registration is going on there too, so it's likely the underlying barrier. But I guess that involves some kind of code generation, so maybe there's not much we can do about it.
Yeah, it seems quite a lot of handler registration is going on there too, so it's likely the underlying barrier. But I guess that involves some kind of code generation, so maybe there's not much we can do about it.
I think it might be the handler registration with the gateway that takes time. I will look into it and see if I can introduce some optimizations.
I added a new measurement to get the average simplest-handler registration time, which is around 0.7 ms(!) on my machine (after some optimizations though :smiley_cat:), so it should not be the problem (I added a test, so now we measure this: SimplestHandlerRegistrationMs). However, I'm digging deeper, since I think something is still wrong with handler registration, or some magic there!
Just finished investigating. Looked everywhere. You would not believe what it was :smiley_cat:
It was "localhost" name resolution inside the code host, used to communicate REST with the gateway. Using the IP "127.0.0.1" solves all the above problems except this one (which we should dig into deeper):
I made it so that "localhost" is translated to "127.0.0.1" when using the Http class, so one can continue using the Http class as before, no modification needed. However, if someone would like to use an arbitrary .NET API for RESTing: now you know that "localhost" DNS resolution adds around 1 second.
@per-samuelsson can you please pull and confirm that it's also fixed on your machine? Thank you!
@alemoi wow, what amazes me is the way you discovered it in real life :8ball: Frankly speaking, I think we already discussed this phenomenon a couple of years ago in a completely different context, and it still strikes back. There are numerous posts on the Internet on how to "fix" it on Windows, e.g. http://cubicspot.blogspot.se/2010/07/fixing-slow-apache-on-localhost-under.html. We should recommend such fixes to our clients, or even include an optional hosts-file patch in the installer.
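For anyone who wants to check their own machine, here's a small sketch that times name resolution directly (port 8181 is just the gateway port from the samples above; any port works). On an affected Windows box, the "localhost" lookup can come back orders of magnitude slower than the literal IP, which skips the resolver entirely:

```python
import socket
import time

def resolve_ms(host: str) -> float:
    """Time one getaddrinfo() lookup for `host`, in milliseconds."""
    t0 = time.perf_counter()
    socket.getaddrinfo(host, 8181, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - t0) * 1000

# A literal IP bypasses the resolver entirely; "localhost" may not.
print(f"127.0.0.1: {resolve_ms('127.0.0.1'):6.2f} ms")
print(f"localhost: {resolve_ms('localhost'):6.2f} ms")
```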
I sure will. Great work finding that, and fixing it.
I can confirm @alemoi's fixes sliced off basically all time from those 3 spots I reported earlier. Well done, @alemoi! I will post some numbers later.
Right now, I'm working on finding whatever new spots are interesting in managed code, so we can try to shave those off before we go after the real beast, connecting the database, where we'll need unmanaged profiling to get anywhere.
I have at least one new spot that I'm currently analysing that seems interesting, and when I have more info on it, I'll post an update here with more general information about where we stand.
OK, so first a sample of those three spots @alemoi addressed:
Starcounter.Host: Scheduler configured.: 10.07.56.408
Starcounter.Host: Apps bootstraper initialized.: 10.07.56.552
Starcounter.Host: App domain configured.: 10.08.00.394
Starcounter.Host: Server command handlers registered.: 10.08.00.411
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 10.08.00.829
Starcounter.Host.Loader: Internal handlers were registered.: 10.08.00.850
With that, they are now well under the radar, and are clearly acceptable.
I've made numerous samples now, with a few tweaks too, but things are more or less at status quo, showing an overall time of 8.5-10 seconds, normally around 9-ish. That, again, includes:
Compiling source on the fly (done by star.exe) (~1 second)
Starting scdata (-)
Starting scdblog (-)
Starting sccode (~5-6 seconds)
Weaving (~1 second)
Bootstrapping application in host (~1 second)
Within booting up the code host, we now have these weak spots:
Starcounter.Host: Database configured.: 10.07.56.559
Starcounter.Host: Database connected.: 10.08.00.347
Starcounter.Host.Loader: Query module schema information updated.: 10.08.00.513
Starcounter.Host.Loader: System type specifications initialized.: 10.08.00.578
Starcounter.Host.Loader: CLR view meta-data were populated for the given types.: 10.08.00.829
All other micro-tasks within the code host bootstrap now show numbers I don't consider interesting to pursue further. Possibly, that last 0.3-second triplet is not very interesting either; I just wanted to show the one that was "second worst".
One new interesting thing I found is that it's always ~1 second between the last entry written by the code host bootstrapping sequence and when the admin server ends that task (the task is "waiting for code host to be ready"). After some initial diagnosing, I've isolated it down to one thing added rather recently: an HTTP call from within the admin server that fires whenever its internal state changes. I remember asking @andwah about it earlier, and having some discussion with @andwah and @alemoi about whether it could be done differently. I think what it does is push out a "state update" to clients (like the web admin). If I understand it correctly, that call is synchronous, and hence the admin server can't report back to star.exe that bootstrapping of the database is done before that call completes:
I think that's our best shot at shaving off another second right now, since it seems like the lowest-hanging fruit. After that, only database connecting remains until we've reached the goal of the weaver being the bottleneck.
@per-samuelsson already an achievement, thanks for the update! Yes, I'm very curious :) What do you think would be the best approach to drill down into the 5 secs of starting sccode?
What do you think would be the best approach to drill down into the 5 secs of starting sccode?
Well, it's hard to say, because now we're getting closer to having only that "Database connected" thing left, and it takes like > 75% of the overall time. I'm not up to date on what's done in there (I know recovery runs, but I've seen that it doesn't make a big impact). And because of that, I'm having an even harder time understanding what can be optimized in there and what can not.
My idea is that once I get a better idea of this more low-hanging thingy I described above, it's time to take on that database connect.
I've now ruled out that it has anything to do with "plumbing", or the call context itself. I've tried replacing it with an explicit, precreated Node instance, changed to "127.0.0.1" when creating it, to no avail. It's still very close to ~1 second every time, so it seems to be something in the implementation.
I've now ruled out that it has anything to do with "plumbing", or the call context itself.
Hmm, seems I have to take a step back from that already. Now I have a sample where it's ~1 second from the time the PUT is called until the handler is reached. It seems to be some kind of cold-start thing though, since consecutive calls complete in just milliseconds.
Now I have a sample where it's ~1 second from the time the PUT is called until the handler is reached. It seems to be some kind of cold-start thing though, since consecutive calls complete in just milliseconds.
More info:
If I precreate the node (in some other thread, probably) and keep a reference to it, I can get all calls down to milliseconds, except the first one, which always clocks in at ~1 second.
// Precreate
node = new Node("127.0.0.1", 8181);
// Use later in another thread
Trace("Updated", watch);
node.PUT("/__internal_api/databases/" + database.Name, string.Empty, null);
Trace("After PUT", watch);
Trace("Updated2", watch);
node.PUT("/__internal_api/databases/" + database.Name, string.Empty, null);
Trace("After PUT2", watch);
Starcounter.Server: Entering update: 04.07.46.289
Starcounter.Server: Updated: 04.07.46.295
ServerManager: Enter PUT: 04.07.47.219 // First line in the actual handler, a second later
ServerManager: Got database: 04.07.47.221
ServerManager: Invalidated model: 04.07.47.224
Starcounter.Server: After PUT: 04.07.47.227
Starcounter.Server: Updated2: 04.07.47.230
ServerManager: Enter PUT: 04.07.47.233 // First line in the actual handler, now just 3 ms
ServerManager: Got database: 04.07.47.235
ServerManager: Invalidated model: 04.07.47.239
Starcounter.Server: After PUT2: 04.07.47.242
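The cold-start pattern above is easy to reproduce in isolation. A sketch (the server here is a throwaway stand-in, not the admin server, and the path is just borrowed from the trace) showing the first request paying for the TCP connect while the second reuses the socket:

```python
import http.client
import http.server
import threading
import time

# Throwaway local server answering PUTs, so the timing is self-contained.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so the socket is reused
    def do_PUT(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()
    def log_message(self, *args):  # keep the output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])

def timed_put(path: str) -> float:
    """One PUT round-trip, in milliseconds."""
    t0 = time.perf_counter()
    conn.request("PUT", path, body="")
    conn.getresponse().read()
    return (time.perf_counter() - t0) * 1000

first = timed_put("/__internal_api/databases/foo123")   # pays for connect
second = timed_put("/__internal_api/databases/foo123")  # reuses the socket
print(f"first: {first:.1f} ms, second: {second:.1f} ms")
```

Against localhost the connect itself is cheap, so if the first call still costs ~1 second, the time is going somewhere else (name resolution, handler codegen, or some other lazy initialization).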
By installing an unmanaged profiler and doing mixed-mode debugging, I've finally been able to produce some results (let's just hope they are accurate).
As stated earlier, more than 80% of the current bootstrapping cycle happens when booting up an empty code host, specifically when invoking Bluestar's sccoredb.sccoredb_connect. I've profiled that a couple of times now, and the results I get are pretty consistent. Here are two extracts:
The setup is a locally built Bluestar debug configuration, so absolute timings should be considered in that context, but maybe the relative numbers can give us a hint on where to look further.
I'll try to build a local release version of the kernel too, to see what that gives, and see if I can dig deeper into the tree to be even more specific. I'll get back on that.
Big thanks to @chrhol, who helped me set up a build environment for Bluestar, bearing with me while I whined about everything being so difficult and complicated. And @dan31, who suggested the VS Enterprise profiler, which worked out great.
I had a feeling the above might be false diagnostics, and I think it might be. So if anyone is reading this, give me a moment to test with a sharp Bluestar version first.
Testing on level0 also gives me the impression that the time is spent inside db connect.
Thanks for sharing that. I also still think it's there, just not spent the way my spreadsheets showed, since it seems unlikely that __get_image_header_read_ptr could take up so much of that time. It just doesn't make sense.
[...] since it seems unlikely that __get_image_header_read_ptr could take up so much of that time. It just doesn't make sense.
Seems the feeling I had was on to something. Profiling a locally built release version of Bluestar revealed something completely different:
And that's more in line with what I suspected from the beginning: some kind of compiling / code generation must be in play here. Which is kind of sad, because that's where it will hurt us the most, I guess, since we're not in control of that code.
Why did LLVM not even surface in debug version profiling?
What's up with that 1 second it takes every time for the first Self/Node/Http call (described earlier)? I'll try to profile that too, while I'm at it.
About LLVM, I have no clue what options we have to do anything. In my latest tests (using the time-booting script above), the isolated time to run through code-host-only bootstrapping (i.e. from the beginning of Main until initialization is done) is ~3.5 seconds in an optimized build. Of that, the database connect above is more than 90%, so I can't find other optimizations in that sequence right now.
Why did LLVM not even surface in debug version profiling?
Maybe because it's an external call to a DLL?
What's up with that 1 second it takes every time for the first Self/Node/Http call (described earlier)? I'll try to profile that too, while I'm at it.
Hmm, yes, indeed interesting. I can check this once I have time.
Well, why would it show up in release in that case? It's just different builds.
Hmm, yes, indeed interesting. I can check this once I have time.
It's super frustrating, actually, because I can consistently demonstrate that 1-second penalty by emitting before/after calls to the diagnostics text file. But as soon as I attach a debugger/profiler, the call takes "no time". :game_die: :anguished:
It's super frustrating, actually, because I can consistently demonstrate that 1-second penalty by emitting before/after calls to the diagnostics text file. But as soon as I attach a debugger/profiler, the call takes "no time". :game_die: :anguished:
Maybe the delay takes place in the gateway? And the reason for this is again LLVM codegen there? @per-samuelsson, you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
@per-samuelsson, you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
Really? OK, sure. I'm starting to get the feeling there is little I can do here once these analyses are completed anyway, so if you think I should pause for now, I will. Let me know if I can help you investigate further in any way!
Also, it would be nice to track down the catastrophic absolute numbers reported by @ivarbergman earlier in this thread and by @atifwaqar in #3494. They just seem really, really off. Maybe I can do something to help those guys profile better...
@per-samuelsson, you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
@alemoi, if what you guess turns out to be the case, then we are pushed into a big need for more controlled compilation logic.
Here is a simple version of it, namely:
1) When developers are in a dev environment, the LLVM compiler is set to fast compilation without strong optimisation. The result is a fast dev cycle: devs get their apps running in the blink of an eye.
2) When developers put an app in a prod environment, they flip a switch in the DB config file to enable full-blown, strongly optimised compilation of their code. Not suitable for the dev cycle on a dev machine, but needed for load testing / prod use.
Actually, points 1 and 2 are no more than this:
In that sense, we are nothing special. Developers don't wait for a release version to be built every time they make a change, since that is expensive. That said, I'm not in favour of such manual switching; to my mind it's too 2000s and cumbersome (especially for our target audience, who want super-fast and super-simple development), and there is a better approach.
Let me give you an analogy. In the JVM, for instance, when you compile and run your code the first time, it isn't yet strongly optimised (hence it starts relatively fast). However, it gets better optimised over the course of execution: the VM finds hot spots on the execution paths and rebuilds them while they aren't in use. That is why its VM is called HotSpot. Something similar happens in modern SQL processing engines: they do not attempt to globally optimise a query when the user first runs it (to avoid making them wait far too long); instead, they optimise the execution plan during the actual execution of the query.
In our case, the state of affairs is even simpler. We could build compiled representations (of whatever: queries, URIs and so on) without strong optimisations in the first place. Then, once the app has started, we can start recompiling the representations with the strongest possible optimisations. Think of it as a two-level cache. While the strongly optimised functions aren't yet in the cache, the weakly optimised ones are invoked. When the strongly optimised representations become available (presumably after the seconds they now take to compile), the references in the cache are (thread-safely) replaced with them. Hence there is no overhead in switching to them: you simply had one pointer, and now you have another.
Thus, while we now (ostensibly; it needs to be verified) make the user wait seconds for the longer compilation every time he starts an app, after the suggested addition this time moves to a (slightly longer) warm-up phase, not noticeable to a typical user in 99.9% of cases.
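The two-level cache described above can be sketched in a handful of lines. `fast_compile` and `optimized_compile` are hypothetical stand-ins for the quick and the strongly optimised LLVM pipelines; the point is the thread-safe reference swap once the optimised build completes:

```python
import threading

class TwoLevelJitCache:
    """Serve a quickly compiled function first; upgrade it to a
    strongly optimised one in the background and swap the reference."""

    def __init__(self, fast_compile, optimized_compile):
        self._fast = fast_compile
        self._opt = optimized_compile
        self._funcs = {}          # key -> best currently available callable

    def get(self, key, source):
        fn = self._funcs.get(key)
        if fn is None:
            # First request: compile quickly, publish, then start the
            # slow optimised build on a background thread.
            fn = self._funcs[key] = self._fast(source)
            threading.Thread(
                target=self._upgrade, args=(key, source), daemon=True
            ).start()
        return fn

    def _upgrade(self, key, source):
        # Replacing a dict entry is atomic in CPython; callers in flight
        # keep the weakly optimised function, later callers get the new one.
        self._funcs[key] = self._opt(source)

# Demo with trivial stand-in "compilers":
cache = TwoLevelJitCache(
    fast_compile=lambda src: (lambda x: eval(src)),   # slow-ish interpreter
    optimized_compile=lambda x_ignored: (lambda x: x * 2),  # "optimised" form
)
f = cache.get("double", "x + x")
print(f(21))   # 42, served by the weakly optimised version first
```

Callers never block on the expensive compile; they only ever observe a pointer that points at the best version available so far.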
Also, it would be nice to track down the catastrophic absolute numbers reported by @ivarbergman earlier in this thread and by @atifwaqar in #3494. They just seem really, really off. Maybe I can do something to help those guys profile better...
@per-samuelsson That is a good point. @atifwaqar, maybe you can spend some time getting the startup issue reproduced for us? What do you think?
@atifwaqar It would be best if you could prepack your whole setup (use 7z Ultra compression; include the whole Starcounter folder from Documents, all the apps that you start, and the batch file you use to start them) so that @per-samuelsson (or anyone, like @alemoi) can reproduce the run in which you observe the slow start. Then we can use that laboratory as a base to measure the impact of improvements.
@dan31 Sure, but the Starcounter folder from Documents is 30+ GB. I am copying it to our SC Basement server so your team can access it easily. I will update you as soon as it's copied.
We could build compiled representations (of whatever: queries, URIs and so on) without strong optimisations in the first place.
I really like the ideas you present in that post. It gives us the best of both worlds: fast startup times, yet fast execution of critical code. Let's just hope it holds up, and isn't super big or challenging in terms of implementation time.
Big thanks to @chrhol, who helped me set up a build environment for Bluestar, bearing with me while I whined about everything being so difficult and complicated. And @dan31, who suggested the VS Enterprise profiler, which worked out great.
There is an interesting alternative I tried out today: https://github.com/GPUOpen-Tools/CodeXL/releases (the ex-commercial AMD profiler). It's free now and integrates nicely into VS. Yes, it is optimised for AMD hardware, but it has well-done sampling analysis (independent of CPU). Apart from all the standard sampling data for native code, it also gives you an assembly decomposition for each code line, which is sometimes the most valuable part of the analysis, as you can drill down deep:
It gives us the best of both worlds: fast startup times, yet fast execution of critical code. Let's just hope it holds up, and isn't super big or challenging in terms of implementation time.
Important to note: if @alemoi confirms what we think is the case, then this work is an absolute necessity for any future versions (not just a patch for existing stuff), since the dominance of llvm time will still prevail unless it is solved with smart caching.
Just my 2 cents. If the below is already done, please disregard.
Making LLVM faster?
Before implementing the cache, I suggest we try the following:
When JITting, turn off all optimizations.
Never use linking. This is probably how it works already (it's how the URI matcher is currently implemented).
If you need to call Bluestar functions, do it by writing the function pointers from the host (again, _no_ linking using a linker). Always use Clang/LLVM in-memory.
Never load header files (or touch the file system). The C source should be in a single Clang input buffer and should only contain the function declarations and code actually used. This is probably how it works already (it's how the URI matcher is currently implemented).
@Starcounter-Jack very good cents! I'm not currently sure about all of them (we probably missed something obvious while moving llvm from disk-generated to in-memory), but @alemoi will treat your notes as top priority when he comes back on Tuesday. Yes, currently (for some time now) stronger optimisations are enabled, to produce faster functions. On Friday @alemoi tested with the optimisation flag set to false, resulting in 3.5x faster generation. We don't yet know the balance of generation time vs. run time. I believe during next week we'll have all the info at hand and most likely the issue resolved.
@Starcounter-Jack No file system is involved; llvm is completely in-memory for all its users. Most likely the reason is the optimisations, and maybe some details of the full JIT chain.
I briefly ran a .NET profiler in vivo on SWO. Their suite takes 30 sec to get past "Starting a database...". Which, as we've noticed already, isn't about the database; the proper message there should be "Starting Starcounter...", in my opinion. First, I noticed that half the CPU time over these 30 secs went to scnetworkgateway. The next time I restarted the suite, I promptly attached the profiler to it. Here are the results:
Going deeper, it is clearly this:
My next candidate was sccode.exe, which also showed a peak in CPU usage approximately in the middle of the process, competing with the gateway for CPU resources. Here are the results of profiling its startup sequence:
No need to continue investigating, I believe.
I think the severity of this unfortunate imprudence in llvm module usage is now obvious, so I'm assigning the issue to @alemoi and raising its priority to high. I realise it isn't that simple to fix, since the reason for it is clearly that you wanted llvm to generate more performant results, and that turned into an issue users don't buy into. I believe the best quick fix, which will spare our current users such big delays in the dev cycle, is to introduce a database setting in the config file with the default "no optimisation", as @Starcounter-Jack suggests. When putting their stuff into production, they can switch on "stronger optimisations". The proper longer-term fix is a smart cache that serves unoptimised versions first and then quietly builds optimised versions, transparently to the user. We need to make our users comfortable with the technology, and forcing them to wait minutes every time for a runtime boost isn't an option (I don't know how much you'd gain, but most of them aren't doing high-frequency trading or building another Facebook).
@per-samuelsson thank you for your thorough examination; I believe the root of the problem is now clear enough. You could repeat the same procedure with @atifwaqar's data file to see what takes so long there (perhaps exactly the same thing).
I believe the best quick fix, which will spare our current users such big delays in the dev cycle, is to introduce a database setting in the config file with the default "no optimisation", as @Starcounter-Jack suggests. When putting their stuff into production, they can switch on "stronger optimisations".
How bad is Starcounter's performance with optimisation off?
This quick fix might have a drawback for new users who download Starcounter and test performance. I doubt they will do proper configuration; they will probably run Starcounter as-is. Is that something we should worry about?
Maybe the switch should be on by default, and then we teach our customers to switch it off in the development environment? Note that it is easy to forget to fix the configuration in the production environment.
How bad is Starcounter's performance with optimisation off?
Thank you @k-rus, that is a very important point to track. I call on us to be very precise with such changes (and @alemoi in particular). The change itself is just a flag, but we need to clearly get two sets of numbers as a summary table:
1) What is the decrease in startup time? In which examples and tests? (Empty db? TPC-C and which settings? RetailDemo? Customer case?)
2) What is the increase in runtime time? In which examples and tests? (Empty db? TPC-C and which settings? RetailDemo? Customer case?)
This quick fix might have a drawback for new users who download Starcounter and test performance. I doubt they will do proper configuration; they will probably run Starcounter as-is. Is that something we should worry about?
I would put it like this: substantially fewer than the people downloading Starcounter and waiting 6-10 secs at first app start, not to mention current customers struggling with minutes of startup. I also have a gut feeling that the decrease in perf won't be multiples, but probably tens of percent. For our published benchmarks, we turn on all perf tunings. Once the smart cache is in place, this is no longer a concern.
Naturally, when newbies do their ad-hoc microbenchmarks, they go through these steps:
1) Ease of setup and first coding experience (put a tick here: "Hello world" and transacted apps are very straightforward today, in well-recognised environments like Visual Studio);
2) Ease of the dev cycle (coding-running-stopping-changing-...): needs improvement in startup time (this issue) and stopping/restarting time (addressed in the Core .NET track). People don't like to wait; if they wait and the platform looks as if it hangs big time, they will be inclined to leave it. There are more, smaller points to address, e.g. precreating the first database at installation asynchronously, rather than when the user starts his first app.
3) Perf of writes. They populate the database with loads of data, like inserting millions of arbitrarily generated rows of people and houses. People don't like to wait; if they need to wait minutes to load their test data, they will be inclined to leave. Today you load 1,000,000 people with FirstName LastName in around 1.5 secs on my laptop on battery; surely this can be improved, but it's decently okay already (we as a company take great care over write performance).
4) Perf of reads. When the data is there, they want to crunch it. We all know what we want to solve there and how; I'm not diving into detail. Once NewSQL is in place and the redundant .NET onion layers around getting data to the code host are removed, reads will improve. Once we add OLAP support, reads will improve dramatically.
5) Perf of mixed loads. Only quite savvy users and real customers reach this stage. We're talking about concurrent access to the database, and the operational-analytics case where you do OLTP writes and analyse your current data in an OLAP fashion simultaneously. This becomes important when Starcounter gets really big public traction and people start to genuinely stress-test it with demanding workloads.
Effectively, the point we address here is no. 2 in the list. If no. 2 isn't addressed, the number of newbies trying out Starcounter who reach nos. 3 and 4 will greatly decrease.
Maybe the switch should be on by default, and then we teach our customers to switch it off in the development environment? Note that it is easy to forget to fix the configuration in the production environment.
For the reasons I gave above, I think the most reasonable approach is to make the flag off by default, work on the smart cache asap, and then deprecate the flag. Our goal for the Core .NET version is a sub-second dev cycle for both start and stop, with the same scllvm.dll involved in the loop. Hence, if things are kept as-is with the flag on, it will never be possible to achieve a sub-second experience.
However, it might turn out that the root cause isn't optimisation itself but, for instance, heavy heap thrashing or deallocation time per jitted function when invoking llvm.dll. We don't know yet; @alemoi to the rescue on this issue.
Actually, that only covers performance-related evaluation. The other big part of evaluation is the app dev framework: view-models with thin clients and everything related to app development. When people try and test these (maybe 70-80% of initial evaluations, or even more, and this is the path we use in the starting tutorials), compiling a highly performant version is quite useless. They might combine it with data generation and fetching (like trying to create an analytical application with a GUI), but this still comes before the points on writes and reads: they first create an app with no data, but with some GUI and event responses like button clicks and the like.
@k-rus @alemoi Another point: looking at the cap of 20% CPU load in a profiling session for both sccode and scnetworkgateway, I conclude that llvm gen runs on a single thread. Running it on 4 threads or so might improve things 1.5-2x. However, that would still be 10 secs instead of 20.
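Assuming the individual codegen jobs are independent (one per handler or query), spreading them over a small worker pool can be sketched like this. `compile_one` is a hypothetical stand-in for a single scllvm invocation; the speedup only materialises if the native call actually runs in parallel (in .NET/C++ it would, and here `time.sleep` releases the GIL to simulate that):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compile_one(unit):
    # Stand-in for one scllvm codegen call. A real native call would
    # release the GIL, so threads genuinely overlap; sleep mimics that.
    time.sleep(0.05)
    return f"{unit}: compiled"

units = [f"handler_{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compile_one, units))
parallel = time.perf_counter() - start

print(len(results))        # 8
print(parallel < 8 * 0.05) # True: ~2 batches of 0.05 s, not 0.4 s serially
```

With 4 workers and 8 units of 0.05 s each, wall-clock time drops to roughly two batches instead of eight sequential calls, matching the 1.5-2x estimate above (the gain caps out once jobs contend for shared compiler state).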
I used the same great timeboot script from @per-samuelsson, which measures warm start.
So, as we can see, for the URI matcher generator the difference is about 3x, which is good. However, the database configuration still takes 1.5 seconds, instead of 2.2. Still a lot for just a database configuration step. I guess we again have to dig deeper into the database-configuration native code.
@dan31, regarding your research: most probably it's CPU time your tool measured, not actual wall-clock time. LLVM consumes CPU readily because it's linear algorithmic code without synchronization and so on; that's why it shows up as eating a lot of CPU.
I will now run TPCC without optimizations to see the difference in performance degradation.
@alemoi, one question: which app start do you actually measure? In my case it's a set of real apps, with lots of real URIs and queries. Regarding the sampling data, I checked visually that the waiting actually correlated with the CPU work shown on the diagram. Namely, those 20 seconds were not just anything: the CPU work capped at 25%, concentrated in scllvm. I might be wrong, though I currently don't see any other plausible source of slowness: I see the CPU working while the database is starting. Ha, actually, @alemoi, if you just send me a pdb file, I would gladly dig deeper into this. I can still tease these production machines. Can you do that?
I need to build a new version of scllvm; pulling the sources now. I had to clean the hard drive as there wasn't enough space. I'll try to produce a new scllvm.dll/pdb tonight.
So it seems to be an issue with LLVM. My guess is there is simply a difference in opt/no-opt generation time for different kinds of code: the URI matcher vs. SQL code. That's why we don't see 3x less time in the SQL case.
@alemoi, that is also an interesting thing to try out. But really, point me to the pdbs; I'm eager to push this further on these machines.
I have restored my llvm build environment, with their latest version, 3.8.0. Of course, they have changed some interfaces and libraries :) I adapted the code and tested that it basically works. Uploaded the new scllvm.dll and pdb to \\scbuildserver\ftp\temp\am\scllvm\
Hi! I just want to check with you whether there has been any progress (or new findings) on this issue? We are still suffering from slow database & application startup: about 30 sec for each app.
Some of our servers have 10 databases with 15 apps each, which means that restarting the services is a 1+ hour process.
We use Azure A2 instances and Starcounter version Starcounter-DevelopNightlyBuilds-2.2.1.3137 (Sep 13, 2016 2:43:12 AM).
For information: in develop there is now a new, working scllvm caching scheme which should improve startup time. Hopefully it will soon be introduced in pnext as well.
@ivarbergman, could you please provide your typical start/restart script for your set of apps? This is currently not related to compile times (those are actually shaved off by the new llvm subsystem). Using your case, we need to investigate further.
Hi! We've had input from @dan31, outside of this issue, about how to restart apps (i.e. stop the sc host before we restart apps). Thanks!
But we still have a rather slow startup even when we do this.
Locally (on my dev machine) I noticed that the Temp folder is very large: Temp/weaver is 8 GB and Temp/sqlschemas is 5 GB. When I created a new database (reusing the sci files), the Temp folders were much smaller and startup times were much better.
Question: do you think a large Temp folder can result in slower startup? And can the Temp folder be emptied safely?
But we still have a rather slow startup even when we do this.
Sorry to hear that. Let's see if we can have some brainstorming session and try understand why.
Question: do you think a large Temp folder can result in slower startup?
I don't think it should, no.
And can the Temp folder be emptied safely?
In theory, it should be safe. But I have a vague memory of once dropping some folders in there and it messing something up. There are issues in our backlog describing things we should do to keep them better under control. Let me get back to you on that.
FYI: we are still running SC 2.2 but will soon switch to 2.3. So maybe we shouldn't do too much investigating until we've made the upgrade.
I see. Thanks for sharing.
It should be pretty easy to test at least whether it's safe to delete content under Temp. Create a new database and start the desired apps in it. Then stop the database, delete everything under Temp (but don't delete the Temp\[Database] folder itself) and restart the apps. If that works, it's safe.
Something like this:
:: 1. Create database
staradmin -d=foo new db
:: 2. Start app suite
:: For each app you want to test:
star -d=foo app.exe
:: Then, when all are started:
staradmin stop db
:: Delete everything under Temp\Foo,
:: then rerun step 2. If the apps start
:: correctly, it's a go.
@per-samuelsson commented on Fri Apr 08 2016
We are spending some time trying to improve the cycle, and I'm finding that starting a database, even in the simplest scenario, is really slow.
That last step takes approx. 15 seconds (!!) on my machine.
And it does not improve if I do it a second time:
I'll try dusting off the old "log steps" to see if I can find where all that time is spent.
If anyone can post their experience doing the same thing (I basically just counted seconds after hitting staradmin ....), that's appreciated.

@alemoi commented on Fri Apr 08 2016
Great! I created an issue some time ago about super slowness on the sctestsrv03 machine. Around 30 seconds there.
@per-samuelsson commented on Fri Apr 08 2016
Thanks, @alemoi. That's #3442; let's try to address these two as one. My initial tracing seems to align quite well with that issue's.
@ivarbergman commented on Fri Apr 08 2016
Around 20 sec on our production server (Azure A2); approx. 10 sec on my local machine.
@per-samuelsson commented on Fri Apr 08 2016
Thanks @ivarbergman. We should really try to improve those numbers.
Here's a script that could be used:
timeboot.bat
Here's a sample run from me, on a locally built version:
This extract seems to be the most interesting:
@per-samuelsson commented on Fri Apr 08 2016
I refined the script above further so that we also produce a diagnostic time log, since traces routed to the log don't always seem to show up.
timeboot.bat
And an extract from a timeboot.txt, somewhat prettified and with a few comments:

@per-samuelsson commented on Fri Apr 08 2016
Since the majority of the time is consumed when booting up an empty host, I'll focus on that initially, trying to find answers. Here are the spots I've filtered out from that sample, all consuming more than 1 second, sorted by their relative impact:
Note that absolute numbers shouldn't be trusted here, since I'm running a personal build that might contain some debug code. The focus is on relative numbers.
@per-samuelsson commented on Fri Apr 15 2016
Doing some closer measurements didn't reveal anything new in the first iteration. Just as everything above suggests, the host "connecting" to the database is what steals almost all of the cycles.
And that method is nothing more than this:
I've recompiled, experimenting with the flags too, but the impact is close to zero. So I guess my next step must be to start profiling Bluestar.
@dan31 commented on Fri Apr 15 2016
I'm CCing @eriohl and @alemoi here to ensure they receive updates on this one (they recently looked into starting a db standalone and no slowdown was noticeable, so I believe they may add valuable insight to the current story).
@per-samuelsson commented on Fri Apr 15 2016
That's kind of weird. I'm doing this on a local build, one that pulls down the Bluestar binaries from our build server, but maybe it's not a sharp version or something like that? I do remember that in the past there was a huge difference between the Debug and Release builds at boot (the former doing a bunch of checks at the page level or something like that).
On the other hand, we've now seen reports of really slow boots from customers too, and they'll surely have release builds. But then again, they might not suffer from exactly the same pain I'm suffering from.
I think I'll install a sharp version here and run it on that to make sure. Stay tuned. :smile:
@alemoi commented on Fri Apr 15 2016
I think it might be the code generation for indexes that is very slow. It started happening when I moved codegen in-memory using Clang. Might that be true, @eriohl?
@per-samuelsson commented on Fri Apr 15 2016
After downloading and installing, the numbers improved significantly in my environment, which indicates that doing a local build of Level1 brings down some non-optimized Bluestar binaries; that's probably the right thing for it to do.
That being said, I still time around 10-15 seconds to boot up just an empty application. I'll go through my diagnostic logs and try to make a selection like the one above, but with this new data.
@eriohl commented on Fri Apr 15 2016
I think a lot of stuff is very slow, and this certainly hasn't helped. But it seems not all that relevant, since there is no real alternative here. And I note that when we start the host without the managed host, it goes a lot faster, and it also does the code generation (for indexes, at least). So perhaps it's not the biggest culprit.
@per-samuelsson commented on Fri Apr 15 2016
So here's a new extract, where I've cut off some edges that don't seem very interesting. In this sample, the bootstrapping takes ~11 seconds and includes:

- star.exe (~1 second)
- scdata (-)
- scdblog (-)
- sccode (~8 seconds)

In the past, we've said that we can't be satisfied with the bootstrapping sequence until the weaver consumes the majority of its time, and I'm not sure that applies 100% to this sample, since the application is super limited. But even if that's only truth-ish, the ratios are pretty far off, as can be seen.
Here's the extract of the code host bootstrapper:
@alemoi commented on Fri Apr 15 2016
That might help: https://github.com/Starcounter/RebelsLounge/blob/master/buildsystem.md#building-level0-as-part-of-level1
@per-samuelsson commented on Fri Apr 15 2016
Ah, I knew about that, of course, but I had missed this:
Thanks for pointing that out! And like I said, I think the current feature is correctly designed, so no need to change anything.
@dan31 commented on Fri Apr 15 2016
So the biggest breaches appear to be:

Starcounter.Host: Database configured.: 11.31.02.837
Starcounter.Host: Database connected.: 11.31.05.809

(3 sec) and

Starcounter.Host: App domain configured.: 11.31.05.844
Starcounter.Host: Server command handlers registered.: 11.31.06.890

(1 sec); the rest is quite evenly distributed into small portions of time (mostly 4-14 msecs).

@alemoi commented on Fri Apr 15 2016
What about this one?
@dan31 commented on Fri Apr 15 2016
@alemoi yes, also quite a big one, totalling 5 secs across 3 breaches.
@alemoi commented on Fri Apr 15 2016
And this one, which is on me, I guess:
@per-samuelsson commented on Fri Apr 15 2016
Yeah, exactly, those are my conclusions as well.
One thing worth noticing about those is that this
and this
seem to relate to registration of handlers.
I'll review what happens when the "apps bootstrapper" initializes, to see if that's also included. Then we might get 3 for the price of 1 (if they are "fixable").
@alemoi commented on Fri Apr 15 2016
I am adding more log entries to have more precision.
@per-samuelsson commented on Fri Apr 15 2016
OK cool, I was about to do that, but I'll happily let you. :smile:
Yeah, it seems quite a lot of handler registration is going on there too, so it's likely the underlying barrier. But I guess that involves some kind of code generation, so maybe there's not much we can do about it.
@alemoi commented on Fri Apr 15 2016
I just pushed 2 more, you can now add yours :)
@alemoi commented on Fri Apr 15 2016
I think it might be the handler registration with the gateway that takes time. So I will have to look into whether I can introduce some optimizations.
@alemoi commented on Fri Apr 15 2016
I added a new measurement to get the average simplest-handler registration time, which is around 0.7 ms(!) (after some optimizations, though :smiley_cat:) on my machine, so it should not be the problem (I added a test, so now we measure this: SimplestHandlerRegistrationMs). However, I'm digging deeper, since I think something is still wrong with handler registration, or there's some magic there!

@alemoi commented on Sat Apr 16 2016
Just finished investigating. Looked everywhere. You would not believe what it was :smiley_cat: It was "localhost" name resolution inside the code host, used for REST communication with the gateway. Using the IP "127.0.0.1" solves all the above problems except this one (which we should dig into more deeply):
I made "localhost" translate to "127.0.0.1" when using the Http class, so one can still continue using the Http class as before; no modifications needed. However, if someone wants to use an arbitrary .NET API for RESTing: now you know that "localhost" DNS resolution adds around 1 second.
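The effect is easy to measure outside Starcounter: time name resolution of "localhost" against the literal "127.0.0.1", which is parsed as an address and never hits the resolver. A small Python probe (the ~1 second penalty shows up mainly on affected Windows setups; elsewhere both calls are fast):

```python
import socket
import time

def time_resolve(host):
    """Return the wall-clock time of one address lookup for `host`."""
    start = time.perf_counter()
    socket.getaddrinfo(host, 80)
    return time.perf_counter() - start

t_name = time_resolve("localhost")   # may go through the DNS/hosts resolver
t_ip = time_resolve("127.0.0.1")     # numeric literal: no lookup needed
print(t_ip <= 1.0)                   # the literal is parsed, never resolved
```

On the affected machines, `t_name` is where the ~1 second goes; `t_ip` stays negligible, which is why substituting the literal fixed the first-call delay.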
@per-samuelsson can you please pull and confirm that it's also fixed on your machine? Thank you!
@dan31 commented on Sat Apr 16 2016
@alemoi wow, what amazes me is the way you discovered it in real life :8ball: Frankly speaking, I think we've already discussed this phenomenon a couple of years ago in a completely different context, and it still strikes back. There are numerous posts on the Internet on how to "fix" it on Windows, e.g. http://cubicspot.blogspot.se/2010/07/fixing-slow-apache-on-localhost-under.html. Thus we should recommend such fixes to our clients, or even include an optional `hosts` patch in the installer.
@per-samuelsson commented on Mon Apr 18 2016
I sure will. Great work finding that, and fixing it.
@k-rus commented on Mon Apr 18 2016
I wonder: is it once per application startup, or once for each handler?
@per-samuelsson commented on Mon Apr 18 2016
I can confirm @alemoi's fixes sliced off basically all the time from those 3 spots I reported earlier. Well done, @alemoi! I will post some numbers later.
Right now, I'm working on finding whatever new spots are interesting in managed code, so that we can try to shave those off before we go to the real beast, connecting the database, where we'll need to do unmanaged profiling to get anywhere.
I have at least one new spot that I'm currently analysing that seems interesting; when I have more info on it, I'll update here with more general information about where we are right now.
@dan31 commented on Mon Apr 18 2016
@per-samuelsson, how much is now shaved off in total? Is it like 2x faster? (probably -3 secs)
@per-samuelsson commented on Mon Apr 18 2016
I asked for a little more time to properly process the raw data and come with an update, but your curiosity just can't wait, can it? :smile:
Give me 10 minutes, and I'll post some new reflections.
@per-samuelsson commented on Mon Apr 18 2016
OK, so first a sample of those three spots @alemoi addressed:
With that, they are now well under the radar, and are clearly acceptable.
I've made numerous samples now, doing a few tweaks too, but things are now more or less status quo, showing an overall time of 8,5 - 10 seconds, normally around 9-ish. That, again, includes:
- `star.exe` (~1 second)
- `scdata` (-)
- `scdblog` (-)
- `sccode` (~5-6 seconds)

Within booting up the code host, we now have these weak spots:
All other micro-tasks within the code host bootstrap are now showing numbers where I don't consider them interesting to pursue further. Possibly, that last 0,3 second triplet is not very interesting either; I just wanted to show the one that was "second worst".
One new interesting thing I found was that it's always ~1 second between the last entry written by the code host bootstrapping sequence and when the admin server ends that task (the task being "waiting for code host to be ready"). After some initial diagnosing, I've now isolated it down to one thing added rather recently. It's an HTTP call from within the admin server that fires whenever its internal state changes. I remember asking @andwah about it earlier, and that I had some discussion with @andwah and @alemoi about whether it could be done differently. I think what it does is push out some kind of "state update" to clients (like the web admin). If I understand it correctly, that call is synchronous, and hence the admin server can't report back to `star.exe` that bootstrapping of the database is done before that call completes.

I think that's the best shot we've got to shave off another second right now, since it seems like the most low-hanging fruit. After that we only have database connecting left until we've reached the goal that the weaver should be the bottleneck.
@dan31 commented on Mon Apr 18 2016
@per-samuelsson already an achievement, thanks for the update! Yes, I'm very curious :) What do you think would be the best approach to drill down into `sccode` starting in 5 secs?
@per-samuelsson commented on Mon Apr 18 2016
Well, it's hard to say, because now we are getting closer to having only that "Database connected" thing left, and it takes like > 75% of the overall time. I'm not up to date on what is done in there (I know recovery runs, but I've seen that it doesn't make a big impact). Because of that, I'm having an even harder time understanding which things in there can be optimized and which cannot.

My idea is that once I get a better handle on this more low-hanging thingy I described above, it's time to take on that database connect.
@per-samuelsson commented on Mon Apr 18 2016
An update on this call:
I've now ruled out that it has anything to do with "plumbing", or the call context itself. I've tested replacing it with an explicit, precreated `Node` instance, and changed to "127.0.0.1" when creating that, to no avail. It is still very close to ~1 second every time, so it seems to be something in the implementation.
@per-samuelsson commented on Mon Apr 18 2016
Hmm, seems I have to take a step back from that already. Now I have a sample where it's ~1 second from the time the PUT is called until the handler is reached. It seems to be some kind of cold-start thingy though, since consecutive calls complete in just milliseconds.
@per-samuelsson commented on Mon Apr 18 2016
More info:
If I precreate the node (in some other thread, probably) and keep a reference to it, I can get all calls down to milliseconds, except the first one, which always clocks in at ~1 second.
@per-samuelsson commented on Wed May 04 2016
Took a new iteration on this today, taking off from previous findings, and this time with the aim to catch what's up with:
By installing an unmanaged profiler and doing mixed-mode debugging, I've finally been able to produce some results (let's just hope they are accurate).
As stated earlier, like > 80% of the current bootstrapping cycle happens when booting up an empty code host, specifically when invoking the Bluestar `sccoredb.sccoredb_connect`. I've profiled that a couple of times now, and the results I get are pretty consistent. Here are two extracts:

The setup is a locally built Bluestar debug configuration, so absolute timings should be considered in that context, but maybe the relative numbers can give us a hint on where to look further.
I'll try to build a local release version of the kernel too, to see what that gives. And see if I can dig deeper into the tree to be even more specific. I'll get back on that.
Big thanks to @chrhol, who helped me set up a build environment for Bluestar, bearing with me while I whined about everything being so difficult and complicated. And @dan31, who suggested the VS Enterprise profiler, which worked out great.
@per-samuelsson commented on Wed May 04 2016
I had a feeling the above might be false diagnostics, and I think they might be. So if someone's reading this, give me a moment to test with a sharp Bluestar version first.
@alexanderstc commented on Wed May 04 2016
Testing on level0 also gives me the impression that the time is spent inside db connect. My 2 rusty öre.
@per-samuelsson commented on Wed May 04 2016
Thanks for sharing that. I also still think it's there, just not spent the way my spreadsheets showed, since it seems unlikely that `__get_image_header_read_ptr` could take up so much of that time. It just doesn't make sense.
@alemoi commented on Wed May 04 2016
I think it's code generation and linking using LLVM that is taking the time. Just a pure guess.
@per-samuelsson commented on Wed May 04 2016
Seems the feeling I had was on to something. Doing profiling on a locally built release version of Bluestar revealed something completely different:
And that's more in line with what I suspected from the beginning: some kind of compiling / code generation must be in play here. Which is kind of sad, b/c that's where it will hurt us the most, I guess, not being in control over that code.
@per-samuelsson commented on Wed May 04 2016
We think alike. :smile:
@per-samuelsson commented on Wed May 04 2016
A few things now are occupying me:
About LLVM, I have no clue what options we have to do anything. In my latest tests (using the above boot-timing script), the isolated time to run through code-host-only bootstrapping (i.e. from the beginning of `Main` until we are done processing initialization) is ~3,5 seconds in an optimized build. Out of that, the above database connect is like > 90%, so I just can't find other optimizations in that sequence right now.
@alemoi commented on Wed May 04 2016
Maybe because it's an external call to a DLL?
Hmm, yes, indeed interesting. I can check this once have time.
@per-samuelsson commented on Wed May 04 2016
Well, why would it show up in release in that case? They're just different builds.
It's super frustrating, actually, because I can consistently show that 1-second penalty by emitting before/after calls to the diagnostics text file. But as soon as I attach a debugger/profiler, the call takes "no time". :game_die: :anguished:
@alemoi commented on Wed May 04 2016
Maybe the delay takes place in the gateway? And the reason for this is again LLVM codegen there? @per-samuelsson you can leave this one for me. I'm pretty sure it's a code generation delay in the gateway.
@per-samuelsson commented on Wed May 04 2016
Really? OK, sure. I'm starting to get a feeling there is little I can do here once these analyses are complete anyway, so if you think I should pause for now, I'll do so. Let me know if I can help you investigate further in any way!
Also, it would be nice to find out about the catastrophic absolute numbers reported by @ivarbergman earlier in this thread and by @atifwaqar in #3494. They just seem really, really off. Maybe I can do something there to help those guys profile better...
@dan31 commented on Thu May 05 2016
@alemoi, if what you guess turns out to be the case, then we are bounced into a big need for more controlled compilation logic.
Here is a simple version of it:
1) When developers are in a dev env, the LLVM compiler is set to fast compilation without strong optimisation. The result is a fast dev cycle; devs get their apps to run in the blink of an eye.
2) When developers put the app in a prod env, they turn a switch in the DB conf file to enable full-blown, strongly optimised compilation of their code. Not suitable for the dev cycle on a dev machine, but needed for load testing / prod use.
Actually, points 1 and 2 are no more than this:
In that sense, we are not special. Developers don't wait for a release version to be built every time they make a change, since it is expensive. Well, I'm not in favour of such manual switching; in my mind this is too 2000s-ish and cumbersome (especially for our target audience, who want to do super-fast and super-simple development), and there is a better approach.
Let me give you an analogy. In the JVM, when you compile and run your code the first time, it isn't yet strongly optimised (hence it starts relatively fast). However, it gets better optimised over the course of execution: the VM finds hot spots on the execution paths and rebuilds them at a moment when they aren't in use. That is why its VM is called HotSpot. Something similar happens in modern SQL processing engines. They do not attempt to globally optimise a query when the user runs it the first time (to avoid making them wait way too long); instead they optimise the execution plan during the actual execution of the query.
In our case, the state of affairs is even simpler. We might build compiled representations (of whatever: queries, URIs and so on) without strong optimisations in the first place. Then, once the app has started, we can start recompiling the representations with the strongest possible optimisations. Think of it as a two-level cache. While strongly optimised functions aren't yet in the cache, the weakly optimised ones are invoked. On availability of the strongly optimised representations (which will, ostensibly, actually take the seconds they take to compile now), the references in the cache are (thread-safely) replaced with those. Hence you have no overhead from switching to them (you simply had one pointer, now you have another).
Thus, while now we (ostensibly; it needs to be verified) force-feed the user with seconds of longer compilation any time they start an app, after the suggested addition this time moves to a (slightly longer) warm-up phase, not noticeable by an applied user in 99.9% of cases.
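The two-level cache described above can be sketched roughly like this (illustrative Python, not the actual scllvm design; the two builder callbacks are hypothetical stand-ins for unoptimised and optimised LLVM compilation):

```python
import threading

class TieredCache:
    """Serve a quickly built function immediately; swap in an optimised
    one when a background build finishes (pointer replacement only)."""

    def __init__(self, quick_build, optimised_build):
        self._quick_build = quick_build          # fast, weakly optimised
        self._optimised_build = optimised_build  # slow, strongly optimised
        self._entries = {}   # key -> callable; dict writes act as the atomic swap

    def get(self, key):
        fn = self._entries.get(key)
        if fn is None:
            fn = self._quick_build(key)          # cheap version, available at once
            self._entries[key] = fn
            # Rebuild with full optimisation off the hot path:
            threading.Thread(target=self._upgrade, args=(key,), daemon=True).start()
        return fn

    def _upgrade(self, key):
        optimised = self._optimised_build(key)   # takes the "seconds" mentioned above
        self._entries[key] = optimised           # thread-safe pointer swap

# Hypothetical builders; both produce the same behaviour, only build cost differs.
cache = TieredCache(lambda k: (lambda x: x + 1),
                    lambda k: (lambda x: x + 1))
add_one = cache.get("uri-matcher")
print(add_one(41))  # usable immediately, before the optimised build completes
```

Callers never notice the swap: they always look the function up by key, and later lookups simply see the newer pointer.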
@per-samuelsson That is a good point. @atifwaqar, maybe you can spend some time to get us the startup issue reproduced? What do you think?
@atifwaqar commented on Fri May 06 2016
@dan31 Sure, I can help. Maybe one way is to share the DB from my slow server? Or whatever you guys suggest.
@dan31 commented on Fri May 06 2016
@atifwaqar Would be best if you could prepack your whole setup (use 7z Ultra compression; include the whole `Starcounter` folder from `Documents`, all apps that you start, and the batch file you use to start them) so that @per-samuelsson (or anyone, like @alemoi) can reproduce the run where you observe the slow start. Henceforth we can use this laboratory as a base to measure the impact of improvements.
@atifwaqar commented on Fri May 06 2016
@dan31 Sure, the `Starcounter` folder from `Documents` is 30+ GB. I am copying it to our SC Basement server, so your team can easily access it. I will update you as soon as it's copied.
@per-samuelsson commented on Sat May 07 2016
@dan31
I really like the ideas you present in that post. It gives us the best of all worlds: fast startup times, yet fast execution of critical code. Let's just hope it holds up, and that it's not super-big or challenging in terms of implementation time.
@dan31 commented on Sat May 07 2016
There is an interesting alternative, tried it out today: https://github.com/GPUOpen-Tools/CodeXL/releases (the ex-commercial AMD Profiler). It comes for free now and integrates nicely into VS. Yes, it is optimised for AMD hardware, but it has nicely done sampling analysis (independent of CPU). Apart from all the standard sampling data for native code, it also gives you assembly decomposition for each code line, which is sometimes the most valuable part of the analysis, as you can drill down deep:
@dan31 commented on Sat May 07 2016
Important to note: if @alemoi confirms what we think is the case, then this work is an absolute need for any future versions (not just a patch for existing stuff), since the dominance of llvm time will still prevail unless it is solved in a smart caching manner.
@Starcounter-Jack commented on Sat May 07 2016
Just my 2 cents. If the below is already done, please disregard.
Making the LLVM faster?
Before implementing the cache, I suggest we try the following:
@dan31 commented on Sat May 07 2016
@Starcounter-Jack very good cents! Not currently sure about all of them right now (probably missed something obvious while making llvm in-memory instead of disk-generated), but @alemoi will consider your notes top prio when he comes back on Tuesday. Yes, currently (since some time back) stronger optimisations are on, to produce faster functions. On Friday @alemoi tested with the optim flag set to false, resulting in 3.5x faster generation. We don't yet know the balance of gen time vs run time. I believe during next week we'll have all info at hand and most likely the issue resolved.
@dan31 commented on Sat May 07 2016
@Starcounter-Jack No file system is involved, llvm is completely in-memory for all its users. Most likely the reason is optimisations, and maybe some details on the full jit chain.
@dan31 commented on Thu May 12 2016
I've briefly run a .NET profiler in vivo on SWO. Their suite takes 30 sec to get to "Starting a database...". Which, as we've noticed already, isn't about the database. The proper message there should be "Starting Starcounter...", not "Starting a database...", in my opinion. First I noticed that half of the CPU time over these 30 secs was taken by `scnetworkgateway`. The next time I restarted the suite, I promptly attached the profiler to it. Here are the results:

Going deeper, it is clearly this:
My next candidate was `sccode.exe`, which also showed a CPU usage peak approximately in the middle of the process, competing with the gateway for CPU resources. Here are the results of profiling its startup sequence:

Needless to continue investigating, I believe.
I think the severity of this unfortunate imprudence in llvm module usage is now obvious. Hence I safely assign the issue to @alemoi and raise its priority to high. I realise it isn't that simple to fix, since the reason for it is clearly that you wanted to make llvm generate more performant results, which produced an issue users don't buy into. I believe the best quick fix, which will spare our current users such big delays in the dev cycle, is to introduce a database setting in the conf file defaulting to "no optimisation", as @Starcounter-Jack suggests. Putting their stuff in production, they can switch to "stronger optimisations". The proper longer-term fix is to build a smart cache which serves unoptimised versions in the first place and then quietly builds optimised versions transparently to the user. We need to make our users comfortable with the technology, and forcing them to wait minutes every time just to get a runtime boost isn't an option (I don't know how much you'd gain, but most of them aren't doing high-frequency trading or building another Facebook).
@per-samuelsson thank you for your diligent examination, I believe the root of the problem is now clear enough. @per-samuelsson you could repeat the same procedure with @atifwaqar's data file to observe what takes that long (perhaps exactly the same thing).
@k-rus commented on Thu May 12 2016
Great investigation!
How bad is the performance of Starcounter if the optimisation is off? This quick fix might have a drawback for new users, who download Starcounter and test performance. I doubt they will do proper configuration; they will probably run Starcounter as it is. Is that something we should worry about?
Maybe the switch should be on by default, and then we teach our customers to switch it off in the development environment? Note that it is easy to forget to fix the configuration in a production environment.
@dan31 commented on Thu May 12 2016
Thank you @k-rus, it is a very important point to track. I call on us to be very precise with such changes (and @alemoi in particular). The change itself is just a flag, but we need to clearly get two sets of numbers here, as a table summary: 1) What is the decrease in startup time? In which examples and tests? (Empty db? TPC-C, and with which settings? RetailDemo? A customer case?) 2) What is the increase in run time? In which examples and tests? (Empty db? TPC-C, and with which settings? RetailDemo? A customer case?)
@dan31 commented on Thu May 12 2016
I would say like this: substantially less than people downloading Starcounter and waiting 6-10 secs at a first app start, not to mention current customers struggling with minutes of startup. I also have a gut feeling that the decrease in perf won't be severalfold, but probably tens of percent. For our published benchmarks, we turn on all perf tunings. Once the smart cache is in place, this isn't a concern any more.
Naturally, when newbies do their ad-hoc microbenchmarks, they look at these steps:
1) Ease of setup and first coding experience (put a tick here, "Hello world" and transacted apps are very straightforward today, in well-recognised environments like Visual Studio);
2) Ease of dev cycle (coding-running-stopping-changing-...): needs improvement for startup time (current issue) and stopping/restarting time (addressed in the Core .NET track). People don't like to wait; if they wait and the platform looks as if it hangs big time, they will be inclined to leave the platform. There are more, smaller points to address: e.g. precreate the first database at installation asynchronously, not when the user starts their first app.
3) Perf of writes. They populate the database with loads of data, like inserting millions of arbitrarily generated rows such as people and houses. People don't like to wait; if they need to wait minutes to load their test data, they will be inclined to leave the platform. Today you load 1.000.000 people of FirstName LastName in around 1.5 secs on my laptop on battery; surely this can be improved, but it is decently okay already (we as a company take great care about write performance levels).
4) Perf of reads. When the data is there, they want to crunch it. We all know what we want to solve there and how; I'm not diving into detail. Once NewSQL is in place and the redundant .net onion flakes around getting data to the codehost are removed, reads will improve. Once we add OLAP support, reads will improve amazingly.
5) Perf of mixed loads. Only quite savvy users and real customers approach this stage. We talk about concurrent access to the database and operational analytics example, when you want to do writes OLTP and analyse your current data in the OLAP way simultaneously. This is important when Starcounter gets really big public traction and people start to really stress-test it with amusing workloads.
Effectively, the point we address here is N.2 in the list. If N.2 isn't addressed, the number of newbies trying out Starcounter who reach N.3 and N.4 will greatly decrease.
@bigwad commented on Thu May 12 2016
@alemoi, could you please add appropriate scllvm.pdb to source control?
@dan31 commented on Thu May 12 2016
@bigwad +1
@dan31 commented on Thu May 12 2016
For the reasons I gave above, I think the most reasonable way to approach it is to make the flag off by default, do the work on the smart cache asap, and then deprecate the flag. Our goal for the Core .NET version is to achieve a sub-second dev cycle for both start and stop, while the same scllvm.dll is involved in the loop. Hence, if it is kept as-is with the flag on, it will never be possible to achieve a sub-second experience.
However, it might turn out that the root reason isn't optimisation itself, but, for instance, heavy heap thrashing or dealloc times per function jitted when invoking llvm.dll. We don't know yet; @alemoi is to the rescue on this issue.
@k-rus commented on Thu May 12 2016
It is great to see this list being written down.
@dan31 commented on Thu May 12 2016
Actually, this is only when there is a performance-related evaluation. The other big part of evaluation is the app dev framework: viewmodels with thin clients and everything related to app development. When they try and test these (maybe it will be 70-80% of initial evaluations or even more, and this is the version we use in the starting tutorials), compiling a high-performance version is quite useless. However, they might combine it with data generation and fetching (like trying to create an analytical application with a GUI), but this still comes before the points on writes and reads: they first create an app with no data, but with some GUI and event responses like button clicks and the like.
@dan31 commented on Thu May 12 2016
@k-rus @alemoi Another point: looking at the cap of 20% CPU load in a profiling session for both `sccode` and `scnetworkgateway`, I conclude that the llvm gen runs on a single thread. Maybe running it on 4 threads or so might improve it 1.5-2 times. However, that would still be 10 secs instead of 20 secs.
@alemoi commented on Thu May 12 2016
Sure. @eriohl, since you added the last scllvm.dll, do you still have the corresponding .pdb by chance?
@alemoi commented on Wed May 18 2016
So, after I added flags to disable optimizations for LLVM here are the results we have:
With optimizations on:
With optimizations off:
I used the same great `timeboot` script from @per-samuelsson, which measures warm start.

So, as we can see, when it comes to the URI matcher generator the difference is about 3 times, which is good. However, the database configuration step still takes 1.5 seconds (instead of 2.2)! Still a lot for just a database configuration step. I guess we again have to dig deeper into the database configuration native code.
@dan31 regarding your research, most probably it's the CPU time that your tool measured, not the actual calendar time. LLVM eats CPU well because it's linear algorithmic code without synchronization and so on; that's why it's shown to eat a lot of CPU.
I will now run TPCC without optimizations to see the difference in performance degradation.
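The CPU-time versus calendar-time distinction raised above is easy to demonstrate (Python sketch; the sleep stands in for time spent waiting rather than computing):

```python
import time

wall_start = time.perf_counter()   # calendar (wall-clock) time
cpu_start = time.process_time()    # CPU time of this process only

time.sleep(0.2)                    # waiting: wall clock advances, CPU time doesn't
sum(i * i for i in range(10**6))   # computing: both clocks advance

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall: {wall:.2f}s, cpu: {cpu:.2f}s")  # wall includes the sleep; cpu does not
```

A sampling profiler reporting CPU time would attribute the busy loop but miss the sleep, which is why CPU-heavy components like scllvm dominate such reports even when wall-clock waiting also exists.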
@dan31 commented on Wed May 18 2016
@alemoi, one question: which app start do you actually measure? In my case it is a set of real apps, with lots of real URIs and queries. Regarding the sampling data, I verified visually that the waiting was actually correlated to the CPU work indicated on the diagram. Namely, these 20 seconds were not just anything: the CPU work was capped at 25%, with the work concentrated in scllvm. I might be wrong, though I currently don't see any other particular source of slowness: I see the CPU working while the database is starting. Ha, actually, @alemoi, if you just send me a pdb file, I would gladly dig deeper into that matter. I can still tease these production machines. Can you do that?
@alemoi commented on Wed May 18 2016
For example, the TPCC long test takes 1:47 vs 1:59.
I will turn off usage of llvm completely in the codehost (falling back to the previous system with DLLs) and see how it goes perf-wise.
@dan31 commented on Wed May 18 2016
@alemoi that is also an interesting thing to try out. But really, point me to the pdbs; I'm eager to push it further on these machines.
@alemoi commented on Wed May 18 2016
I need to build a new version of scllvm; pulling the sources now. I had to clean the hard drive because of insufficient space. Will try to produce a new scllvm.dll/pdb tonight.
@alemoi commented on Wed May 18 2016
Here is what happens when I fall back into previous codegen DLLs usage:
so it seems to be an issue with LLVM. My guess is there is simply a difference in opt/nopt generation time for different types of code: the URI matcher vs SQL code. That's why we don't see 3 times less time in the SQL case.
@alemoi commented on Thu May 19 2016
I have recovered my llvm build environment, with their latest version 3.8.0. Of course, they have changed some interfaces and libraries :) I adapted the code and tested that it basically works. Uploaded the new scllvm.dll and pdb to
\\scbuildserver\ftp\temp\am\scllvm\
cc @bigwad if you are interested.
@per-samuelsson commented on Thu May 19 2016
Great work @alemoi. Looks like we are slowly (!) getting closer to improve on this. :+1:
@alemoi commented on Thu May 19 2016
I wonder if it's possible to move this issue to a private repo. It's getting very internal here :)
@ivarbergman commented on Wed Sep 28 2016
Hi! I just want to check with you whether there is any progress (or new findings) on this issue. We are still suffering from slow database & application startup: about 30 sec for each app.
Some of our servers have 10 databases with 15 apps each, which means that restarting the services is a 1+ hour process.
We use Azure A2 instances and are running Starcounter version Starcounter-DevelopNightlyBuilds-2.2.1.3137 (Sep 13, 2016 2:43:12 AM).
Thanks!
@alemoi commented on Mon Oct 17 2016
To inform: in `develop` there is now a new working scllvm caching scheme which should improve startup time. Hopefully it will soon be introduced in pnext as well.
@dan31 commented on Mon Nov 14 2016
@ivarbergman could you please provide your typical start/restart script for your set of apps? This is currently not related to compile times (those are actually shaved off with the new llvm subsystem). Using your case, we need to investigate it further.
@ivarbergman commented on Tue Jan 10 2017
Hi! We have had input from @dan31, outside of this issue, about how to restart apps (i.e. stop the sc host before we restart the apps). Thanks!
But we still have a rather slow startup even when we do this.
Locally (on my dev machine) I noticed that the Temp folders are very large: `Temp/weaver` is 8 GB and `Temp/sqlschemas` is 5 GB. When I created a new database (reusing the sci files), the Temp folders were much smaller and startup times were much better.

Question: Do you think that a large Temp folder can result in slower startup? And can the Temp folder be emptied safely?
@per-samuelsson commented on Tue Jan 10 2017
Sorry to hear that. Let's see if we can have some brainstorming session and try understand why.
I don't think it should, no.
In theory, it should be safe. But I have a vague memory of once dropping some folders in there and it messing something up. There are issues in our backlog that describe what we should do to keep them better under control. Let me get back on that.
@ivarbergman commented on Wed Jan 11 2017
FYI: we are still running SC 2.2 but will soon switch to 2.3. So maybe we shouldn't do too much investigation until we have made the upgrade.
@per-samuelsson commented on Wed Jan 11 2017
It was #1690 I was thinking of. Maybe we need to bump that.
@per-samuelsson commented on Wed Jan 11 2017
I see. Thanks for sharing.
Should be pretty easy to test whether it's safe to delete content under Temp, at least: create a new database and start the desired apps in it. Then stop the database, delete everything under Temp (don't delete the `Temp\[Database]` folder itself) and restart the apps. If that works, it's safe.

Something like this:
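A sketch of the deletion step in Python (for illustration only; the Temp layout and the database folder name are assumptions based on this thread, not Starcounter tooling):

```python
import shutil
from pathlib import Path

def clean_temp(temp_dir: Path, db_folder: str) -> None:
    """Empty everything under Temp, keeping the Temp\\[Database] folder
    itself (its contents are removed, but the folder remains)."""
    for child in temp_dir.iterdir():
        if child.name == db_folder:
            # Keep the database folder; remove only its contents.
            for sub in child.iterdir():
                if sub.is_dir():
                    shutil.rmtree(sub)
                else:
                    sub.unlink()
        elif child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()

# Hypothetical paths; adjust to your Documents\Starcounter layout:
# clean_temp(Path(r"C:\Users\me\Documents\Starcounter\Personal\Temp"), "foo123")
```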
@miyconst commented on Wed Jan 11 2017
@ivarbergman please try the StarDump.Migrator.2.2 app for this purpose.