Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Severe regex/concatenation performance regression on Windows #21360

Closed xenu closed 1 year ago

xenu commented 1 year ago

This was originally reported on PerlMonks: https://www.perlmonks.org/index.pl?node_id=11153747

A performance regression introduced in Perl 5.28 causes this script to execute ~20x-40x slower on Windows:

use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';

my $x = 'a' x 1e6;
my $y = 'b' . $x;

my $t = time;
$x =~ s/./$&-/g;
say time - $t;

$t = time;
$y =~ s/./$&-/g;
say time - $t;

say $^V;

Here are the results:

12.7103209495544
0.850119113922119
v5.32.1

12.7171239852905
0.765337944030762
v5.28.2

0.345196962356567
0.314804077148438
v5.26.3

I can confirm the issue is still present in blead.

xenu commented 1 year ago

I played with Windows Performance Analyzer a bit and it seems that most of the time is spent inside Perl_pp_multiconcat (which was introduced in Perl 5.28):

[image: wpa]

More specifically, it seems most of the time is spent on reallocations inside Perl_tmps_grow_p. And indeed, the following (crude) patch makes the script execute ~5x faster on my machine:

diff --git a/scope.c b/scope.c
index 28c767f128..4af8c4ffe2 100644
--- a/scope.c
+++ b/scope.c
@@ -231,8 +231,8 @@ Perl_tmps_grow_p(pTHX_ SSize_t ix)
 {
     SSize_t extend_to = ix;
 #ifndef STRESS_REALLOC
-    if (ix - PL_tmps_max < 128)
-        extend_to += (PL_tmps_max < 512) ? 128 : 512;
+    if (ix - PL_tmps_max < 10000)
+        extend_to += 10000;
 #endif
     Renew(PL_tmps_stack, extend_to + 1, SV*);
     PL_tmps_max = extend_to + 1;

I suspect the difference between Windows and Linux is caused by the fact that Windows has a slower realloc. On Windows it's implemented naively (malloc + memcpy + free), while on Linux it avoids copies thanks to the mremap syscall.

Tux commented 1 year ago

Not only Windows. Time::HiRes was added in 5.7.3, so I cannot use this test code before that.

use strict;
use warnings;
use Time::HiRes "time";

my $x = "a" x 1e6;
my $y = "b" . $x;
my $t = time;
$x =~ s/./$&-/g;
printf "%8.4f\t", time - $t;

$t = time;
$y =~ s/./$&-/g;
printf "%8.4f\n", time - $t;

→ (x86_64-linux threaded and unthreaded)

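T = Threaded, LD = LongDouble, QM = QuadMath
Columns: perl version, build flags, seconds for the $x substitution, seconds for the $y substitution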
Running perl-all x21360.pl
5.006001         0.2142  0.2298
5.007002         0.1495  0.1577
5.007002 T/LD    0.1625  0.1770
5.007003         0.2203  0.2212
5.007003 T/LD    0.2296  0.2492
5.007003         0.2319  0.2283
5.007003 T/LD    0.2522  0.2531
5.008            0.1732  0.1746
5.008    T/LD    0.1868  0.1878
5.008001         0.1871  0.1922
5.008001 T/LD    0.1683  0.1805
5.008002         0.1895  0.1952
5.008002 T/LD    0.2026  0.2066
5.008003         0.1861  0.1877
5.008003 T/LD    0.2079  0.2068
5.008004         0.1855  0.1860
5.008004 T/LD    0.2022  0.2028
5.008005         0.1874  0.1887
5.008005 T/LD    0.2035  0.2030
5.008006         0.2069  0.1982
5.008006 T/LD    0.2258  0.2134
5.008007         0.2176  0.2127
5.008007 T/LD    0.2426  0.2356
5.008008         0.2138  0.2100
5.008008 T/LD    0.2251  0.2193
5.008009         0.2241  0.2150
5.008009 T/LD    0.2373  0.2287
5.008009         0.2230  0.2169
5.008009 T/LD    0.2379  0.2279
5.010000         0.2571  0.2590
5.010000 T/LD    0.2778  0.2764
5.010001         0.2996  0.2931
5.010001 T/LD    0.3312  0.3176
5.010001         0.2568  0.2654
5.010001 T/LD    0.3346  0.3280
5.011000         0.3698  0.3568
5.011000 T/LD    0.3857  0.3804
5.011001         0.3757  0.3570
5.011001 T/LD    0.3961  0.3862
5.011002         0.3771  0.3631
5.011002 T/LD    0.3355  0.3422
5.011003         0.3355  0.3237
5.011003 T/LD    0.3471  0.3424
5.011004         0.3383  0.3261
5.011004 T/LD    0.3524  0.3415
5.011005         0.3355  0.3255
5.011005 T/LD    0.3221  0.3319
5.011005         0.3203  0.3284
5.011005 T/LD    0.3498  0.3440
5.012000         0.3850  0.3746
5.012000 T/LD    0.3940  0.3806
5.012001         0.3747  0.3635
5.012001 T/LD    0.3904  0.3851
5.012002         0.3738  0.3631
5.012002 T/LD    0.3909  0.3781
5.012003         0.2989  0.3126
5.012003 T/LD    0.3398  0.3320
5.012004         0.3235  0.3138
5.012004 T/LD    0.3358  0.3302
5.012005         0.3201  0.3152
5.012005 T/LD    0.3407  0.3333
5.012005         0.3223  0.3140
5.012005 T/LD    0.3394  0.3303
5.013000         0.3217  0.3174
5.013000 T/LD    0.3370  0.3331
5.013001         0.3218  0.3270
5.013001 T/LD    0.3568  0.3474
5.013002         0.3408  0.3553
5.013002 T/LD    0.3857  0.3987
5.013003         0.3804  0.3685
5.013003 T/LD    0.4066  0.3942
5.013004         0.3677  0.3595
5.013004 T/LD    0.3880  0.3748
5.013005         0.3130  0.3178
5.013005 T/LD    0.3520  0.3431
5.013006         0.3360  0.3229
5.013006 T/LD    0.3543  0.3462
5.013007         0.3313  0.3229
5.013007 T/LD    0.3454  0.3364
5.013008         0.3286  0.3209
5.013008 T/LD    0.3450  0.3377
5.013009         0.3589  0.3552
5.013009 T/LD    0.3699  0.3695
5.013010         0.3501  0.3481
5.013010 T/LD    0.3674  0.3679
5.013011         0.3654  0.3632
5.013011 T/LD    0.3753  0.3740
5.013011         0.3589  0.3609
5.013011 T/LD    0.3826  0.3807
5.014000         0.3111  0.3296
5.014000 T/LD    0.3377  0.3550
5.014001         0.3094  0.3201
5.014001 T/LD    0.2965  0.3176
5.014002         0.3195  0.3372
5.014002 T/LD    0.2971  0.3241
5.014003         0.3064  0.3330
5.014003 T/LD    0.3364  0.3544
5.014004         0.2947  0.2936
5.014004 T/LD    0.3366  0.3536
5.014004         0.2799  0.3047
5.014004 T/LD    0.3531  0.3814
5.015000         0.2937  0.3203
5.015000 T/LD    0.3379  0.3568
5.015001         0.3074  0.3129
5.015001 T/LD    0.3416  0.3593
5.015002         0.3211  0.3324
5.015002 T/LD    0.3408  0.3600
5.015003         0.3162  0.3290
5.015003 T/LD    0.3359  0.3489
5.015004         0.3109  0.3199
5.015004 T/LD    0.3291  0.3377
5.015005         0.3169  0.3241
5.015005 T/LD    0.3354  0.3457
5.015006         0.2951  0.2950
5.015006 T/LD    0.3382  0.3562
5.015007         0.3018  0.3045
5.015007 T/LD    0.3377  0.3537
5.015008         0.3249  0.3368
5.015008 T/LD    0.3011  0.3219
5.015009         0.3066  0.3097
5.015009 T/LD    0.3390  0.3576
5.015009         0.3523  0.3513
5.015009 T/LD    0.3693  0.3717
5.016000         0.3037  0.3036
5.016000 T/LD    0.3369  0.3534
5.016001         0.3256  0.3449
5.016001 T/LD    0.3519  0.3614
5.016002         0.3312  0.3436
5.016002 T/LD    0.3436  0.3536
5.016003         0.3191  0.3320
5.016003 T/LD    0.3131  0.3565
5.016003         0.2979  0.3141
5.016003 T/LD    0.3244  0.3232
5.017000         0.3340  0.3455
5.017000 T/LD    0.3380  0.3546
5.017001         0.3199  0.3339
5.017001 T/LD    0.3459  0.3644
5.017002         0.3188  0.3333
5.017002 T/LD    0.3081  0.3107
5.017003         0.3230  0.3337
5.017003 T/LD    0.3384  0.3534
5.017004         0.3245  0.3380
5.017004 T/LD    0.3251  0.3411
5.017005         0.3059  0.3055
5.017005 T/LD    0.3177  0.3182
5.017006         0.3130  0.3176
5.017006 T/LD    0.3474  0.3617
5.017007         0.3400  0.3491
5.017007 T/LD    0.3494  0.3595
5.017008         0.3220  0.3259
5.017008 T/LD    0.3673  0.3840
5.017009         0.3369  0.3542
5.017009 T/LD    0.3652  0.3828
5.017010         0.3264  0.3429
5.017010 T/LD    0.3456  0.3682
5.017011         0.3240  0.3408
5.017011 T/LD    0.3385  0.3535
5.017011         0.3125  0.3312
5.017011 T/LD    0.3474  0.3466
5.018000         0.3365  0.3469
5.018000 T/LD    0.3229  0.3287
5.018001         0.3150  0.3208
5.018001 T/LD    0.3377  0.3457
5.018002         0.3359  0.3566
5.018002 T/LD    0.3526  0.3715
5.018003         0.3075  0.3105
5.018003 T/LD    0.3259  0.3626
5.018004         0.3366  0.3541
5.018004 T/LD    0.3520  0.3714
5.018004         0.3054  0.3207
5.018004 T/LD    0.3402  0.3402
5.019000         0.3154  0.3235
5.019000 T/LD    0.3256  0.3347
5.019001         0.3357  0.3527
5.019001 T/LD    0.3568  0.3753
5.019002         0.3198  0.3179
5.019002 T/LD    0.3415  0.3451
5.019003         0.3384  0.3444
5.019003 T/LD    0.3584  0.3767
5.019004         0.3050  0.3067
5.019004 T/LD    0.3337  0.3364
5.019005         0.3101  0.3089
5.019005 T/LD    0.3260  0.3226
5.019006         0.3024  0.2972
5.019006 T/LD    0.3447  0.3492
5.019007         0.3019  0.3099
5.019007 T/LD    0.3366  0.3484
5.019008         0.3077  0.3157
5.019008 T/LD    0.3423  0.3611
5.019009         0.3105  0.3127
5.019009 T/LD    0.3325  0.3360
5.019010         0.2914  0.3098
5.019010 T/LD    0.3284  0.3272
5.019011         0.3124  0.3135
5.019011 T/LD    0.3264  0.3320
5.019011         0.3005  0.3148
5.019011 T/LD    0.3332  0.3336
5.020000         0.3105  0.3180
5.020000 T/LD    0.3507  0.3745
5.020001         0.3343  0.3498
5.020001 T/LD    0.3291  0.3299
5.020002         0.3075  0.3077
5.020002 T/LD    0.3453  0.3570
5.020003         0.3121  0.3144
5.020003 T/LD    0.3283  0.3367
5.020003         0.3570  0.3656
5.020003 T/LD    0.3849  0.3863
5.021000         0.3131  0.3128
5.021000 T/LD    0.3283  0.3345
5.021001         0.3177  0.3188
5.021001 T/LD    0.3397  0.3744
5.021002         0.3278  0.3278
5.021002 T/LD    0.3858  0.3978
5.021003         0.3339  0.3490
5.021003 T/LD    0.3462  0.3657
5.021004         0.3342  0.3464
5.021004 T/LD    0.3529  0.3685
5.021005         0.3034  0.3346
5.021005 T/LD    0.3474  0.3593
5.021006         0.3325  0.3476
5.021006 T/LD    0.3306  0.3345
5.021007         0.3793  0.3948
5.021007 T/LD    0.3452  0.3605
5.021008         0.3325  0.3458
5.021008 T/LD    0.3494  0.3676
5.021009         0.3044  0.3280
5.021009 T/LD    0.3482  0.3721
5.021010         0.3184  0.3383
5.021010 T/LD    0.3412  0.3617
5.021011         0.3248  0.3360
5.021011 T/LD    0.3340  0.3447
5.021011         0.3099  0.3290
5.021011 T/LD    0.3417  0.3409
5.022000         0.3349  0.3325
5.022000 T/LD    0.3352  0.3469
5.022001         0.3133  0.3448
5.022001 T/LD    0.3311  0.3327
5.022002         0.3048  0.3066
5.022002 T/LD    0.3597  0.3809
5.022003         0.3291  0.3403
5.022003 T/LD    0.3472  0.3697
5.022004         0.2983  0.2941
5.022004 T/LD    0.3183  0.3223
5.022004         0.3360  0.3523
5.022004 T/LD    0.3204  0.3342
5.023000         0.3025  0.3063
5.023000 T/LD    0.3219  0.3248
5.023001         0.3092  0.3384
5.023001 T/LD    0.3555  0.3784
5.023002         0.3194  0.3347
5.023002 T/LD    0.3465  0.3697
5.023003         0.3223  0.3303
5.023003 T/LD    0.3332  0.3465
5.023004         0.3158  0.3247
5.023004 T/LD    0.3172  0.3192
5.023005         0.3245  0.3316
5.023005 T/LD    0.3098  0.3325
5.023006         0.3254  0.3442
5.023006 T/LD    0.3536  0.3731
5.023007         0.3213  0.3364
5.023007 T/LD    0.3087  0.3266
5.023008         0.3138  0.3249
5.023008 T/LD    0.3400  0.3559
5.023009         0.3058  0.3117
5.023009 T/LD    0.3475  0.3662
5.023009         0.3044  0.3197
5.023009 T/LD    0.3875  0.3910
5.024000         0.3270  0.3449
5.024000 T/LD    0.3586  0.3758
5.024001         0.3280  0.3423
5.024001 T/LD    0.3516  0.3670
5.024002         0.3203  0.3340
5.024002 T/LD    0.3395  0.3577
5.024003         0.3173  0.3302
5.024003 T/LD    0.3291  0.3427
5.024004         0.2994  0.3016
5.024004 T/LD    0.3247  0.3305
5.024004         0.3076  0.3329
5.024004 T/LD    0.3627  0.3628
5.025000         0.3073  0.3071
5.025000 T/LD    0.3338  0.3397
5.025001         0.3044  0.3013
5.025001 T/LD    0.3290  0.3292
5.025002         0.3040  0.3202
5.025002 T/LD    0.3487  0.3731
5.025003         0.3294  0.3435
5.025003 T/LD    0.3277  0.3267
5.025004         0.3178  0.3300
5.025004 T/LD    0.3509  0.3662
5.025005         0.3179  0.3269
5.025005 T/LD    0.3227  0.3222
5.025006         0.3049  0.2994
5.025006 T/LD    0.3217  0.3220
5.025007         0.3037  0.2982
5.025007 T/LD    0.3400  0.3540
5.025008         0.3029  0.3057
5.025008 T/LD    0.3368  0.3532
5.025009         0.3025  0.3067
5.025009 T/LD    0.3246  0.3285
5.025010         0.3196  0.3376
5.025010 T/LD    0.3504  0.3710
5.025011         0.3077  0.3097
5.025011 T/LD    0.3496  0.3664
5.025012         0.3239  0.3379
5.025012 T/LD    0.3457  0.3656
5.025012         0.3051  0.3203
5.025012 T/LD    0.3526  0.3530
5.026000         0.3112  0.3150
5.026000 T/LD    0.3515  0.3657
5.026001         0.3151  0.3287
5.026001 T/LD    0.3372  0.3534
5.026002         0.3053  0.3063
5.026002 T/LD    0.3194  0.3259
5.026003         0.3020  0.3027
5.026003 T/LD    0.3351  0.3547
5.026003         0.3434  0.3440
5.026003 T/LD    0.3363  0.3608
5.027000         0.3229  0.3507
5.027000 T/LD    0.3220  0.3241
5.027001         0.3208  0.3376
5.027001 T/LD    0.3152  0.3171
5.027002         0.3002  0.3012
5.027002 T/LD    0.3458  0.3628
5.027003         0.2984  0.2994
5.027003 T/LD    0.3375  0.3589
5.027004         0.3015  0.3038
5.027004 T/LD    0.3749  0.3944
5.027005         0.3212  0.3332
5.027005 T/LD    0.3408  0.3626
5.027006         0.2891  0.2863
5.027006 T/LD    0.3227  0.3300
5.027007         0.3109  0.3270
5.027007 T/LD    0.3329  0.3535
5.027008         0.2980  0.3085
5.027008 T/LD    0.3309  0.3502
5.027009         0.5743  0.5089
5.027009 T/LD    0.7061  0.6282
5.027010         0.6032  0.5273
5.027010 T/LD    0.6751  0.5967
5.027011         0.6061  0.5270
5.027011 T/LD    0.6762  0.5998
5.027011         0.6982  0.5960
5.027011 T/LD    0.7923  0.6837
5.028000         0.6216  0.5688
5.028000 T/LD    0.7364  0.6733
5.028001         0.6057  0.5595
5.028001 T/LD    0.6718  0.5895
5.028002 T/LD    0.6828  0.6199
5.028002         0.5848  0.5225
5.028002 T/LD    0.6949  0.6343
5.028003         0.6411  0.5823
5.028003 T/LD    0.6687  0.5955
5.028002 T/LD    0.7972  0.7022
5.028003         0.6207  0.5623
5.028003 T/LD    0.7164  0.6419
5.029000         0.5836  0.5248
5.029000 T/LD    0.7156  0.6328
5.029001 T/LD    0.7174  0.6894
5.029001         0.6295  0.5806
5.029001 T/LD    0.7010  0.6332
5.029002         0.5659  0.5239
5.029002 T/LD    0.6680  0.5941
5.029003         0.6273  0.6195
5.029003 T/LD    0.7448  0.6664
5.029004         0.6605  0.5952
5.029004 T/LD    0.7254  0.6594
5.029005         0.6322  0.5793
5.029005 T/LD    0.7265  0.6562
5.029006         0.6243  0.5793
5.029006 T/LD    0.7224  0.6564
5.029007         0.6112  0.5543
5.029007 T/LD    0.7009  0.6279
5.029008         0.6180  0.5547
5.029008 T/LD    0.6795  0.5748
5.029009         0.6293  0.5644
5.029009 T/LD    0.7109  0.6001
5.029010         0.6016  0.5298
5.029010 T/LD    0.6877  0.5771
5.029001 T/LD    0.7611  0.6653
5.029010         0.6604  0.5673
5.029010 T/LD    0.7079  0.6102
5.030000         0.6039  0.5303
5.030000 T/LD    0.7196  0.6386
5.030001         0.6473  0.5792
5.030001 T/LD    0.7475  0.6435
5.030002         0.6319  0.5680
5.030002 T/LD    0.7777  0.6620
5.030003         0.6335  0.5986
5.030003 T/LD    0.6957  0.5950
5.030003         0.6619  0.5743
5.030003 T/LD    0.7833  0.6524
5.031000         0.6028  0.5308
5.031000 T/LD    0.7324  0.6384
5.031001         0.6372  0.5693
5.031001 T/LD    0.7410  0.6440
5.031002         0.6240  0.5724
5.031002 T/LD    0.7005  0.6459
5.031003         0.6446  0.5670
5.031003 T/LD    0.7641  0.6813
5.031004         0.6530  0.6000
5.031004 T/LD    0.7756  0.6779
5.031005         0.6617  0.5993
5.031005 T/LD    0.6919  0.6390
5.031006         0.6421  0.5751
5.031006 T/LD    0.7545  0.6402
5.031007         0.6436  0.5682
5.031007 T/LD    0.6846  0.5909
5.031008         0.6079  0.5383
5.031008 T/LD    0.7767  0.6764
5.031009         0.5813  0.5204
5.031009 T/LD    0.7699  0.6772
5.031010         0.9169  0.8621
5.031010 T/LD    1.0245  0.9286
5.031011         0.8421  0.7805
5.031011 T/LD    0.9652  0.8779
5.031011         0.9216  0.8346
5.031011 T/LD    1.0618  0.9316
5.032000         0.8218  0.7559
5.032000 T/LD    1.0617  0.9685
5.032001         0.8014  0.7308
5.032001 T/LD    0.9520  0.8584
5.032000         0.8442  0.7603
5.032000 T/LD    0.9883  0.8668
5.033000         0.9079  0.8521
5.033000 T/LD    1.0377  0.9343
5.033001         0.8726  0.8176
5.033001 T/LD    0.9939  0.8973
5.033002         0.8699  0.8165
5.033002 T/LD    1.0481  0.9620
5.033003         0.8752  0.8077
5.033003 T/LD    1.0017  0.9719
5.033004         0.9128  0.8610
5.033004 T/LD    0.9891  0.9124
5.033005         0.8871  0.8151
5.033005 T/LD    1.0118  0.9246
5.033006         0.8262  0.7608
5.033006 T/LD    1.0464  0.9584
5.033007         0.6545  0.5957
5.033007 T/LD    0.7216  0.6388
5.033008         0.6522  0.5961
5.033008 T/LD    0.7688  0.6797
5.033009         0.5769  0.5172
5.033009 T/LD    0.7423  0.6453
5.033003         0.8474  0.7628
5.033003 T/LD    1.1179  0.9788
5.034000         0.6495  0.5931
5.034000 T/LD    0.7665  0.6811
5.034001         0.6268  0.5677
5.034001 T/LD    0.7078  0.6371
5.035001         0.6494  0.5961
5.035001 T/LD    0.7697  0.6769
5.035002         0.6652  0.6075
5.035002 T/LD    0.6983  0.5894
5.035003         0.5930  0.5122
5.035003 T/LD    0.7005  0.5869
5.035004         0.6426  0.5654
5.035004 T/LD    0.7060  0.6003
5.035005         0.6039  0.5350
5.035005 T/LD    0.6984  0.5975
5.035006         0.6273  0.5586
5.035006 T/LD    0.7287  0.6301
5.035007         0.6539  0.5932
5.035007 T/LD    0.6739  0.5658
5.035008         0.6121  0.5386
5.035008 T/LD    0.6882  0.5971
5.035009         0.6403  0.5684
5.035009 T/LD    0.6641  0.5679
5.035010         0.6270  0.5558
5.035010 T/LD    0.6413  0.5537
5.035011         0.6173  0.5717
5.035011 T/LD    0.7013  0.6155
5.036000         0.6252  0.5569
5.036000 T/LD    0.6772  0.5808
5.036001         0.6271  0.5580
5.036001 T/LD    0.6856  0.5762
5.037000         0.6295  0.5567
5.037000 T/LD    0.6839  0.5772
5.037001         0.6370  0.5648
5.037001 T/LD    0.6443  0.5318
5.037002         0.6313  0.5605
5.037002 T/LD    0.6761  0.5839
5.037003         0.6118  0.5229
5.037003 T/LD    0.6872  0.5851
5.037004         0.5927  0.5190
5.037004 T/LD    0.6835  0.5835
5.037005         0.5893  0.5170
5.037005 T/LD    0.7010  0.6101
5.037006         0.6673  0.6062
5.037006 T/LD    0.6956  0.5848
5.037007         0.5831  0.5221
5.037007 T/LD    0.7149  0.6237
5.037008         0.6346  0.5700
5.037008 T/LD    0.6823  0.5932
5.037009         0.6229  0.5515
5.037009 T/LD    0.6828  0.5836
5.037010         0.6727  0.6029
5.037010 T/LD    0.6713  0.5880
5.037011         0.6183  0.5516
5.037011 T/LD    0.7146  0.6174
5.038000         0.6349  0.5780
5.038000 T/LD    0.6918  0.5836
5.038000 T       0.6069  0.5219
5.038000 T/QM    0.8000  0.6709

Tux commented 1 year ago

[image: x21360]

demerphq commented 1 year ago

On Thu, 10 Aug 2023, 10:28 xenu, @.***> wrote:

I played with Windows Performance Analyzer a bit and it seems that most of the time is spent inside Perl_pp_multiconcat (which was introduced in Perl 5.28):

[image: wpa] https://user-images.githubusercontent.com/352038/259676065-ca48ea0d-d082-48f6-871b-5b0cde7a7840.png

More specifically, it seems most of the time is spent on reallocations inside Perl_tmps_grow_p. And indeed, the following (crude) patch makes the script execute ~5x faster on my machine:

diff --git a/scope.c b/scope.c
index 28c767f128..4af8c4ffe2 100644
--- a/scope.c
+++ b/scope.c
@@ -231,8 +231,8 @@ Perl_tmps_grow_p(pTHX_ SSize_t ix)
 {
     SSize_t extend_to = ix;
 #ifndef STRESS_REALLOC
-    if (ix - PL_tmps_max < 128)
-        extend_to += (PL_tmps_max < 512) ? 128 : 512;
+    if (ix - PL_tmps_max < 10000)
+        extend_to += 10000;
 #endif
     Renew(PL_tmps_stack, extend_to + 1, SV*);
     PL_tmps_max = extend_to + 1;

I suspect the difference between Windows and Linux is caused by the fact that Windows has a slower realloc(). On Windows it's implemented naively (malloc + memcpy + free), while on Linux it avoids copies thanks to the mremap syscall

Yep. This has come up many a time in the internals.

Yves

richardleach commented 1 year ago

I'm just trying to understand what's going on here and doing a brain dump.

The SVs (with the a's in them) being processed by Perl_pp_multiconcat look like this:

SV = PVMG(0x559a66cfb750) at 0x559a66d0ae00
  REFCNT = 1
  FLAGS = (GMG,SMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x559a66d14d40 "a"\0
  CUR = 1
  LEN = 16
  MAGIC = 0x559a66d14df0
    MG_VIRTUAL = &PL_vtbl_sv
    MG_TYPE = PERL_MAGIC_sv(\0)
    MG_OBJ = 0x559a66d0ade8

Stepping through in blead, because the GMG flag is set, this early branch is taken in "Phase 1" of pp_multiconcat (https://github.com/Perl/perl5/blob/blead/pp_hot.c#L620):

else if (UNLIKELY(SvFLAGS(sv) & (SVs_GMG|SVf_ROK))) {
    goto do_magical;

This takes us to "Phase 7", where for every a, a mortal SV is created via the bottom branch here (https://github.com/Perl/perl5/blob/blead/pp_hot.c#L1113):

            /* get the next arg SV or regen the next const SV */
            len = lens[i >> 1].ssize;
            if (i == n) {
                /* handle the final targ .= (....) */
                right = left;
                left = targ;
            }
            else if (i & 1)
                right = svp[(i >> 1)];
            else if (len < 0)
                continue; /* no const in this position */
            else {
                right = newSVpvn_flags(cpv, len, (utf8 | SVs_TEMP));
                cpv += len;
            }

and another one shortly thereafter in this branch (https://github.com/Perl/perl5/blob/blead/pp_hot.c#L1128):

            if (arg_count == 2 && i < n) {
                /* for the first concat, create a mortal acting like the
                 * padtmp from OP_CONST. In later iterations this will
                 * be appended to */
                nexttarg = sv_newmortal();
                nextappend = FALSE;
            }

In comparison, when the multiconcat optimization is disabled in the peephole optimizer so that concat remains in the optree, none of these mortal SVs are created:

--- a/peep.c
+++ b/peep.c
@@ -295,7 +295,7 @@ S_maybe_multiconcat(pTHX_ OP *o)

     /* first see if, at the top of the tree, there is an assign,
      * append and/or stringify */
-
+return;
     if (topop->op_type == OP_SASSIGN) {
         /* expr = ..... */
         if (o->op_ppaddr != PL_ppaddr[OP_SASSIGN])

concat only considers the magic of its operands in tryAMAGICbin_MG(concat_amg, AMGf_assign); which - at a really quick glance - doesn't seem to end up doing anything significant.

So I guess pp_multiconcat is being prematurely pessimistic and it might be possible for it to look at exactly what magic method is set before jumping to do_magical?

richardleach commented 1 year ago

(or maybe still jump to Phase 7, but do things a bit differently there to avoid creating all the mortal SVs)

richardleach commented 1 year ago

In the test case from above, pp_multiconcat's call to sv_newmortal is to get a mortal SV to hold the final concatenated string. Since the op's op_targ points to a (presumably its own) PADTMP, it seems like we could just use that instead?

diff --git a/pp_hot.c b/pp_hot.c
index 3ec24a0e23..077aa248e2 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1122,10 +1122,15 @@ PP(pp_multiconcat)
             }

             if (arg_count == 2 && i < n) {
-                /* for the first concat, create a mortal acting like the
-                 * padtmp from OP_CONST. In later iterations this will
-                 * be appended to */
-                nexttarg = sv_newmortal();
+                if (SvPADTMP(targ)) { /* Reuse current pad SV */
+                    SvPVCLEAR(targ);
+                    nexttarg = targ;
+                } else {
+                    /* for the first concat, create a mortal acting like the
+                     * padtmp from OP_CONST. In later iterations this will
+                     * be appended to */
+                    nexttarg = sv_newmortal();
+                }
                 nextappend = FALSE;
             }
             else {

(make test doesn't fail with that on a non-debugging build. Doing a debugging build now.)

I don't know if the interpreter could be in a situation where it could create a load of mortals and op_targ not point to a PADTMP... If this could happen, perhaps the peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.

The newSVpvn_flags(cpv, len, (utf8 | SVs_TEMP)) is necessary because there is no persistent SV containing the '-' character, it's just something in a char buffer attached to the multiconcat op. I'm not sure what the best approach for reducing mortals here is. Possibilities:

I'm not sure where extra pad index(es) could be stored - probably in the aux buf?

richardleach commented 1 year ago

  • The optimization is changed so that CONST ops aren't pulled into the cpv buffer. Instead, the peephole optimizer records their SV* and pp_multiconcat grabs the value directly out of the CONST ops when it needs it. (This option seems messier and might reduce performance for concats with all non-magical SVs.)

Duh. It would probably be better to completely swipe the SVs and free the CONST ops.

richardleach commented 1 year ago

  • The peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.

My naive attempt at trying this has fallen foul of this test from t/op/concat2.t:

package RT132385 {
    my @a;
    use overload '.' => sub { push @a, \$_[1]; $_[0] };
    my $o = bless [];
    my $x = $o . "A" . $o . 'B';
    ::is "${$a[0]}${$a[2]}", "AB", "RT #132385";
}

richardleach commented 1 year ago

  • The peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.

My naive attempt at trying this has fallen foul of this test from t/op/concat2.t:

So I think the alternative to blowing out the tmps stack might be for the op to maintain its own tmps stack just for the duration of the op execution, and free the tmps on it just before the op completes?

iabyn commented 1 year ago

On Sun, Aug 13, 2023 at 02:48:09PM -0700, Richard Leach wrote:

  • The peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.

My naive attempt at trying this has fallen foul of this test from t/op/concat2.t:

So I think the alternative to blowing out the tmps stack might be for the op to maintain its own tmps stack just for the duration of the op execution, and free the tmps on it just before the op completes?

pp_multiconcat() is my baby. There are lots of subtleties and complications, such as op_targ not always being a PADTMP.

I'll look into this when I get some time.

-- The optimist believes that he lives in the best of all possible worlds. As does the pessimist.

richardleach commented 1 year ago

pp_multiconcat() is my baby. There are lots of subtleties and complications, such as op_targ not always being a PADTMP. I'll look into this when I get some time.

Cool. No probs.

iabyn commented 1 year ago

Fixed by v5.39.2-48-g7bcd18ae82