Closed xenu closed 1 year ago
I played with Windows Performance Analyzer a bit and it seems that most of the time is spent inside Perl_pp_multiconcat
(which was introduced in Perl 5.28):
More specifically, it seems most of the time is spent on reallocations inside Perl_tmps_grow_p
. And indeed, the following (crude) patch makes the script execute ~5x faster on my machine:
diff --git a/scope.c b/scope.c
index 28c767f128..4af8c4ffe2 100644
--- a/scope.c
+++ b/scope.c
@@ -231,8 +231,8 @@ Perl_tmps_grow_p(pTHX_ SSize_t ix)
{
SSize_t extend_to = ix;
#ifndef STRESS_REALLOC
- if (ix - PL_tmps_max < 128)
- extend_to += (PL_tmps_max < 512) ? 128 : 512;
+ if (ix - PL_tmps_max < 10000)
+ extend_to += 10000;
#endif
Renew(PL_tmps_stack, extend_to + 1, SV*);
PL_tmps_max = extend_to + 1;
I suspect the difference between Windows and Linux is caused by the fact that Windows has a slower realloc
. On Windows it's implemented naively (malloc
+ memcpy
+ free
), while on Linux it avoids copies thanks to the mremap
syscall.
Not only Windows. Time::Hires was added in 5.7.3, so I cannot use this test code before that
use strict;
use warnings;
use Time::HiRes "time";
my $x = "a" x 1e6;
my $y = "b" . $x;
my $t = time;
$x =~ s/./$&-/g;
printf "%8.4f\t", time - $t;
$t = time;
$y =~ s/./$&-/g;
printf "%8.4f\n", time - $t;
→ (x86_64-linux threaded and unthreaded)
T = Threaded, LD = LongDouble, QM = QuadMath
Running perl-all x21360.pl
5.006001 0.2142 0.2298
5.007002 0.1495 0.1577
5.007002 T/LD 0.1625 0.1770
5.007003 0.2203 0.2212
5.007003 T/LD 0.2296 0.2492
5.007003 0.2319 0.2283
5.007003 T/LD 0.2522 0.2531
5.008 0.1732 0.1746
5.008 T/LD 0.1868 0.1878
5.008001 0.1871 0.1922
5.008001 T/LD 0.1683 0.1805
5.008002 0.1895 0.1952
5.008002 T/LD 0.2026 0.2066
5.008003 0.1861 0.1877
5.008003 T/LD 0.2079 0.2068
5.008004 0.1855 0.1860
5.008004 T/LD 0.2022 0.2028
5.008005 0.1874 0.1887
5.008005 T/LD 0.2035 0.2030
5.008006 0.2069 0.1982
5.008006 T/LD 0.2258 0.2134
5.008007 0.2176 0.2127
5.008007 T/LD 0.2426 0.2356
5.008008 0.2138 0.2100
5.008008 T/LD 0.2251 0.2193
5.008009 0.2241 0.2150
5.008009 T/LD 0.2373 0.2287
5.008009 0.2230 0.2169
5.008009 T/LD 0.2379 0.2279
5.010000 0.2571 0.2590
5.010000 T/LD 0.2778 0.2764
5.010001 0.2996 0.2931
5.010001 T/LD 0.3312 0.3176
5.010001 0.2568 0.2654
5.010001 T/LD 0.3346 0.3280
5.011000 0.3698 0.3568
5.011000 T/LD 0.3857 0.3804
5.011001 0.3757 0.3570
5.011001 T/LD 0.3961 0.3862
5.011002 0.3771 0.3631
5.011002 T/LD 0.3355 0.3422
5.011003 0.3355 0.3237
5.011003 T/LD 0.3471 0.3424
5.011004 0.3383 0.3261
5.011004 T/LD 0.3524 0.3415
5.011005 0.3355 0.3255
5.011005 T/LD 0.3221 0.3319
5.011005 0.3203 0.3284
5.011005 T/LD 0.3498 0.3440
5.012000 0.3850 0.3746
5.012000 T/LD 0.3940 0.3806
5.012001 0.3747 0.3635
5.012001 T/LD 0.3904 0.3851
5.012002 0.3738 0.3631
5.012002 T/LD 0.3909 0.3781
5.012003 0.2989 0.3126
5.012003 T/LD 0.3398 0.3320
5.012004 0.3235 0.3138
5.012004 T/LD 0.3358 0.3302
5.012005 0.3201 0.3152
5.012005 T/LD 0.3407 0.3333
5.012005 0.3223 0.3140
5.012005 T/LD 0.3394 0.3303
5.013000 0.3217 0.3174
5.013000 T/LD 0.3370 0.3331
5.013001 0.3218 0.3270
5.013001 T/LD 0.3568 0.3474
5.013002 0.3408 0.3553
5.013002 T/LD 0.3857 0.3987
5.013003 0.3804 0.3685
5.013003 T/LD 0.4066 0.3942
5.013004 0.3677 0.3595
5.013004 T/LD 0.3880 0.3748
5.013005 0.3130 0.3178
5.013005 T/LD 0.3520 0.3431
5.013006 0.3360 0.3229
5.013006 T/LD 0.3543 0.3462
5.013007 0.3313 0.3229
5.013007 T/LD 0.3454 0.3364
5.013008 0.3286 0.3209
5.013008 T/LD 0.3450 0.3377
5.013009 0.3589 0.3552
5.013009 T/LD 0.3699 0.3695
5.013010 0.3501 0.3481
5.013010 T/LD 0.3674 0.3679
5.013011 0.3654 0.3632
5.013011 T/LD 0.3753 0.3740
5.013011 0.3589 0.3609
5.013011 T/LD 0.3826 0.3807
5.014000 0.3111 0.3296
5.014000 T/LD 0.3377 0.3550
5.014001 0.3094 0.3201
5.014001 T/LD 0.2965 0.3176
5.014002 0.3195 0.3372
5.014002 T/LD 0.2971 0.3241
5.014003 0.3064 0.3330
5.014003 T/LD 0.3364 0.3544
5.014004 0.2947 0.2936
5.014004 T/LD 0.3366 0.3536
5.014004 0.2799 0.3047
5.014004 T/LD 0.3531 0.3814
5.015000 0.2937 0.3203
5.015000 T/LD 0.3379 0.3568
5.015001 0.3074 0.3129
5.015001 T/LD 0.3416 0.3593
5.015002 0.3211 0.3324
5.015002 T/LD 0.3408 0.3600
5.015003 0.3162 0.3290
5.015003 T/LD 0.3359 0.3489
5.015004 0.3109 0.3199
5.015004 T/LD 0.3291 0.3377
5.015005 0.3169 0.3241
5.015005 T/LD 0.3354 0.3457
5.015006 0.2951 0.2950
5.015006 T/LD 0.3382 0.3562
5.015007 0.3018 0.3045
5.015007 T/LD 0.3377 0.3537
5.015008 0.3249 0.3368
5.015008 T/LD 0.3011 0.3219
5.015009 0.3066 0.3097
5.015009 T/LD 0.3390 0.3576
5.015009 0.3523 0.3513
5.015009 T/LD 0.3693 0.3717
5.016000 0.3037 0.3036
5.016000 T/LD 0.3369 0.3534
5.016001 0.3256 0.3449
5.016001 T/LD 0.3519 0.3614
5.016002 0.3312 0.3436
5.016002 T/LD 0.3436 0.3536
5.016003 0.3191 0.3320
5.016003 T/LD 0.3131 0.3565
5.016003 0.2979 0.3141
5.016003 T/LD 0.3244 0.3232
5.017000 0.3340 0.3455
5.017000 T/LD 0.3380 0.3546
5.017001 0.3199 0.3339
5.017001 T/LD 0.3459 0.3644
5.017002 0.3188 0.3333
5.017002 T/LD 0.3081 0.3107
5.017003 0.3230 0.3337
5.017003 T/LD 0.3384 0.3534
5.017004 0.3245 0.3380
5.017004 T/LD 0.3251 0.3411
5.017005 0.3059 0.3055
5.017005 T/LD 0.3177 0.3182
5.017006 0.3130 0.3176
5.017006 T/LD 0.3474 0.3617
5.017007 0.3400 0.3491
5.017007 T/LD 0.3494 0.3595
5.017008 0.3220 0.3259
5.017008 T/LD 0.3673 0.3840
5.017009 0.3369 0.3542
5.017009 T/LD 0.3652 0.3828
5.017010 0.3264 0.3429
5.017010 T/LD 0.3456 0.3682
5.017011 0.3240 0.3408
5.017011 T/LD 0.3385 0.3535
5.017011 0.3125 0.3312
5.017011 T/LD 0.3474 0.3466
5.018000 0.3365 0.3469
5.018000 T/LD 0.3229 0.3287
5.018001 0.3150 0.3208
5.018001 T/LD 0.3377 0.3457
5.018002 0.3359 0.3566
5.018002 T/LD 0.3526 0.3715
5.018003 0.3075 0.3105
5.018003 T/LD 0.3259 0.3626
5.018004 0.3366 0.3541
5.018004 T/LD 0.3520 0.3714
5.018004 0.3054 0.3207
5.018004 T/LD 0.3402 0.3402
5.019000 0.3154 0.3235
5.019000 T/LD 0.3256 0.3347
5.019001 0.3357 0.3527
5.019001 T/LD 0.3568 0.3753
5.019002 0.3198 0.3179
5.019002 T/LD 0.3415 0.3451
5.019003 0.3384 0.3444
5.019003 T/LD 0.3584 0.3767
5.019004 0.3050 0.3067
5.019004 T/LD 0.3337 0.3364
5.019005 0.3101 0.3089
5.019005 T/LD 0.3260 0.3226
5.019006 0.3024 0.2972
5.019006 T/LD 0.3447 0.3492
5.019007 0.3019 0.3099
5.019007 T/LD 0.3366 0.3484
5.019008 0.3077 0.3157
5.019008 T/LD 0.3423 0.3611
5.019009 0.3105 0.3127
5.019009 T/LD 0.3325 0.3360
5.019010 0.2914 0.3098
5.019010 T/LD 0.3284 0.3272
5.019011 0.3124 0.3135
5.019011 T/LD 0.3264 0.3320
5.019011 0.3005 0.3148
5.019011 T/LD 0.3332 0.3336
5.020000 0.3105 0.3180
5.020000 T/LD 0.3507 0.3745
5.020001 0.3343 0.3498
5.020001 T/LD 0.3291 0.3299
5.020002 0.3075 0.3077
5.020002 T/LD 0.3453 0.3570
5.020003 0.3121 0.3144
5.020003 T/LD 0.3283 0.3367
5.020003 0.3570 0.3656
5.020003 T/LD 0.3849 0.3863
5.021000 0.3131 0.3128
5.021000 T/LD 0.3283 0.3345
5.021001 0.3177 0.3188
5.021001 T/LD 0.3397 0.3744
5.021002 0.3278 0.3278
5.021002 T/LD 0.3858 0.3978
5.021003 0.3339 0.3490
5.021003 T/LD 0.3462 0.3657
5.021004 0.3342 0.3464
5.021004 T/LD 0.3529 0.3685
5.021005 0.3034 0.3346
5.021005 T/LD 0.3474 0.3593
5.021006 0.3325 0.3476
5.021006 T/LD 0.3306 0.3345
5.021007 0.3793 0.3948
5.021007 T/LD 0.3452 0.3605
5.021008 0.3325 0.3458
5.021008 T/LD 0.3494 0.3676
5.021009 0.3044 0.3280
5.021009 T/LD 0.3482 0.3721
5.021010 0.3184 0.3383
5.021010 T/LD 0.3412 0.3617
5.021011 0.3248 0.3360
5.021011 T/LD 0.3340 0.3447
5.021011 0.3099 0.3290
5.021011 T/LD 0.3417 0.3409
5.022000 0.3349 0.3325
5.022000 T/LD 0.3352 0.3469
5.022001 0.3133 0.3448
5.022001 T/LD 0.3311 0.3327
5.022002 0.3048 0.3066
5.022002 T/LD 0.3597 0.3809
5.022003 0.3291 0.3403
5.022003 T/LD 0.3472 0.3697
5.022004 0.2983 0.2941
5.022004 T/LD 0.3183 0.3223
5.022004 0.3360 0.3523
5.022004 T/LD 0.3204 0.3342
5.023000 0.3025 0.3063
5.023000 T/LD 0.3219 0.3248
5.023001 0.3092 0.3384
5.023001 T/LD 0.3555 0.3784
5.023002 0.3194 0.3347
5.023002 T/LD 0.3465 0.3697
5.023003 0.3223 0.3303
5.023003 T/LD 0.3332 0.3465
5.023004 0.3158 0.3247
5.023004 T/LD 0.3172 0.3192
5.023005 0.3245 0.3316
5.023005 T/LD 0.3098 0.3325
5.023006 0.3254 0.3442
5.023006 T/LD 0.3536 0.3731
5.023007 0.3213 0.3364
5.023007 T/LD 0.3087 0.3266
5.023008 0.3138 0.3249
5.023008 T/LD 0.3400 0.3559
5.023009 0.3058 0.3117
5.023009 T/LD 0.3475 0.3662
5.023009 0.3044 0.3197
5.023009 T/LD 0.3875 0.3910
5.024000 0.3270 0.3449
5.024000 T/LD 0.3586 0.3758
5.024001 0.3280 0.3423
5.024001 T/LD 0.3516 0.3670
5.024002 0.3203 0.3340
5.024002 T/LD 0.3395 0.3577
5.024003 0.3173 0.3302
5.024003 T/LD 0.3291 0.3427
5.024004 0.2994 0.3016
5.024004 T/LD 0.3247 0.3305
5.024004 0.3076 0.3329
5.024004 T/LD 0.3627 0.3628
5.025000 0.3073 0.3071
5.025000 T/LD 0.3338 0.3397
5.025001 0.3044 0.3013
5.025001 T/LD 0.3290 0.3292
5.025002 0.3040 0.3202
5.025002 T/LD 0.3487 0.3731
5.025003 0.3294 0.3435
5.025003 T/LD 0.3277 0.3267
5.025004 0.3178 0.3300
5.025004 T/LD 0.3509 0.3662
5.025005 0.3179 0.3269
5.025005 T/LD 0.3227 0.3222
5.025006 0.3049 0.2994
5.025006 T/LD 0.3217 0.3220
5.025007 0.3037 0.2982
5.025007 T/LD 0.3400 0.3540
5.025008 0.3029 0.3057
5.025008 T/LD 0.3368 0.3532
5.025009 0.3025 0.3067
5.025009 T/LD 0.3246 0.3285
5.025010 0.3196 0.3376
5.025010 T/LD 0.3504 0.3710
5.025011 0.3077 0.3097
5.025011 T/LD 0.3496 0.3664
5.025012 0.3239 0.3379
5.025012 T/LD 0.3457 0.3656
5.025012 0.3051 0.3203
5.025012 T/LD 0.3526 0.3530
5.026000 0.3112 0.3150
5.026000 T/LD 0.3515 0.3657
5.026001 0.3151 0.3287
5.026001 T/LD 0.3372 0.3534
5.026002 0.3053 0.3063
5.026002 T/LD 0.3194 0.3259
5.026003 0.3020 0.3027
5.026003 T/LD 0.3351 0.3547
5.026003 0.3434 0.3440
5.026003 T/LD 0.3363 0.3608
5.027000 0.3229 0.3507
5.027000 T/LD 0.3220 0.3241
5.027001 0.3208 0.3376
5.027001 T/LD 0.3152 0.3171
5.027002 0.3002 0.3012
5.027002 T/LD 0.3458 0.3628
5.027003 0.2984 0.2994
5.027003 T/LD 0.3375 0.3589
5.027004 0.3015 0.3038
5.027004 T/LD 0.3749 0.3944
5.027005 0.3212 0.3332
5.027005 T/LD 0.3408 0.3626
5.027006 0.2891 0.2863
5.027006 T/LD 0.3227 0.3300
5.027007 0.3109 0.3270
5.027007 T/LD 0.3329 0.3535
5.027008 0.2980 0.3085
5.027008 T/LD 0.3309 0.3502
5.027009 0.5743 0.5089
5.027009 T/LD 0.7061 0.6282
5.027010 0.6032 0.5273
5.027010 T/LD 0.6751 0.5967
5.027011 0.6061 0.5270
5.027011 T/LD 0.6762 0.5998
5.027011 0.6982 0.5960
5.027011 T/LD 0.7923 0.6837
5.028000 0.6216 0.5688
5.028000 T/LD 0.7364 0.6733
5.028001 0.6057 0.5595
5.028001 T/LD 0.6718 0.5895
5.028002 T/LD 0.6828 0.6199
5.028002 0.5848 0.5225
5.028002 T/LD 0.6949 0.6343
5.028003 0.6411 0.5823
5.028003 T/LD 0.6687 0.5955
5.028002 T/LD 0.7972 0.7022
5.028003 0.6207 0.5623
5.028003 T/LD 0.7164 0.6419
5.029000 0.5836 0.5248
5.029000 T/LD 0.7156 0.6328
5.029001 T/LD 0.7174 0.6894
5.029001 0.6295 0.5806
5.029001 T/LD 0.7010 0.6332
5.029002 0.5659 0.5239
5.029002 T/LD 0.6680 0.5941
5.029003 0.6273 0.6195
5.029003 T/LD 0.7448 0.6664
5.029004 0.6605 0.5952
5.029004 T/LD 0.7254 0.6594
5.029005 0.6322 0.5793
5.029005 T/LD 0.7265 0.6562
5.029006 0.6243 0.5793
5.029006 T/LD 0.7224 0.6564
5.029007 0.6112 0.5543
5.029007 T/LD 0.7009 0.6279
5.029008 0.6180 0.5547
5.029008 T/LD 0.6795 0.5748
5.029009 0.6293 0.5644
5.029009 T/LD 0.7109 0.6001
5.029010 0.6016 0.5298
5.029010 T/LD 0.6877 0.5771
5.029001 T/LD 0.7611 0.6653
5.029010 0.6604 0.5673
5.029010 T/LD 0.7079 0.6102
5.030000 0.6039 0.5303
5.030000 T/LD 0.7196 0.6386
5.030001 0.6473 0.5792
5.030001 T/LD 0.7475 0.6435
5.030002 0.6319 0.5680
5.030002 T/LD 0.7777 0.6620
5.030003 0.6335 0.5986
5.030003 T/LD 0.6957 0.5950
5.030003 0.6619 0.5743
5.030003 T/LD 0.7833 0.6524
5.031000 0.6028 0.5308
5.031000 T/LD 0.7324 0.6384
5.031001 0.6372 0.5693
5.031001 T/LD 0.7410 0.6440
5.031002 0.6240 0.5724
5.031002 T/LD 0.7005 0.6459
5.031003 0.6446 0.5670
5.031003 T/LD 0.7641 0.6813
5.031004 0.6530 0.6000
5.031004 T/LD 0.7756 0.6779
5.031005 0.6617 0.5993
5.031005 T/LD 0.6919 0.6390
5.031006 0.6421 0.5751
5.031006 T/LD 0.7545 0.6402
5.031007 0.6436 0.5682
5.031007 T/LD 0.6846 0.5909
5.031008 0.6079 0.5383
5.031008 T/LD 0.7767 0.6764
5.031009 0.5813 0.5204
5.031009 T/LD 0.7699 0.6772
5.031010 0.9169 0.8621
5.031010 T/LD 1.0245 0.9286
5.031011 0.8421 0.7805
5.031011 T/LD 0.9652 0.8779
5.031011 0.9216 0.8346
5.031011 T/LD 1.0618 0.9316
5.032000 0.8218 0.7559
5.032000 T/LD 1.0617 0.9685
5.032001 0.8014 0.7308
5.032001 T/LD 0.9520 0.8584
5.032000 0.8442 0.7603
5.032000 T/LD 0.9883 0.8668
5.033000 0.9079 0.8521
5.033000 T/LD 1.0377 0.9343
5.033001 0.8726 0.8176
5.033001 T/LD 0.9939 0.8973
5.033002 0.8699 0.8165
5.033002 T/LD 1.0481 0.9620
5.033003 0.8752 0.8077
5.033003 T/LD 1.0017 0.9719
5.033004 0.9128 0.8610
5.033004 T/LD 0.9891 0.9124
5.033005 0.8871 0.8151
5.033005 T/LD 1.0118 0.9246
5.033006 0.8262 0.7608
5.033006 T/LD 1.0464 0.9584
5.033007 0.6545 0.5957
5.033007 T/LD 0.7216 0.6388
5.033008 0.6522 0.5961
5.033008 T/LD 0.7688 0.6797
5.033009 0.5769 0.5172
5.033009 T/LD 0.7423 0.6453
5.033003 0.8474 0.7628
5.033003 T/LD 1.1179 0.9788
5.034000 0.6495 0.5931
5.034000 T/LD 0.7665 0.6811
5.034001 0.6268 0.5677
5.034001 T/LD 0.7078 0.6371
5.035001 0.6494 0.5961
5.035001 T/LD 0.7697 0.6769
5.035002 0.6652 0.6075
5.035002 T/LD 0.6983 0.5894
5.035003 0.5930 0.5122
5.035003 T/LD 0.7005 0.5869
5.035004 0.6426 0.5654
5.035004 T/LD 0.7060 0.6003
5.035005 0.6039 0.5350
5.035005 T/LD 0.6984 0.5975
5.035006 0.6273 0.5586
5.035006 T/LD 0.7287 0.6301
5.035007 0.6539 0.5932
5.035007 T/LD 0.6739 0.5658
5.035008 0.6121 0.5386
5.035008 T/LD 0.6882 0.5971
5.035009 0.6403 0.5684
5.035009 T/LD 0.6641 0.5679
5.035010 0.6270 0.5558
5.035010 T/LD 0.6413 0.5537
5.035011 0.6173 0.5717
5.035011 T/LD 0.7013 0.6155
5.036000 0.6252 0.5569
5.036000 T/LD 0.6772 0.5808
5.036001 0.6271 0.5580
5.036001 T/LD 0.6856 0.5762
5.037000 0.6295 0.5567
5.037000 T/LD 0.6839 0.5772
5.037001 0.6370 0.5648
5.037001 T/LD 0.6443 0.5318
5.037002 0.6313 0.5605
5.037002 T/LD 0.6761 0.5839
5.037003 0.6118 0.5229
5.037003 T/LD 0.6872 0.5851
5.037004 0.5927 0.5190
5.037004 T/LD 0.6835 0.5835
5.037005 0.5893 0.5170
5.037005 T/LD 0.7010 0.6101
5.037006 0.6673 0.6062
5.037006 T/LD 0.6956 0.5848
5.037007 0.5831 0.5221
5.037007 T/LD 0.7149 0.6237
5.037008 0.6346 0.5700
5.037008 T/LD 0.6823 0.5932
5.037009 0.6229 0.5515
5.037009 T/LD 0.6828 0.5836
5.037010 0.6727 0.6029
5.037010 T/LD 0.6713 0.5880
5.037011 0.6183 0.5516
5.037011 T/LD 0.7146 0.6174
5.038000 0.6349 0.5780
5.038000 T/LD 0.6918 0.5836
5.038000 T 0.6069 0.5219
5.038000 T/QM 0.8000 0.6709
On Thu, 10 Aug 2023, 10:28 xenu, @.***> wrote:
I played with Windows Performance Analyzer a bit and it seems that most of the time is spent inside Perl_pp_multiconcat (which was introduced in Perl 5.28):
[image: wpa] https://user-images.githubusercontent.com/352038/259676065-ca48ea0d-d082-48f6-871b-5b0cde7a7840.png
More specifically, it seems most of the time is spent on reallocations inside Perl_tmps_grow_p. And indeed, the following (crude) patch makes the script execute ~5x faster on my machine:
diff --git a/scope.c b/scope.c index 28c767f128..4af8c4ffe2 100644--- a/scope.c+++ b/scope.c@@ -231,8 +231,8 @@ Perl_tmps_growp(pTHX SSize_t ix) { SSize_t extend_to = ix;
ifndef STRESS_REALLOC- if (ix - PL_tmps_max < 128)- extend_to += (PL_tmps_max < 512) ? 128 : 512;+ if (ix - PL_tmps_max < 10000)+ extend_to += 10000;
endif
Renew(PL_tmps_stack, extend_to + 1, SV*); PL_tmps_max = extend_to + 1;
I suspect the difference between Windows and Linux is caused by the fact that Windows has a slower realloc(). On Windows it's implemented naively (malloc + memcpy + free), while on Linux it avoids copies thanks to the mremap syscall
Yep. This has come up many a time in the internals.
Yves
I'm just trying to understand what's going on here and doing a brain dump.
The SVs (with the a
's in them) being processed by Perl_pp_multiconcat look like this:
SV = PVMG(0x559a66cfb750) at 0x559a66d0ae00
REFCNT = 1
FLAGS = (GMG,SMG,POK,pPOK)
IV = 0
NV = 0
PV = 0x559a66d14d40 "a"\0
CUR = 1
LEN = 16
MAGIC = 0x559a66d14df0
MG_VIRTUAL = &PL_vtbl_sv
MG_TYPE = PERL_MAGIC_sv(\0)
MG_OBJ = 0x559a66d0ade8
Stepping through in blead, because the GMG flag is set, this early branch is taken in "Phase 1" of pp_multiconcat
(https://github.com/Perl/perl5/blob/blead/pp_hot.c#L620):
else if (UNLIKELY(SvFLAGS(sv) & (SVs_GMG|SVf_ROK))) {
goto do_magical;
This takes us to "Phase 7", where for every a
, a mortal SV is created via the bottom branch here (https://github.com/Perl/perl5/blob/blead/pp_hot.c#L1113):
/* get the next arg SV or regen the next const SV */
len = lens[i >> 1].ssize;
if (i == n) {
/* handle the final targ .= (....) */
right = left;
left = targ;
}
else if (i & 1)
right = svp[(i >> 1)];
else if (len < 0)
continue; /* no const in this position */
else {
right = newSVpvn_flags(cpv, len, (utf8 | SVs_TEMP));
cpv += len;
}
and another one shortly thereafter in this branch (https://github.com/Perl/perl5/blob/blead/pp_hot.c#L1128):
if (arg_count == 2 && i < n) {
/* for the first concat, create a mortal acting like the
* padtmp from OP_CONST. In later iterations this will
* be appended to */
nexttarg = sv_newmortal();
nextappend = FALSE;
}
In comparison, when the multiconcat optimization is disabled in the peephole optimizer to allow concat
to remain in the optree, all these mortal SVs are not created.
--- a/peep.c
+++ b/peep.c
@@ -295,7 +295,7 @@ S_maybe_multiconcat(pTHX_ OP *o)
/* first see if, at the top of the tree, there is an assign,
* append and/or stringify */
-
+return;
if (topop->op_type == OP_SASSIGN) {
/* expr = ..... */
if (o->op_ppaddr != PL_ppaddr[OP_SASSIGN])
concat
only considers the magic of its operands in tryAMAGICbin_MG(concat_amg, AMGf_assign);
which - at a really quick glance - doesn't seem to end up doing anything significant.
So I guess pp_multiconcat
is being prematurely pessimistic and it might be possible for it to look at exactly what magic method is set before jumping to do_magical
?
(or maybe still jump to Phase 7, but do things a bit differently there to avoid creating all the mortal SVs)
In the test case from above, pp_multiconcat
's call to sv_newmortal
is to get a mortal SV to hold the final concatenated string. Since the op's op_targ
points to a (presumable its own) PADTMP, it seems like we could just use that instead?
diff --git a/pp_hot.c b/pp_hot.c
index 3ec24a0e23..077aa248e2 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1122,10 +1122,15 @@ PP(pp_multiconcat)
}
if (arg_count == 2 && i < n) {
- /* for the first concat, create a mortal acting like the
- * padtmp from OP_CONST. In later iterations this will
- * be appended to */
- nexttarg = sv_newmortal();
+ if (SvPADTMP(targ)) { /* Reuse current pad SV */
+ SvPVCLEAR(targ);
+ nexttarg = targ;
+ } else {
+ /* for the first concat, create a mortal acting like the
+ * padtmp from OP_CONST. In later iterations this will
+ * be appended to */
+ nexttarg = sv_newmortal();
+ }
nextappend = FALSE;
}
else {
(make test
doesn't fail with that on a non-debugging build. Doing a debugging build now.)
I don't know if the interpreter could be in a situation where it could create a load of mortals and op_targ
not point to a PADTMP... If this could happen, perhaps the peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.
The newSVpvn_flags(cpv, len, (utf8 | SVs_TEMP))
is necessary because there is no persistent SV containing the '-' character, it's just something in a char buffer attached to the multiconcat op. I'm not sure what the best approach for reducing mortals here is. Possibilities:
cpv
buffer. Instead, the peephole optimizer records their SV* and pp_multiconcat grabs the value directly out of the CONST ops when it needs it. (This option seems messier and might reduce performance for concats with all non-magical SVs.)
There could easily be a better option that I haven't thought of....I'm not sure where extra pad index(es) could be stored - probably in the aux buf?
- The optimization is changed so that CONST ops aren't pulled into the
cpv
buffer. Instead, the peephole optimizer records their SV* and pp_multiconcat grabs the value directly out of the CONST ops when it needs it. (This option seems messier and might reduce performance for concats with all non-magical SVs.)
Duh. It would probably be better to completely swipe the SVs and free the CONST ops.
- The peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.
My naive attempt at trying this has fallen foul of this test from t/op/concat2.t:
package RT132385 {
my @a;
use overload '.' => sub { push @a, \$_[1]; $_[0] };
my $o = bless [];
my $x = $o . "A" . $o . 'B';
::is "${$a[0]}${$a[2]}", "AB", "RT #132385";
}
- The peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.
My naive attempt at trying this has fallen foul of this test from t/op/concat2.t:
So I think the alternative to blowing out the tmps stack might be for the op to maintain it's own tmps stack just for the duration of the op execution, and free the tmps on it just before the op completes?
On Sun, Aug 13, 2023 at 02:48:09PM -0700, Richard Leach wrote:
- The peephole optimizer could allocate an extra pad slot for the multiconcat op, which it can (re)use for this purpose.
My naive attempt at trying this has fallen foul of this test from t/op/concat2.t:
So I think the alternative to blowing out the tmps stack might be for the op to maintain it's own tmps stack just for the duration of the op execution, and free the tmps on it just before the op completes?
pp_multiconcat() is my baby. There are lots of subtleties and complications, such as op_targ not always being a PADTMP.
I'll look into this when I get some time.
-- The optimist believes that he lives in the best of all possible worlds. As does the pessimist.
pp_multiconcat() is my baby. There are lots of subtleties and complications, such as op_targ not always being a PADTMP. I'll look into this when I get some time.
Cool. No probs.
Fixed by v5.39.2-48-g7bcd18ae82
This was originally reported on PerlMonks: https://www.perlmonks.org/index.pl?node_id=11153747
A performance regression introduced in Perl 5.28 causes this script to execute ~20x-40x slower on Windows:
Here are the results:
I can confirm the issue is still present in blead.