dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Perf: double.ToString(CultureInfo) 7x slower on Linux #7700

Closed: tarekgh closed 4 years ago

tarekgh commented 7 years ago

@ianhays commented on Tue Oct 13 2015

| Test name | Date ran | Windows time | Linux time | Linux/Windows |
| --- | --- | --- | --- | --- |
| System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "fr") | 10/13/2015 | 3.780472576 | 29.0264 | 7.677981896 |
| System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "da") | 10/13/2015 | 3.757379192 | 28.05800001 | 7.467439025 |
| System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "ja") | 10/13/2015 | 3.773344988 | 27.9326 | 7.402609643 |
| System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "") | 10/13/2015 | 3.806987203 | 27.96 | 7.34439033 |
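
For reference, the Linux/Windows figure is just the Linux time divided by the Windows time. A minimal sketch of that arithmetic, using the "fr" row; the class and method names are made up for illustration and are not part of the test project:

```csharp
using System;
using System.Globalization;

public static class SlowdownRatio
{
    // Ratio of Linux time to Windows time; values above 1 mean Linux is slower.
    public static double Compute(double windowsTime, double linuxTime)
    {
        if (windowsTime <= 0)
            throw new ArgumentOutOfRangeException(nameof(windowsTime));
        return linuxTime / windowsTime;
    }

    public static void Main()
    {
        // Numbers copied from the "fr" row of the table.
        double ratio = Compute(3.780472576, 29.0264);
        Console.WriteLine(ratio.ToString("F2", CultureInfo.InvariantCulture)); // prints 7.68
    }
}
```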

Perf Test:

[Benchmark]
[InlineData("fr")]
[InlineData("da")]
[InlineData("ja")]
[InlineData("")]
public void ToString(string culturestring)
{
    double number = 104234.343;
    CultureInfo cultureInfo = new CultureInfo(culturestring);
    foreach (var iteration in Benchmark.Iterations)
        using (iteration.StartMeasurement())
        {
            for (int i = 0; i < innerIterations; i++)
            {
                number.ToString(cultureInfo); number.ToString(cultureInfo); number.ToString(cultureInfo);
                number.ToString(cultureInfo); number.ToString(cultureInfo); number.ToString(cultureInfo);
                number.ToString(cultureInfo); number.ToString(cultureInfo); number.ToString(cultureInfo);
            }
        }
}
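
The test above depends on the perf harness (`Benchmark.Iterations`) and on an `innerIterations` value that is not shown here. For anyone who wants to try this without the harness, a rough standalone approximation using `Stopwatch` might look like the sketch below. The class name, the iteration count, and the warm-up call are my assumptions; this is not how the harness itself measures.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class ToStringTiming
{
    // Times `iterations` calls of double.ToString(IFormatProvider) for one culture
    // and returns the elapsed time in milliseconds.
    public static double Measure(string cultureName, int iterations)
    {
        double number = 104234.343;
        CultureInfo culture = new CultureInfo(cultureName);
        int totalLength = 0; // use each result so the calls are not dead code

        number.ToString(culture); // warm-up call so first-call JIT cost is not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            totalLength += number.ToString(culture).Length;
        sw.Stop();

        GC.KeepAlive(totalLength);
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Same culture names as the [InlineData] values above; 100000 is an arbitrary count.
        foreach (string name in new[] { "fr", "da", "ja", "" })
            Console.WriteLine($"\"{name}\": {Measure(name, 100000):F3} ms");
    }
}
```

Running the same binary on both operating systems gives a quick sanity check, though a single Stopwatch pass is much noisier than the harness's repeated iterations.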

Although it's an admittedly poor comparative data point, the ToString(CultureInfo) method for DateTime was 1:1 between Linux and Windows, so not all ToString methods are equally slow when using CultureInfo. As such, I am unsure whom to notify; @steveharter @stephentoub?

I have not yet added equivalent ToString tests to other classes, but can if desired for more data points.


@steveharter commented on Mon Nov 16 2015

@ianhays is the 7x measured against Windows with coreclr, or desktop clr? In a similar test, I'm getting:

coreclr+windows: 4.3x faster than coreclr+linux
desktopclr+windows: 18x-25x faster than coreclr+linux

This includes using a simple CPU-only baseline looping test to normalize the results for different CPU speeds and/or differences in CLR jitting, etc.
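
The comment does not spell out how the baseline was applied. One plausible reading is that each test time is divided by the same machine's baseline loop time before the configurations are compared. A sketch under that assumption, with made-up numbers and hypothetical names:

```csharp
using System;

public static class BaselineNormalization
{
    // Expresses a measured time as a multiple of the same machine's CPU-only baseline time.
    public static double Normalize(double testTime, double baselineTime)
    {
        if (baselineTime <= 0)
            throw new ArgumentOutOfRangeException(nameof(baselineTime));
        return testTime / baselineTime;
    }

    // How many times slower configuration B is than configuration A after normalization.
    public static double RelativeSlowdown(double testA, double baselineA,
                                          double testB, double baselineB)
    {
        return Normalize(testB, baselineB) / Normalize(testA, baselineA);
    }

    public static void Main()
    {
        // Hypothetical timings, not measurements from this thread:
        // A: test 2.0, baseline 1.0; B: test 12.0, baseline 1.5.
        Console.WriteLine(RelativeSlowdown(2.0, 1.0, 12.0, 1.5)); // prints 4
    }
}
```

With this scheme a machine that is simply slower overall inflates both its test time and its baseline, so the ratio reflects the ToString cost rather than raw CPU speed.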


@ianhays commented on Mon Dec 26 2016

It's interesting that you're seeing such a big difference between Windows-CoreCLR and Windows-Desktop. For my tests they were nearly 1:1. Could you post some numerical results and we can compare data?

I'm running on Ubuntu 14.04 (in Hyper-V), CentOS 7.1 (in Hyper-V), and Windows 10, all on coreclr and all on the same computer, using the exact test listed above. The binaries are from \cpvsbuild\Drops\dev14\ProjectK\raw\23513.00\binaries.amd64ret.

CentOS results:

<collection total="4" passed="4" failed="0" skipped="0" name="Test collection for System.Globalization.Tests.Perf_NumberCultureInfo" time="4.209">
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;fr&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0664355" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-CentOS" etl="/home/ianha/rel/results/System.Globalization.Tests.dll-CentOS.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="29.394985914230347" />
        <iteration index="1" Duration="27.538418054580688" />
        <iteration index="2" Duration="28.897694945335388" />
        <iteration index="3" Duration="33.568613886833191" />
        <iteration index="4" Duration="28.437203049659729" />
        <iteration index="5" Duration="27.611016988754272" />
        <iteration index="6" Duration="29.217790007591248" />
        <iteration index="7" Duration="28.009710907936096" />
        <iteration index="8" Duration="28.690199017524719" />
        <iteration index="9" Duration="31.641247987747192" />
        <iteration index="10" Duration="29.398486018180847" />
        <iteration index="11" Duration="27.699916005134583" />
        <iteration index="12" Duration="27.58421802520752" />
        <iteration index="13" Duration="27.673416018486023" />
        <iteration index="14" Duration="28.14900803565979" />
        <iteration index="15" Duration="27.921112060546875" />
        <iteration index="16" Duration="27.766514897346497" />
        <iteration index="17" Duration="27.707115888595581" />
        <iteration index="18" Duration="28.539501070976257" />
        <iteration index="19" Duration="27.511018991470337" />
        <iteration index="20" Duration="27.805315017700195" />
        <iteration index="21" Duration="27.854113936424255" />
        <iteration index="22" Duration="27.897012948989868" />
        <iteration index="23" Duration="31.014458060264587" />
        <iteration index="24" Duration="28.17520797252655" />
        <iteration index="25" Duration="28.119809031486511" />
        <iteration index="26" Duration="28.370004057884216" />
        <iteration index="27" Duration="27.953910946846008" />
        <iteration index="28" Duration="28.936293959617615" />
        <iteration index="29" Duration="29.012992978096008" />
        <iteration index="30" Duration="28.124009013175964" />
        <iteration index="31" Duration="27.519919991493225" />
        <iteration index="32" Duration="27.68511700630188" />
        <iteration index="33" Duration="27.905812978744507" />
        <iteration index="34" Duration="27.937612056732178" />
        <iteration index="35" Duration="27.705316066741943" />
      </iterations>
    </performance>
  </test>
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;da&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0456405" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-CentOS" etl="/home/ianha/rel/results/System.Globalization.Tests.dll-CentOS.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="36.094968914985657" />
        <iteration index="1" Duration="27.743814945220947" />
        <iteration index="2" Duration="27.875913023948669" />
        <iteration index="3" Duration="27.602617979049683" />
        <iteration index="4" Duration="27.816213965415955" />
        <iteration index="5" Duration="28.792296886444092" />
        <iteration index="6" Duration="27.533019065856934" />
        <iteration index="7" Duration="28.085710048675537" />
        <iteration index="8" Duration="28.151907920837402" />
        <iteration index="9" Duration="27.482020020484924" />
        <iteration index="10" Duration="27.996210932731628" />
        <iteration index="11" Duration="28.618600010871887" />
        <iteration index="12" Duration="27.751914978027344" />
        <iteration index="13" Duration="27.638016939163208" />
        <iteration index="14" Duration="27.872413039207458" />
        <iteration index="15" Duration="30.500468015670776" />
        <iteration index="16" Duration="27.872312903404236" />
        <iteration index="17" Duration="27.939612030982971" />
        <iteration index="18" Duration="27.825913071632385" />
        <iteration index="19" Duration="27.495819926261902" />
        <iteration index="20" Duration="27.704716086387634" />
        <iteration index="21" Duration="27.9049129486084" />
        <iteration index="22" Duration="27.931511998176575" />
        <iteration index="23" Duration="27.667815923690796" />
        <iteration index="24" Duration="27.3945209980011" />
        <iteration index="25" Duration="27.917912006378174" />
        <iteration index="26" Duration="27.716416001319885" />
        <iteration index="27" Duration="27.933212041854858" />
        <iteration index="28" Duration="27.486520051956177" />
        <iteration index="29" Duration="29.66008198261261" />
        <iteration index="30" Duration="29.410285949707031" />
        <iteration index="31" Duration="28.070809960365295" />
        <iteration index="32" Duration="27.493919968605042" />
        <iteration index="33" Duration="28.041810035705566" />
        <iteration index="34" Duration="28.174207925796509" />
        <iteration index="35" Duration="27.583817958831787" />
        <iteration index="36" Duration="27.807613968849182" />
      </iterations>
    </performance>
  </test>
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;ja&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0497453" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-CentOS" etl="/home/ianha/rel/results/System.Globalization.Tests.dll-CentOS.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="28.615100026130676" />
        <iteration index="1" Duration="27.676715970039368" />
        <iteration index="2" Duration="27.601817011833191" />
        <iteration index="3" Duration="27.763615012168884" />
        <iteration index="4" Duration="28.683698058128357" />
        <iteration index="5" Duration="27.865113019943237" />
        <iteration index="6" Duration="27.877612948417664" />
        <iteration index="7" Duration="27.683015942573547" />
        <iteration index="8" Duration="28.065410017967224" />
        <iteration index="9" Duration="27.664615988731384" />
        <iteration index="10" Duration="36.985154032707214" />
        <iteration index="11" Duration="30.639065027236938" />
        <iteration index="12" Duration="27.659315943717957" />
        <iteration index="13" Duration="27.68911600112915" />
        <iteration index="14" Duration="27.640017032623291" />
        <iteration index="15" Duration="27.554818987846375" />
        <iteration index="16" Duration="27.4220210313797" />
        <iteration index="17" Duration="27.686416029930115" />
        <iteration index="18" Duration="27.96251106262207" />
        <iteration index="19" Duration="27.818414092063904" />
        <iteration index="20" Duration="28.181107044219971" />
        <iteration index="21" Duration="30.207971930503845" />
        <iteration index="22" Duration="27.66921591758728" />
        <iteration index="23" Duration="27.575117945671082" />
        <iteration index="24" Duration="27.539119005203247" />
        <iteration index="25" Duration="27.812513947486877" />
        <iteration index="26" Duration="28.019811034202576" />
        <iteration index="27" Duration="27.516318917274475" />
        <iteration index="28" Duration="27.816313982009888" />
        <iteration index="29" Duration="27.987210988998413" />
        <iteration index="30" Duration="28.7555969953537" />
        <iteration index="31" Duration="27.774914979934692" />
        <iteration index="32" Duration="28.356604933738708" />
        <iteration index="33" Duration="28.666599035263062" />
        <iteration index="34" Duration="27.587418079376221" />
        <iteration index="35" Duration="30.257572054862976" />
        <iteration index="36" Duration="28.29650604724884" />
      </iterations>
    </performance>
  </test>
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0471872" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-CentOS" etl="/home/ianha/rel/results/System.Globalization.Tests.dll-CentOS.csv">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="28.685899019241333" />
        <iteration index="1" Duration="27.759415030479431" />
        <iteration index="2" Duration="28.962594032287598" />
        <iteration index="3" Duration="27.831114053726196" />
        <iteration index="4" Duration="27.918012022972107" />
        <iteration index="5" Duration="27.727115988731384" />
        <iteration index="6" Duration="27.582617998123169" />
        <iteration index="7" Duration="28.238106966018677" />
        <iteration index="8" Duration="27.707015991210938" />
        <iteration index="9" Duration="27.508718967437744" />
        <iteration index="10" Duration="27.685816049575806" />
        <iteration index="11" Duration="27.990810990333557" />
        <iteration index="12" Duration="36.150267958641052" />
        <iteration index="13" Duration="27.81671404838562" />
        <iteration index="14" Duration="27.862313032150269" />
        <iteration index="15" Duration="28.303506016731262" />
        <iteration index="16" Duration="27.79661500453949" />
        <iteration index="17" Duration="28.096608996391296" />
        <iteration index="18" Duration="27.796614050865173" />
        <iteration index="19" Duration="28.159507989883423" />
        <iteration index="20" Duration="28.012010931968689" />
        <iteration index="21" Duration="27.406520962715149" />
        <iteration index="22" Duration="28.078809022903442" />
        <iteration index="23" Duration="27.770314931869507" />
        <iteration index="24" Duration="27.843314051628113" />
        <iteration index="25" Duration="27.394421100616455" />
        <iteration index="26" Duration="29.00989294052124" />
        <iteration index="27" Duration="30.44566810131073" />
        <iteration index="28" Duration="28.251506090164185" />
        <iteration index="29" Duration="28.132707953453064" />
        <iteration index="30" Duration="28.735197067260742" />
        <iteration index="31" Duration="28.501802086830139" />
        <iteration index="32" Duration="27.959811925888062" />
        <iteration index="33" Duration="27.853813886642456" />
        <iteration index="34" Duration="27.863412976264954" />
        <iteration index="35" Duration="27.652817010879517" />
        <iteration index="36" Duration="27.678416967391968" />
      </iterations>
    </performance>
  </test>
</collection>
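
To compare runs without reading every iteration by eye, the per-test mean Duration can be pulled out of a result file like the one above. A minimal sketch using `System.Xml.Linq`; the class name is hypothetical and the embedded XML is a shortened stand-in containing only the first two iterations of the CentOS "fr" run:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class ResultSummary
{
    // Returns the mean of the Duration attributes under each <test> element, keyed by test name.
    // Throws if a <test> has no <iteration> elements.
    public static Dictionary<string, double> MeanDurations(string xml)
    {
        XElement collection = XElement.Parse(xml);
        return collection.Elements("test").ToDictionary(
            t => (string)t.Attribute("name"),
            t => t.Descendants("iteration")
                  .Select(i => double.Parse((string)i.Attribute("Duration"),
                                            CultureInfo.InvariantCulture))
                  .Average());
    }

    public static void Main()
    {
        string xml = @"<collection>
  <test name=""fr"">
    <performance><iterations>
      <iteration index=""0"" Duration=""29.394985914230347"" />
      <iteration index=""1"" Duration=""27.538418054580688"" />
    </iterations></performance>
  </test>
</collection>";

        foreach (KeyValuePair<string, double> pair in MeanDurations(xml))
        {
            string mean = pair.Value.ToString("F3", CultureInfo.InvariantCulture);
            Console.WriteLine(pair.Key + ": " + mean + " msec");
        }
    }
}
```

Parsing with the invariant culture matters here, since the result files use "." as the decimal separator regardless of the culture under test.</collection>");
System.Diagnostics.Debug.Assert(m.Count == 1);
System.Diagnostics.Debug.Assert(Math.Abs(m["t"] - 2.0) < 1e-9);
</test>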

Windows 10 on CoreCLR results:

<collection total="4" passed="4" failed="0" skipped="0" name="Test collection for System.Globalization.Tests.Perf_NumberCultureInfo" time="4.087">
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;fr&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.075388" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-WindowsCore" etl="D:\tools\testing\relresults\11-16-2015\System.Globalization.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="2.1205962986784925" />
        <iteration index="1" Duration="1.544973291548672" />
        <iteration index="2" Duration="1.4910888198560315" />
        <iteration index="3" Duration="1.5059141771470195" />
        <iteration index="4" Duration="1.6136831205324143" />
        <iteration index="5" Duration="1.5338542735803458" />
        <iteration index="6" Duration="1.5224501525871119" />
        <iteration index="7" Duration="1.496505777327684" />
        <iteration index="8" Duration="1.5469690127224567" />
        <iteration index="9" Duration="1.6091214721351434" />
        <iteration index="10" Duration="1.6191000780041804" />
        <iteration index="11" Duration="1.5027780438739455" />
        <iteration index="12" Duration="1.9124710905532538" />
        <iteration index="13" Duration="1.4910888198559178" />
        <iteration index="14" Duration="1.4896633047318346" />
        <iteration index="15" Duration="1.5284373161084659" />
        <iteration index="16" Duration="1.5355648917292228" />
        <iteration index="17" Duration="1.582606890826014" />
        <iteration index="18" Duration="1.4899484077566285" />
        <iteration index="19" Duration="1.4959355712779825" />
        <iteration index="20" Duration="1.5515306611197275" />
        <iteration index="21" Duration="1.5823217878012201" />
        <iteration index="22" Duration="1.4919441289304132" />
        <iteration index="23" Duration="1.6273680657242267" />
        <iteration index="24" Duration="1.496505777327684" />
        <iteration index="25" Duration="1.6883801130376241" />
        <iteration index="26" Duration="1.4970759833772718" />
        <iteration index="27" Duration="1.4885228926324316" />
        <iteration index="28" Duration="1.5184587102395426" />
        <iteration index="29" Duration="1.6456146593133099" />
        <iteration index="30" Duration="1.5187438132643365" />
        <iteration index="31" Duration="1.5190289162891304" />
        <iteration index="32" Duration="1.4899484077566285" />
        <iteration index="33" Duration="1.5090503104200934" />
        <iteration index="34" Duration="1.4967908803524779" />
        <iteration index="35" Duration="1.5107609285691979" />
        <iteration index="36" Duration="1.5928705997198449" />
        <iteration index="37" Duration="1.5743389031059678" />
        <iteration index="38" Duration="1.6444742472139069" />
        <iteration index="39" Duration="1.5680666365597062" />
        <iteration index="40" Duration="1.611402296333722" />
        <iteration index="41" Duration="1.5652156063114262" />
        <iteration index="42" Duration="1.5800409636026416" />
        <iteration index="43" Duration="1.5036333529484409" />
        <iteration index="44" Duration="1.49336964405461" />
        <iteration index="45" Duration="1.4950802622036008" />
        <iteration index="46" Duration="1.5247309767856905" />
        <iteration index="47" Duration="1.489378201706927" />
        <iteration index="48" Duration="1.5746240061308754" />
        <iteration index="49" Duration="1.49622067430289" />
        <iteration index="50" Duration="1.4967908803525916" />
        <iteration index="51" Duration="1.5811813757019308" />
        <iteration index="52" Duration="1.5501051459955306" />
        <iteration index="53" Duration="1.5047737650477302" />
        <iteration index="54" Duration="1.5401265401266073" />
        <iteration index="55" Duration="1.5412669522258966" />
        <iteration index="56" Duration="1.4902335107814224" />
        <iteration index="57" Duration="1.4890930986822468" />
        <iteration index="58" Duration="1.4839612442351608" />
        <iteration index="59" Duration="1.5672113274853245" />
        <iteration index="60" Duration="1.4896633047318346" />
        <iteration index="61" Duration="1.6245170354759466" />
        <iteration index="62" Duration="1.5552370004425029" />
        <iteration index="63" Duration="1.5164629890657579" />
        <iteration index="64" Duration="1.6176745628799836" />
        <iteration index="65" Duration="1.5663560184108292" />
        <iteration index="66" Duration="1.513326855792684" />
        <iteration index="67" Duration="1.6843886706900548" />
        <iteration index="68" Duration="1.8161062681610929" />
        <iteration index="69" Duration="1.5187438132643365" />
        <iteration index="70" Duration="1.5660709153859216" />
        <iteration index="71" Duration="1.4927994380049086" />
        <iteration index="72" Duration="1.5392712310519983" />
        <iteration index="73" Duration="1.5635049881625491" />
        <iteration index="74" Duration="1.5463988066728689" />
        <iteration index="75" Duration="1.5161778860408504" />
        <iteration index="76" Duration="1.4930845410297025" />
        <iteration index="77" Duration="1.5840324059502109" />
        <iteration index="78" Duration="1.4899484077566285" />
        <iteration index="79" Duration="1.5153225769664687" />
        <iteration index="80" Duration="1.6145384296069096" />
        <iteration index="81" Duration="1.5769048303295676" />
        <iteration index="82" Duration="1.5161778860408504" />
        <iteration index="83" Duration="1.5119013406684871" />
        <iteration index="84" Duration="1.5592284427900722" />
        <iteration index="85" Duration="1.4879526865827302" />
        <iteration index="86" Duration="1.5908748785460602" />
        <iteration index="87" Duration="1.5081950013457117" />
        <iteration index="88" Duration="1.6784015071685872" />
        <iteration index="89" Duration="1.8466122918176779" />
        <iteration index="90" Duration="1.7445454089289569" />
        <iteration index="91" Duration="1.564930503286746" />
        <iteration index="92" Duration="1.7029203673039319" />
        <iteration index="93" Duration="1.5272969040092903" />
        <iteration index="94" Duration="1.5264415949347949" />
        <iteration index="95" Duration="1.5663560184108292" />
        <iteration index="96" Duration="1.5592284427900722" />
        <iteration index="97" Duration="1.525301182835392" />
        <iteration index="98" Duration="1.8417655403957269" />
        <iteration index="99" Duration="1.5506753520451184" />
        <iteration index="100" Duration="1.523305461661721" />
      </iterations>
    </performance>
  </test>
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;da&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0043532" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-WindowsCore" etl="D:\tools\testing\relresults\11-16-2015\System.Globalization.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="1.8386294071224256" />
        <iteration index="1" Duration="1.6105469872593403" />
        <iteration index="2" Duration="1.6895205251369134" />
        <iteration index="3" Duration="1.5780452424287432" />
        <iteration index="4" Duration="1.4999270136256655" />
        <iteration index="5" Duration="1.4959355712778688" />
        <iteration index="6" Duration="1.4985014985013549" />
        <iteration index="7" Duration="1.708622427800492" />
        <iteration index="8" Duration="1.5258713888852071" />
        <iteration index="9" Duration="1.4979312924517671" />
        <iteration index="10" Duration="1.5375606129030075" />
        <iteration index="11" Duration="1.5429775703748874" />
        <iteration index="12" Duration="1.5999981753407155" />
        <iteration index="13" Duration="1.7941533352493479" />
        <iteration index="14" Duration="1.4959355712780962" />
        <iteration index="15" Duration="1.6005683813903033" />
        <iteration index="16" Duration="1.4950802622036008" />
        <iteration index="17" Duration="1.6176745628799836" />
        <iteration index="18" Duration="1.6279382717739281" />
        <iteration index="19" Duration="1.5831770968757155" />
        <iteration index="20" Duration="1.5150374739416748" />
        <iteration index="21" Duration="1.4927994380050222" />
        <iteration index="22" Duration="1.7225924760170983" />
        <iteration index="23" Duration="1.5463988066728689" />
        <iteration index="24" Duration="1.4922292319552071" />
        <iteration index="25" Duration="1.5418371582754844" />
        <iteration index="26" Duration="1.6863843918638395" />
        <iteration index="27" Duration="1.6866694948887471" />
        <iteration index="28" Duration="1.5093354134451147" />
        <iteration index="29" Duration="1.6019938965143865" />
        <iteration index="30" Duration="1.5332840675305306" />
        <iteration index="31" Duration="1.6079810600358542" />
        <iteration index="32" Duration="1.8936542909145828" />
        <iteration index="33" Duration="1.5150374739414474" />
        <iteration index="34" Duration="1.7682089599898063" />
        <iteration index="35" Duration="1.8825352729461429" />
        <iteration index="36" Duration="1.6347807443696638" />
        <iteration index="37" Duration="1.5888791573722756" />
        <iteration index="38" Duration="1.6230915203518634" />
        <iteration index="39" Duration="1.6667122831506731" />
        <iteration index="40" Duration="1.4987866015262625" />
        <iteration index="41" Duration="1.6812525374170946" />
        <iteration index="42" Duration="1.5592284427900722" />
        <iteration index="43" Duration="1.5680666365597062" />
        <iteration index="44" Duration="1.8534547644137547" />
        <iteration index="45" Duration="1.7160351064460428" />
        <iteration index="46" Duration="1.7049160884776029" />
        <iteration index="47" Duration="1.9942956586792207" />
        <iteration index="48" Duration="1.5170331951155731" />
        <iteration index="49" Duration="1.7921576140754496" />
        <iteration index="50" Duration="1.608551266085442" />
        <iteration index="51" Duration="1.5689219456342016" />
        <iteration index="52" Duration="1.6219511082524605" />
        <iteration index="53" Duration="1.6099767812097525" />
        <iteration index="54" Duration="2.0721287844576182" />
        <iteration index="55" Duration="1.8158211651361853" />
        <iteration index="56" Duration="1.5566625155668135" />
        <iteration index="57" Duration="1.5042035589981424" />
        <iteration index="58" Duration="1.5406967461760814" />
        <iteration index="59" Duration="1.7171755185452184" />
        <iteration index="60" Duration="1.6071257509613588" />
        <iteration index="61" Duration="1.6002832783653957" />
        <iteration index="62" Duration="1.532143655431355" />
        <iteration index="63" Duration="1.6516018228346638" />
        <iteration index="64" Duration="2.1553788677076682" />
        <iteration index="65" Duration="1.7268690213895752" />
        <iteration index="66" Duration="1.6550230591326454" />
        <iteration index="67" Duration="1.5190289162892441" />
        <iteration index="68" Duration="1.5942961148441555" />
        <iteration index="69" Duration="1.5837473029253033" />
        <iteration index="70" Duration="1.8617227521335735" />
        <iteration index="71" Duration="1.68780990698815" />
        <iteration index="72" Duration="1.7074820157013164" />
        <iteration index="73" Duration="1.8349230677997639" />
        <iteration index="74" Duration="1.6838184646403533" />
        <iteration index="75" Duration="1.6661420771010853" />
        <iteration index="76" Duration="1.7034905733537471" />
        <iteration index="77" Duration="1.6695633133988395" />
        <iteration index="78" Duration="1.8697056368289395" />
        <iteration index="79" Duration="1.6803972283423718" />
        <iteration index="80" Duration="1.6977885128569596" />
        <iteration index="81" Duration="1.6652867680263626" />
        <iteration index="82" Duration="1.7111883550239781" />
        <iteration index="83" Duration="1.9603683987245404" />
        <iteration index="84" Duration="1.6732696527217286" />
        <iteration index="85" Duration="1.6752653738956269" />
        <iteration index="86" Duration="1.6635761498775992" />
        <iteration index="87" Duration="1.6983587189067748" />
        <iteration index="88" Duration="1.6709888285229226" />
        <iteration index="89" Duration="1.6912311432859042" />
        <iteration index="90" Duration="1.8292210073032038" />
        <iteration index="91" Duration="1.6735547557464088" />
        <iteration index="92" Duration="1.8876671273933425" />
        <iteration index="93" Duration="1.6071257509613588" />
        <iteration index="94" Duration="1.5734835940315861" />
        <iteration index="95" Duration="1.7231626820669135" />
        <iteration index="96" Duration="1.7476815422021446" />
        <iteration index="97" Duration="1.5626496790880537" />
        <iteration index="98" Duration="1.5706325637834198" />
        <iteration index="99" Duration="1.5863132301487894" />
        <iteration index="100" Duration="1.5894493634218634" />
      </iterations>
    </performance>
  </test>
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;ja&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0038747" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-WindowsCore" etl="D:\tools\testing\relresults\11-16-2015\System.Globalization.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="1.6245170354759466" />
        <iteration index="1" Duration="1.5646454002617247" />
        <iteration index="2" Duration="1.5215948435129576" />
        <iteration index="3" Duration="1.5683517395846138" />
        <iteration index="4" Duration="1.5281522130840131" />
        <iteration index="5" Duration="1.5729133879822257" />
        <iteration index="6" Duration="1.6721292406223256" />
        <iteration index="7" Duration="1.6216660052277803" />
        <iteration index="8" Duration="1.6872397009383349" />
        <iteration index="9" Duration="1.6838184646403533" />
        <iteration index="10" Duration="1.7280094334892055" />
        <iteration index="11" Duration="1.6678526952500761" />
        <iteration index="12" Duration="1.6344956413449836" />
        <iteration index="13" Duration="1.7647877236918248" />
        <iteration index="14" Duration="1.73171577281164" />
        <iteration index="15" Duration="2.0265123004846828" />
        <iteration index="16" Duration="1.7023501612543441" />
        <iteration index="17" Duration="1.7990000866711853" />
        <iteration index="18" Duration="1.8953649090635736" />
        <iteration index="19" Duration="1.9843170528101837" />
        <iteration index="20" Duration="1.7014948521800761" />
        <iteration index="21" Duration="1.8568760007115088" />
        <iteration index="22" Duration="1.8936542909145828" />
        <iteration index="23" Duration="1.8309316254521946" />
        <iteration index="24" Duration="1.7012097491547138" />
        <iteration index="25" Duration="2.1180303714550064" />
        <iteration index="26" Duration="1.84290595249513" />
        <iteration index="27" Duration="1.9027775877093518" />
        <iteration index="28" Duration="2.02622719746023" />
        <iteration index="29" Duration="1.9769043741644055" />
        <iteration index="30" Duration="1.7901618929013239" />
        <iteration index="31" Duration="1.7539538087480651" />
        <iteration index="32" Duration="1.9971466889278418" />
        <iteration index="33" Duration="1.6054151328125954" />
        <iteration index="34" Duration="1.6362062594939744" />
        <iteration index="35" Duration="1.6812525374170946" />
        <iteration index="36" Duration="1.7796130809829265" />
        <iteration index="37" Duration="1.7556644268975106" />
        <iteration index="38" Duration="1.6644314589520945" />
        <iteration index="39" Duration="1.5555221034669557" />
        <iteration index="40" Duration="1.727154124414028" />
        <iteration index="41" Duration="1.5458286006232811" />
        <iteration index="42" Duration="1.5714878728576878" />
        <iteration index="43" Duration="1.8409102313212315" />
        <iteration index="44" Duration="1.8332124496510005" />
        <iteration index="45" Duration="1.7559495299224182" />
        <iteration index="46" Duration="1.5398414371015861" />
        <iteration index="47" Duration="1.6145384296069096" />
        <iteration index="48" Duration="1.7094777368752148" />
        <iteration index="49" Duration="1.6592996045051223" />
        <iteration index="50" Duration="1.6815376404420022" />
        <iteration index="51" Duration="1.5999981753407155" />
        <iteration index="52" Duration="1.61026188423466" />
        <iteration index="53" Duration="1.586598333173697" />
        <iteration index="54" Duration="1.64875079258627" />
        <iteration index="55" Duration="1.6481805865364549" />
        <iteration index="56" Duration="1.5444030854987432" />
        <iteration index="57" Duration="1.538986128027318" />
        <iteration index="58" Duration="2.1610809282042283" />
        <iteration index="59" Duration="1.7188861366944366" />
        <iteration index="60" Duration="1.8232338437819635" />
        <iteration index="61" Duration="1.653312440983882" />
        <iteration index="62" Duration="1.5592284427898448" />
        <iteration index="63" Duration="1.5529561762441517" />
        <iteration index="64" Duration="1.562934782112734" />
        <iteration index="65" Duration="1.5903046724965861" />
        <iteration index="66" Duration="1.5358499947542441" />
        <iteration index="67" Duration="1.6937970705093903" />
        <iteration index="68" Duration="1.6476103804870945" />
        <iteration index="69" Duration="1.6396274957919559" />
        <iteration index="70" Duration="1.5024929408491516" />
        <iteration index="71" Duration="1.6744100648211315" />
        <iteration index="72" Duration="1.5449732915485583" />
        <iteration index="73" Duration="1.6980736158816399" />
        <iteration index="74" Duration="1.6219511082526878" />
        <iteration index="75" Duration="1.5452583945734659" />
        <iteration index="76" Duration="1.5272969040088356" />
        <iteration index="77" Duration="1.6076959570109466" />
        <iteration index="78" Duration="1.6915162463110391" />
        <iteration index="79" Duration="1.6359211564690668" />
        <iteration index="80" Duration="1.5270118009843827" />
        <iteration index="81" Duration="2.0008530282502761" />
        <iteration index="82" Duration="1.7191712397193442" />
        <iteration index="83" Duration="1.5498200429710778" />
        <iteration index="84" Duration="1.7984298806218249" />
        <iteration index="85" Duration="1.6516018228344365" />
        <iteration index="86" Duration="1.5506753520453458" />
        <iteration index="87" Duration="1.8771183154744904" />
        <iteration index="88" Duration="1.8685652247295366" />
        <iteration index="89" Duration="1.5948663208937433" />
        <iteration index="90" Duration="1.7077671187257693" />
        <iteration index="91" Duration="1.5093354134451147" />
        <iteration index="92" Duration="1.5307181403068171" />
        <iteration index="93" Duration="1.5002121166508005" />
        <iteration index="94" Duration="1.5441179824742903" />
        <iteration index="95" Duration="1.627368065724113" />
        <iteration index="96" Duration="1.6644314589520945" />
        <iteration index="97" Duration="2.0273676095594055" />
        <iteration index="98" Duration="1.7014948521796214" />
        <iteration index="99" Duration="1.8865267152937122" />
        <iteration index="100" Duration="1.5284373161084659" />
      </iterations>
    </performance>
  </test>
  <test name="System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: &quot;&quot;)" type="System.Globalization.Tests.Perf_NumberCultureInfo" method="ToString" time="1.0029813" result="Pass">
    <traits>
      <trait name="Benchmark" value="true" />
    </traits>
    <performance runid="System.Globalization.Tests.dll-WindowsCore" etl="D:\tools\testing\relresults\11-16-2015\System.Globalization.Tests.dll-WindowsCore.etl">
      <metrics>
        <Duration displayName="Duration" unit="msec" />
      </metrics>
      <iterations>
        <iteration index="0" Duration="1.6849588767395289" />
        <iteration index="1" Duration="1.4982163954769021" />
        <iteration index="2" Duration="1.6393423927670483" />
        <iteration index="3" Duration="1.6955076886583811" />
        <iteration index="4" Duration="1.4999270136258929" />
        <iteration index="5" Duration="1.542122261300392" />
        <iteration index="6" Duration="1.4982163954764474" />
        <iteration index="7" Duration="1.4982163954764474" />
        <iteration index="8" Duration="1.9603683987247678" />
        <iteration index="9" Duration="2.1274387712742282" />
        <iteration index="10" Duration="1.821523225632518" />
        <iteration index="11" Duration="1.5908748785459466" />
        <iteration index="12" Duration="1.5706325637834198" />
        <iteration index="13" Duration="1.5683517395846138" />
        <iteration index="14" Duration="1.7334263909606307" />
        <iteration index="15" Duration="1.5478243217967247" />
        <iteration index="16" Duration="1.7482517482517324" />
        <iteration index="17" Duration="1.5338542735803458" />
        <iteration index="18" Duration="1.5107609285691979" />
        <iteration index="19" Duration="1.5635049881625491" />
        <iteration index="20" Duration="1.5894493634218634" />
        <iteration index="21" Duration="1.5270118009843827" />
        <iteration index="22" Duration="1.5250160798109391" />
        <iteration index="23" Duration="1.5076247952956692" />
        <iteration index="24" Duration="2.0099763250450451" />
        <iteration index="25" Duration="1.6276531687490206" />
        <iteration index="26" Duration="1.5808962726773643" />
        <iteration index="27" Duration="1.6424785260401222" />
        <iteration index="28" Duration="1.5287224191333735" />
        <iteration index="29" Duration="1.6963629977326491" />
        <iteration index="30" Duration="1.5284373161084659" />
        <iteration index="31" Duration="1.5968620420671868" />
        <iteration index="32" Duration="1.5903046724961314" />
        <iteration index="33" Duration="1.5666411214356231" />
        <iteration index="34" Duration="1.7379880393577878" />
        <iteration index="35" Duration="1.5190289162892441" />
        <iteration index="36" Duration="1.5492498369208079" />
        <iteration index="37" Duration="1.5572327216159465" />
        <iteration index="38" Duration="1.5954365269431037" />
        <iteration index="39" Duration="1.6792568162431962" />
        <iteration index="40" Duration="1.6245170354759466" />
        <iteration index="41" Duration="1.6937970705093903" />
        <iteration index="42" Duration="1.8363485829236197" />
        <iteration index="43" Duration="1.9831766407110081" />
        <iteration index="44" Duration="1.7331412879361778" />
        <iteration index="45" Duration="1.7493921603509079" />
        <iteration index="46" Duration="2.2497479689259308" />
        <iteration index="47" Duration="1.801566013894444" />
        <iteration index="48" Duration="1.7827492142564552" />
        <iteration index="49" Duration="1.7029203673041593" />
        <iteration index="50" Duration="1.6216660052277803" />
        <iteration index="51" Duration="1.6296488899229189" />
        <iteration index="52" Duration="1.7000693370555382" />
        <iteration index="53" Duration="1.522165049562318" />
        <iteration index="54" Duration="1.5712027698327802" />
        <iteration index="55" Duration="2.0142528704172946" />
        <iteration index="56" Duration="1.5680666365597062" />
        <iteration index="57" Duration="1.5572327216164012" />
        <iteration index="58" Duration="1.5954365269435584" />
        <iteration index="59" Duration="1.5446881885241055" />
        <iteration index="60" Duration="1.5806111696524567" />
        <iteration index="61" Duration="1.6772610950692979" />
        <iteration index="62" Duration="1.523020358636586" />
        <iteration index="63" Duration="2.0242314762858769" />
        <iteration index="64" Duration="1.6584442954303995" />
        <iteration index="65" Duration="1.515322576966355" />
        <iteration index="66" Duration="1.5176034011651609" />
        <iteration index="67" Duration="1.5307181403072718" />
        <iteration index="68" Duration="1.6276531687490206" />
        <iteration index="69" Duration="1.6228064173269559" />
        <iteration index="70" Duration="1.6829631555660853" />
        <iteration index="71" Duration="1.6079810600358542" />
        <iteration index="72" Duration="1.7351370091096214" />
        <iteration index="73" Duration="1.6775461980942055" />
        <iteration index="74" Duration="1.650176307710808" />
        <iteration index="75" Duration="1.7619366934436584" />
        <iteration index="76" Duration="1.6878099069876953" />
        <iteration index="77" Duration="1.7867406566037971" />
        <iteration index="78" Duration="1.8086935895157694" />
        <iteration index="79" Duration="1.7000693370555382" />
        <iteration index="80" Duration="1.5828919938512627" />
        <iteration index="81" Duration="1.8468973948424718" />
        <iteration index="82" Duration="1.6228064173269559" />
        <iteration index="83" Duration="1.6467550714123718" />
        <iteration index="84" Duration="1.7733408144367786" />
        <iteration index="85" Duration="1.6547379561079651" />
        <iteration index="86" Duration="1.6855290827893441" />
        <iteration index="87" Duration="1.6433338351143902" />
        <iteration index="88" Duration="1.5828919938508079" />
        <iteration index="89" Duration="1.5700623577336046" />
        <iteration index="90" Duration="1.9985722040514702" />
        <iteration index="91" Duration="1.7610813843693904" />
        <iteration index="92" Duration="1.9201688722237122" />
        <iteration index="93" Duration="1.9116157814787584" />
        <iteration index="94" Duration="1.5891642603969558" />
        <iteration index="95" Duration="1.8862416122688046" />
        <iteration index="96" Duration="1.7191712397193442" />
        <iteration index="97" Duration="1.9652151501463777" />
        <iteration index="98" Duration="1.6541677500586047" />
        <iteration index="99" Duration="1.5985726602166324" />
        <iteration index="100" Duration="1.8788289336234811" />
      </iterations>
    </performance>
  </test>
</collection>

@steveharter commented on Mon Nov 16 2015

Please assign to owner of PAL library, as this is not a globalization issue but instead the implementation of the PAL layer. The method _ecvt (\src\pal\src\cruntime\misctls.cpp) is taking up 95%+ of the time.

Note: the benchmark test should probably "warm up" the current culture by performing a number.ToString(cultureInfo) before the start of the timing. This will force a one-time load of all of the localization information into the CultureInfo (like the character for the thousands separator, etc.).

There are a few inefficiencies in _ecvt:

Changing the implementation to not use the large buffer makes the code run 4x+ faster, such as inserting this code at the beginning (sample - won't work in all cases):

    pThread = InternalGetCurrentThread();
    lpStartOfReturnBuffer = lpReturnBuffer = pThread->crtInfo.ECVTBuffer;
    sprintf_s(lpStartOfReturnBuffer, sizeof(TempBuffer), "%lf", value);
    dec = 1;
    sign = 0;
    goto done;


@ianhays commented on Mon Nov 16 2015

Please assign to owner of PAL library, as this is not a globalization issue but instead the implementation of the PAL layer. The method _ecvt (\src\pal\src\cruntime\misctls.cpp) is taking up 95%+ of the time.

Thanks for looking into this Steve, will do. Know who that is by any chance?

Note: the benchmark test should probably "warm up" the current culture by performing a number.ToString(cultureInfo) before the start of the timing. This will force a one-time load of all of the localization information into the CultureInfo (like the character for the thousands separator, etc.).

This is taken into account in the analysis. The first few iterations are ignored.


@stephentoub commented on Tue Nov 17 2015

cc: @sergiy-k


@steveharter commented on Tue Nov 17 2015

Not sure who the owner(s) are or the original author. I am willing to take it, but it would have to come after rc2.

Another option is to measure the performance of using ICU to do that parsing.


@gkhurana commented on Wed Dec 09 2015

On Unix, alternatives to sprintf_s and ecvt are snprintf and ecvt_r. The ecvt* family of functions is now obsolete and not recommended.


@krwq commented on Mon Nov 28 2016

@ianhays Could you please share the units of the measurements and how this impacts the whole framework? I think we should not be optimizing a 1ns->40ns regression or really rarely used calls, as it is simply not worth the effort. I will leave this open as up-for-grabs. Please change if you believe this should be fixed sooner.


@ianhays commented on Mon Dec 05 2016

The measurements were made using msbuild /t:buildandtest /p:Performance=true which calls the xunit.performance runner. It looks like this test has actually been removed since the measurement was made, so there isn't any way to repro it in the same way.

really rarely used calls as it is simply not worth the effort.

I wrote a perf test for this method because usage data showed it has a relatively high utilization rate.

I think we should not be optimizing 1ns->40ns regression

If you don't feel it is a necessary optimization please feel free to close out the issue. These micro-measurements were done on thousands of iterations which is dissimilar to the actual usage, so whether it is worth improving is up to you.


@AndyAyersMS commented on Mon Dec 05 2016

There are quite a few low-level native helper routines that appear to be slower on linux, for instance much of the math library. See for instance some of the discussion and data in dotnet/coreclr#4847.


@karelz commented on Wed Mar 22 2017

The issue seems to be blocking https://github.com/dotnet/corefx/issues/16636 (which is blocking .NET Core adoption for a customer who reached out to us via CSS). @mellinoe can you please help push it forward as part of your perf work in 2.0? (Feel free to work with @krwq and @tarekgh, the area owners.) See https://github.com/dotnet/corefx/issues/16636 for an additional link to CoreCLR (maybe there are some dupes here).


@tannergooding commented on Wed Mar 22 2017

I have https://github.com/dotnet/coreclr/issues/9373, which is tracking the general perf issues with the math library on Linux.


@tarekgh commented on Wed Mar 22 2017

It looks to me like there is no action from the library side on this one. All changes would be from the runtime side.

@karelz it looks like there are already some tracking issues targeting a fix for this, so I think we should close this one. I already ensured dotnet/coreclr#4847 and dotnet/coreclr#9373 are marked for v2.0.0.

tarekgh commented 7 years ago

@tannergooding I assigned it to you as you are already looking at a similar issue.

karelz commented 7 years ago

Just to reiterate importance: The issue seems to be blocking dotnet/corefx#16636 (which is blocking .NET Core adoption for a customer who reached out to us via CSS). See the benchmark at https://github.com/dotnet/coreclr/issues/5558#issuecomment-288569461 for details.

karelz commented 7 years ago

cc @mellinoe to help push it forward

tannergooding commented 7 years ago

We should likely be special-casing a few numbers (0, NaN, -Inf, +Inf) and then following the algorithm described in "Printing Floating-Point Numbers Quickly and Accurately" for everything else. NOTE: Windows follows this algorithm, but additionally continues generating insignificant digits so that powers of two (which are exactly representable) can be printed.
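As a rough illustration of the special-casing step only, here is a minimal C sketch. The function name `special_case_text` and the returned strings are hypothetical placeholders, not what the PAL or Windows actually emit; anything that is not a special value would fall through to the general digit-generation algorithm.

```c
#include <float.h>
#include <stddef.h>

/* Returns a fixed string for the values that would be handled before the
 * general digit-generation algorithm runs, or NULL for every other value.
 * The strings are placeholders chosen for this sketch. */
const char *special_case_text(double value)
{
    if (value != value)   return "NaN";        /* NaN is never equal to itself */
    if (value > DBL_MAX)  return "Infinity";   /* only +Inf exceeds DBL_MAX */
    if (value < -DBL_MAX) return "-Infinity";  /* only -Inf is below -DBL_MAX */
    if (value == 0.0)     return "0";          /* matches both +0.0 and -0.0 */
    return NULL;                               /* needs the general algorithm */
}
```

The comparisons avoid any dependency on the math library, which keeps the fast path free of function calls.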

cc @dcwuser, as he generally has helpful feedback in this area as well.

dcwuser commented 7 years ago

Looking at the _ecvt code (in src/pal/src/cruntime/misctls.cpp), it looks to me like this implementation is fundamentally taking the wrong approach. It's essentially calling into sprintf to do the core of its job, then using string munging to fix up the output. That basically means that we do the required work twice: inside sprintf we parse a format string and do the actual digit extraction (for all digits, even if we ultimately only want 2); but inside ToString we have already parsed one format string, and inside _ecvt we are re-doing the extraction of the wanted digits via string manipulation.

The right approach is to go directly from the double to a minimal array of digits inside _ecvt. There should never be a call into sprintf. There should be almost no string munging. (We might be returning a string, but it's best to think of returning an array of digits; if I were writing this for maximum clarity in managed code I would return a List of int.) The approach in the paper that @tannergooding cites looks fantastic and optimal, but even the more naive, plodding approach (multiply by a power of ten to put one digit to the left of the decimal point; pick it off by using floor and casting the result to an int; subtract off the extracted value and multiply by 10 to bring the next digit to the left of the decimal; repeat until the desired number of digits are extracted) is likely to be better than this approach.
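The "naive, plodding" loop described above could look roughly like the following C sketch. The name `naive_digits` and its ecvt-like out-parameters are assumptions made for illustration. It truncates instead of rounding, and the repeated scaling accumulates floating-point error, so it is not a drop-in replacement for _ecvt.

```c
#include <assert.h>
#include <string.h>

/* Naive digit extraction: writes `ndigits` decimal digits of `value` to
 * `out` (NUL-terminated, so `out` needs ndigits + 1 bytes).
 * *decpt receives the position of the decimal point relative to the start
 * of the digit string; *sign is 1 for negative input.
 * Digits are truncated, not rounded. */
void naive_digits(double value, int ndigits, char *out, int *decpt, int *sign)
{
    assert(ndigits > 0);
    assert(value - value == 0.0);   /* rejects NaN and infinities */

    *sign = value < 0.0;
    if (*sign) value = -value;

    if (value == 0.0) {
        memset(out, '0', (size_t)ndigits);
        out[ndigits] = '\0';
        *decpt = 0;
        return;
    }

    /* Scale so that exactly one digit sits left of the decimal point. */
    int exponent = 0;
    while (value >= 10.0) { value /= 10.0; exponent++; }
    while (value < 1.0)   { value *= 10.0; exponent--; }

    for (int i = 0; i < ndigits; i++) {
        int digit = (int)value;           /* floor, since value >= 0 */
        out[i] = (char)('0' + digit);
        value = (value - digit) * 10.0;   /* bring the next digit left */
    }
    out[ndigits] = '\0';
    *decpt = exponent + 1;
}
```

For the benchmark value 104234.343 with six digits requested, this fills `out` with "104234" and sets `*decpt` to 6.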

It may be possible to get some perf improvement by making small changes to the current approach, and that might even be the right approach to rapidly un-blocking the customer cited by @karelz . But a fundamentally different approach is the correct fix, particularly if that's what's being done on the Windows side.

It might also be worthwhile to look at what's being done inside sprintf when it's handed a double. Presumably it's calling something analogous to _ecvt, and we might be able to call that function directly in our _ecvt implementation.

jkotas commented 7 years ago

_ecvt in the PAL is trying to emulate the behavior of Windows _ecvt, in a not very efficient way. The right fix should be to delete _ecvt in the PAL and just implement the DoubleToNumber method directly. Note that NumberToDouble is on that plan already.

dcwuser commented 7 years ago

I found a few ecvt implementations via search engine, and they all do what I describe as the "naive, plodding" approach. No calls to sprintf, no string munging. Since this is part of libc, we should be able to call directly into the Linux libc implementation.

https://opensource.apple.com/source/Libc/Libc-167/gen.subproj/ppc.subproj/ecvt.c https://sourcecodebrowser.com/linux86/0.16.17/ecvt_8c.html#aa9d3fe3843186a88d67641f87f62f712

tarekgh commented 7 years ago

I recall ecvt was not available on Mac, but I am not sure. We need to validate that it exists.

tarekgh commented 7 years ago

Now I recall: we cannot use ecvt/fcvt because they are not thread safe, as they use a fixed buffer to return the results for all calls. The safe one, ecvt_r, is not available on Mac.

so I think what @jkotas suggested is the way to go.

dcwuser commented 7 years ago

@tarekgh, you appear to be correct. Who writes a method like that in a general-use base library?

Forgive me for being dense, but it's not entirely clear to me what you mean, @jkotas. I looked at the code path underneath Double.Parse, and it looks like it feeds into a series of methods that are fully implemented in managed code, without any extern calls. Do you mean, we should do the same for Double.ToString: write a digit extractor fully in managed code -- essentially a managed, well-implemented ecvt function?

tannergooding commented 7 years ago

@dcwuser, I believe he is referring to this line here: https://github.com/dotnet/coreclr/blob/52a816d3011f4d03b80b5940dc036b40d701f52d/src/classlibnative/bcltype/number.cpp#L152 (Note: the function is implemented in raw asm for x86: https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/i386/fptext.asm#L78).

I think the quick-fix is to just modify the PAL implementation of _ecvt (this is mostly because of the x86 changes that would be required as well) to be more efficient and the longer term fix is to get rid of _ecvt altogether (and optionally move the implementation entirely to managed code).

tarekgh commented 7 years ago

Do you mean, we should do the same for Double.ToString: write a digit extractor fully in managed code -- essentially a managed, well-implemented ecvt function?

what I meant is we implement DoubleToNumber fully independent of the OS. either in managed if there will not be any concern with the perf or in native side if the managed side perf would not be acceptable. We can use whatever reasonable algorithm suggested in this thread in the implementation.

tmds commented 7 years ago

Linux (glibc) has ecvt_r, so calling ecvt_r on platforms that have it, should yield a performance improvement.

tannergooding commented 7 years ago

@tarekgh, is the performance on Mac currently a problem/concern as well (I don't see any numbers)?

If not, adding the appropriate conditional so that ecvt_r is called on Linux seems sufficient for the time being.

After going over it a bit, the issue with all the 'sufficiently fast' implementations is that they rely on high-precision integer arithmetic (that is BigInteger math) to make things fast and accurate. Given that there is no BigInteger type built-in, one has to implement a BigInteger implementation that contains, at a minimum, just the required behavior (essentially limited to: left shift, 2^x, 10^x, multiply, and divide).

As such, going the route for the fast implementation probably requires a bit more bake time (to make sure it is good and working correctly) than we likely want to spend for a quick turnaround.

Now, if we want to look at this a different way, the biggest issue with the current implementation isn't that it is calling sprintf_s. The issue is that it is calling sprintf_s with a format string of %.348e (which prints 348 digits + exponent). This requires us to do additional parsing, rounding, etc... If we instead did %.##f (where ## was the precision requested). We could then do one loop over the resulting string to pull out both dec and sign and then a memcpy to copy the appropriate bits (everything except for the sign and decimal point) into the result buffer. This should be significantly faster and simpler than what we have now.
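A minimal C sketch of that idea follows: format with only the requested precision, then make one pass over the result. Note two assumptions of mine: it uses `%.*e` rather than the `%.##f` suggested above (so the exponent gives the decimal position directly), and it caps the precision at 17 digits. The function name `format_digits` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fills `digits` with `precision` significant digits of `value`
 * (NUL-terminated, so `digits` needs precision + 1 bytes), sets *decpt to
 * the decimal point position and *sign to 1 for negative input.
 * Returns 0 on success, -1 for NaN/infinity or a formatting failure. */
int format_digits(double value, int precision, char *digits, int *decpt, int *sign)
{
    assert(precision >= 1 && precision <= 17);

    /* "%.*e" yields [-]d.ddd...e[+-]xx with precision - 1 fraction digits. */
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%.*e", precision - 1, value);
    if (len < 0 || (size_t)len >= sizeof buf) return -1;

    char *exp_part = strchr(buf, 'e');
    if (exp_part == NULL) return -1;   /* "nan" and "inf" have no exponent */

    *sign = (buf[0] == '-');

    /* One pass over the mantissa: keep digits, skip sign and decimal point. */
    int n = 0;
    for (const char *p = buf; p < exp_part; p++) {
        if (*p >= '0' && *p <= '9') digits[n++] = *p;
    }
    digits[n] = '\0';

    /* d.ddd * 10^x equals 0.dddd * 10^(x + 1). */
    *decpt = (int)strtol(exp_part + 1, NULL, 10) + 1;
    return 0;
}
```

Skipping every non-digit character before the exponent also sidesteps locales whose decimal separator is not a period.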

jkotas commented 7 years ago

This is basically what is done in CoreRT implementation:

https://github.com/dotnet/corert/blob/82073fcdf12869d21f36fd44720744e3818d141b/src/System.Private.CoreLib/src/System/Globalization/FormatProvider.FormatAndParse.Unix.cs#L67 https://github.com/dotnet/corert/blob/26413145abcbdd49468db230a6f29784494f6f46/src/Native/System.Private.CoreLib.Native/pal_cruntime.cpp#L10

tarekgh commented 7 years ago

we can look at doing what we have done on coreRT and measure the gain then decide if this will be acceptable.

tmds commented 7 years ago

we can look at doing what we have done on coreRT and measure the gain then decide if this will be acceptable.

Can you also benchmark the CoreRT implementation vs calling native ecvt_r?

tannergooding commented 7 years ago

@jkotas, It doesn't look like we can use the CoreRT implementation. It appears to be limited to 40 digits of precision, but there are plenty of numbers which are larger which need support.

As for using ecvt_r, I have the change ready for benching... Do we have an easy way to run the CoreFX perf tests against a CoreCLR PR today?

jkotas commented 7 years ago

there are plenty of numbers which are larger which need support.

They are handled elsewhere. The precision passed to DoubleToNumber is never more than 17.

tannergooding commented 7 years ago

Moving to ecvt_r is showing about a 25% improvement (see below, the txt files are the raw csv outputs). We don't seem to have any CoreFX perf jobs actually hooked up for Linux right now, so it is hard to get any actual numbers (especially numbers that can be used in comparison to those in the original post).

I am still working on checking out the CoreRT implementation and hope to have some numbers tomorrow.

Before: Before.txt

Running 14 Benchmarks out of 14 Xunit Facts...
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "fr")
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "da")
  System.Globalization.Tests.Perf_CultureInfo.GetCurrentCulture
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "")
  System.Globalization.Tests.Perf_CultureInfo.GetInvariantCulture
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "ja")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "da")
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "ja")
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "fr")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "da")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "ja")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "fr")
Finished 14 tests in 85.399s (0 failed, 0 skipped)

After: After.txt

Running 14 Benchmarks out of 14 Xunit Facts...
  System.Globalization.Tests.Perf_CultureInfo.GetCurrentCulture
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "da")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "fr")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "")
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "ja")
  System.Globalization.Tests.Perf_CultureInfo.GetInvariantCulture
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "da")
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: "ja")
  System.Globalization.Tests.Perf_NumberCultureInfo.ToString(culturestring: "fr")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "da")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "ja")
  System.Globalization.Tests.Perf_DateTimeCultureInfo.ToString(culturestring: "fr")
Finished 14 tests in 63.678s (0 failed, 0 skipped)

karelz commented 7 years ago

BTW: We are just standing up CoreFX perf lab (internal only at this moment). @mellinoe knows details where we are.

tarekgh commented 7 years ago

@tannergooding @jkotas

if I am not mistaken, it looks the corert implementation is similar to the current Linux implementation in coreclr

in coreclr: https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/pal/src/cruntime/misctls.cpp#L151

in coreRT: https://github.com/dotnet/corert/blob/82073fcdf12869d21f36fd44720744e3818d141b/src/System.Private.CoreLib/src/System/Globalization/FormatProvider.FormatAndParse.Unix.cs#L21

It looks to me like the difference is small, so I expect the perf will be close.

Also when looking at the ecvt_r implementation, I am seeing it is using __snprintf which means it is similar approach as coreclr and corert

https://code.woboq.org/userspace/glibc/misc/efgcvt_r.c.html#117

jkotas commented 7 years ago

corert implementation is similar to the current Linux implementation

@tarekgh It is similar, but there is a significant difference in the amount of extra parsing, transforming and copying. There is a lot less of it in the CoreRT implementation (it can still be improved). I have done a quick test on Mac: call ToString on 1.2345 in a loop. CoreRT is ~1.7x faster.

fanoI commented 7 years ago

@jkotas but is the objective in the future to unify the CoreRT and CoreCLR implementations, that is, to always use a managed version?

jkotas commented 7 years ago

@fanoI Yes, it is the eventual goal. However, this issue is about improving the performance for CoreCLR, for the .NET Core 2.0 release. The implementation does not have to be unified as part of this fix.

fanoI commented 7 years ago

OK. The objectives of Cosmos and CoreRT coincide: to have as much as possible written in C#. That is why I have a particular interest in this issue.

tarekgh commented 7 years ago

Some update:

I have tried changing the implementation of _ecvt to just call ecvt_r, and I am seeing that this gets us back to Windows perf numbers (and even a little bit better).

I used the loop:

            double number = 104234.343;

            for (int i = 0; i < 1000000; i++)
            {
                number.ToString(); // use current culture
            }

On a Windows 10 VM this executed in an average of 650 ms.
On an Ubuntu VM (same configuration as the Windows 10 VM and running on the same host machine) I got 3600 ms.
On the same Ubuntu VM using ecvt_r I got 500 ms.

I used Stopwatch, but the results are consistent and the numbers mentioned here are averages.

I am going to try to use CoreRT implementation (I'll convert it to native implementation for the sake of the experiment) and will get the perf numbers for it.

tarekgh commented 7 years ago

I have tried the CoreRT implementation and I got 2200 ms, which is about 1.6 times faster than the current implementation (this is almost the same result that @jkotas got).

Considering the results, it is clear now that the way to go for Linux is to use ecvt_r. We can consider the CoreRT implementation for OSX. It is still not perfect but would be acceptable for v2.0.

What do people think about this plan?

tannergooding commented 7 years ago

I'm in favor of this. That will also give more time to ensure something like Printing Floating-Point Numbers Quickly and Accurately is implemented properly.

jkotas commented 7 years ago

You should verify the performance for a variety of numbers (e.g. I would be curious about Double.MaxValue/2). If it all looks good, the plan sounds fine to me.

tarekgh commented 7 years ago

I have tested with Double.MaxValue/2 and got interesting results which confirm the plan. I ran the test with the same loop mentioned above.

Linux without any changes: 13,200 ms
Windows: 6200 ms
Linux with the CoreRT implementation: 3900 ms
Linux using ecvt_r: 1100 ms

It is clear that using ecvt_r is the best, and the CoreRT implementation was much better than Windows in this case.

tarekgh commented 7 years ago

It looks like using ecvt_r produces slightly different behavior. I am seeing some tests (mostly serialization tests) fail because of the behavior difference. Here is an example:

     System.Json.Tests.JsonValueTests.Parse_IntegralBoundaries_LessThanMaxDouble_Works(jsonString: \"79228162514264337593543950336\", expectedToString: \"7.9228162514264338E+28\") [FAIL]
18:47:52         Assert.Equal() Failure
18:47:52                                   ↓ (pos 16)
18:47:52         Expected: 7.9228162514264338E+28
18:47:52         Actual:   7.9228162514264344E+28
18:47:52                                    ↑ (pos 16)
18:47:52         Stack Trace:
18:47:52            /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/src/System.Json/tests/JsonValueTests.cs(723,0): at System.Json.Tests.JsonValueTests.Parse(String jsonString, Action`1 action)
18:47:52      System.Json.Tests.JsonValueTests.JsonValue_Parse_Double_ViaJsonPrimitive(number: 1.8014398509482E+16) [FAIL]

@jkotas @dcwuser I think this is an acceptable difference, as the results still look right to me. Now we need to decide if we go ahead with such a difference for the sake of performance, or whether we should use the CoreRT implementation all the way and sacrifice some of the performance for the sake of consistency. I am inclined to keep using ecvt_r, as the performance is a big win and the results are still correct. What do you think?

tarekgh commented 7 years ago

I forgot to mention that I am seeing a couple of other failures, which I'll try to investigate.

Here is the list of the failures:

https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/x64_checked_ubuntu_corefx_baseline_prtest/38/

tannergooding commented 7 years ago

@tarekgh, this should be fine, provided we are still IEEE compliant (it looks like we are).

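The string conversion rules are essentially (these are defined in 5.12 Details of conversion between floating-point data and external character sequences):

  • A string with 15 significant digits converted to the binary64 format (double) and then converted back should exactly match.
  • A double converted to a decimal string with at least 17 digits and then converted back should exactly match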

tarekgh commented 7 years ago

Thanks @tannergooding for the feedback.

ANahr commented 7 years ago

If there is a chance to get identical output across all platforms I'd vote to have identical output even if it comes at a modest price in performance.

tannergooding commented 7 years ago

I think getting identical output could be part of future work here. Provided we are at least IEEE compliant, we are 'good enough' to resolve the bug and unblock the customer.

There are plenty of places in the floating-point APIs where we are not consistent across various platforms right now, and there are several bugs tracking us investigating whether or not making them use the same underlying implementation is worthwhile (both for perf and reliability).

dcwuser commented 7 years ago

I agree that for default formatting it's not too bad that we don't get identical output, but the way I would expect serialization/deserialization to work is to use the "R" (for roundtrip) format string, which should guarantee that absolutely identical FP values are produced after a serialization/deserialization cycle, which can be tested for with == without any allowance for epsilons. If we are not using "R" in our serialization code, I would consider that a bug in the serialization library. If we are using "R" but we don't roundtrip FP values, I would consider that a bug in our FP printing code, regardless of what the IEEE spec says, because that's what R is supposed to guarantee.
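The check described above can be sketched as follows. This is only an illustration, not code from the PR: `RoundTripsWithR` is a hypothetical helper name, the sample values are borrowed from this thread, and the snippet assumes a compiler that accepts top-level statements (C# 9 or later). The `OverflowException` branch is there because some runtime versions throw when the text rounds past the finite range.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not from the thread): serialize with "R", parse the
// text back, and compare with == and no epsilon, as described above.
static bool RoundTripsWithR(double value)
{
    string text = value.ToString("R", CultureInfo.InvariantCulture);
    try
    {
        return double.Parse(text, CultureInfo.InvariantCulture) == value;
    }
    catch (OverflowException)
    {
        // The formatted text could not be parsed back into a finite double.
        return false;
    }
}

// Sample values taken from the discussion in this issue.
double[] samples = { 104234.343, 123.45, double.MaxValue / 2, double.MinValue };
foreach (double sample in samples)
{
    string formatted = sample.ToString("R", CultureInfo.InvariantCulture);
    Console.WriteLine(formatted + " round-trips: " + RoundTripsWithR(sample));
}
```

Any `False` in the output would point at the formatting or parsing code rather than at the caller, which is the distinction drawn above.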

tarekgh commented 7 years ago

I'll look more into the failed tests. I am also noticing that our parsing code is not able to parse the Double.MinValue string produced with ecvt_r. It produces -1.797693134862316E+308, while the parser can handle the value produced by the other implementation, which is -1.7976931348623157E+308.

tarekgh commented 7 years ago

It looks like we cannot use ecvt_r because it will break round-tripping. Double.MinValue is one example of that: ecvt_r produces a string which we cannot parse back. I don't think we should touch the parsing code now.

Considering that, I suggest using the corert implementation across all Linux distros. This will give us better performance but will not get us to Windows performance. In other words, it almost doubles the speed of the current behavior in simple cases (like formatting 123.45), but it will still be 3x slower than Windows. For more complex numbers like Double.MaxValue it will be faster even than Windows.

In the near future I'll look into having our own implementation of DoubleToNumber so we can get better performance and cut the dependency on any OS call, which keeps us portable.

Does this sound reasonable to you? If so, I can update my PR accordingly. Please let me know what you think.

tarekgh commented 7 years ago

I have merged the changes that use an implementation similar to corert's, and I'll keep this issue open to look at more optimizations in the near future. Thanks all for your feedback.

karelz commented 7 years ago

@tarekgh given that this is blocking one of the customer end-to-end problems, I wonder if we should just close this one as 'fixed' and file a new issue tracking further improvements, linking some top ideas from here ... the discussion here is also quite long anyway ... thoughts?

karelz commented 7 years ago

Also the title is now probably wrong ... 7x is likely misleading at this moment ...

tarekgh commented 7 years ago

ok, I'll log a new issue later today

tmds commented 7 years ago

@tarekgh is it a bug in the parsing code that some of the strings produced by ecvt_r can't be parsed? Or is this expected?

tarekgh commented 7 years ago

is it a bug in the parsing code that some of the strings produced by ecvt_r can't be parsed? Or is this expected?

It is not a parsing bug. What is happening is that ecvt_r rounds some numbers, which causes the problem in boundary cases. For example, for Double.MinValue, ecvt_r will produce "-1.797693134862316E+308", while without ecvt_r we produce "-1.7976931348623157E+308". -1.797693134862316E+308 is considered less than Double.MinValue, which makes the parsing throw an overflow exception. As I mentioned before, we can look at the parsing code to handle such boundary cases, but I won't touch the parsing code now, before we investigate the whole story of formatting and parsing together.
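The boundary case can be reproduced without ecvt_r by formatting Double.MinValue with 16 and with 17 significant digits. This is a rough sketch under assumptions: `DescribeParse` is a hypothetical helper, "G16" is only a stand-in for the shorter digit string quoted above (it is not what the ecvt_r path calls), and the snippet assumes top-level statements (C# 9 or later). What happens to the 16-digit string depends on the runtime version: the behavior described in this thread is an overflow exception, while other versions may return negative infinity instead, so the helper reports both.

```csharp
using System;
using System.Globalization;

CultureInfo inv = CultureInfo.InvariantCulture;

// Hypothetical helper: parse the text and describe the outcome instead of throwing.
static string DescribeParse(string text)
{
    try
    {
        double parsed = double.Parse(text, CultureInfo.InvariantCulture);
        if (double.IsInfinity(parsed))
        {
            return "parsed to infinity";
        }
        return parsed == double.MinValue
            ? "parsed back to Double.MinValue"
            : "parsed to a different value";
    }
    catch (OverflowException)
    {
        return "OverflowException";
    }
}

// 17 significant digits keep the final "57"; 16 digits round it up to "6".
string seventeen = double.MinValue.ToString("G17", inv);
string sixteen = double.MinValue.ToString("G16", inv);

Console.WriteLine(seventeen + " -> " + DescribeParse(seventeen));
Console.WriteLine(sixteen + " -> " + DescribeParse(sixteen));
```

Either outcome for the 16-digit string means the value does not round-trip: the rounded text names a magnitude above the largest finite double, so no finite double can represent it.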

tarekgh commented 7 years ago

Logged dotnet/coreclr#10651 to track the remaining work that needs to be done here; closing this issue per @karelz's suggestion.

karelz commented 7 years ago

Thanks!

gafter commented 7 years ago

> @tarekgh, this should be fine, provided we are still IEEE compliant (it looks like we are).

The string conversion rules are essentially (these are defined in 5.12 Details of conversion between floating-point data and external character sequences):

  • A string with 15 significant digits converted to the binary64 format (double) and then converted back should exactly match.
  • A double converted to a decimal string with at least 17 digits and then converted back should exactly match

Also, from the IEEE spec 5.12.2:

Conversions to and from supported decimal formats shall be correctly rounded regardless of how many digits are requested or given.

There are many numbers for which double.Parse produces the incorrect result; see https://github.com/dotnet/coreclr/issues/1316
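The two conversion rules quoted above can be checked with a short sketch. It is illustrative only: `TextSurvives` and `ValueSurvives` are hypothetical helper names, "G15" and "G17" are used as the 15-digit and 17-digit formats, and the checks only hold on a runtime whose parsing and formatting are correctly rounded, which the linked issue suggests is not always the case. The snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Globalization;

// Rule 1: decimal text with at most 15 significant digits -> double -> text.
// The input must already be in the shape "G15" emits (no trailing zeros).
static bool TextSurvives(string text)
{
    double value = double.Parse(text, CultureInfo.InvariantCulture);
    return value.ToString("G15", CultureInfo.InvariantCulture) == text;
}

// Rule 2: double -> text with 17 significant digits -> double.
static bool ValueSurvives(double value)
{
    string text = value.ToString("G17", CultureInfo.InvariantCulture);
    return double.Parse(text, CultureInfo.InvariantCulture) == value;
}

Console.WriteLine("104234.343 survives as text: " + TextSurvives("104234.343"));
Console.WriteLine("1.23456789012345 survives as text: " + TextSurvives("1.23456789012345"));
Console.WriteLine("0.1 + 0.2 survives as a value: " + ValueSurvives(0.1 + 0.2));
Console.WriteLine("Double.MaxValue / 2 survives as a value: " + ValueSurvives(double.MaxValue / 2));
```

A `False` on any line would indicate a conversion that is not correctly rounded in the runtime under test.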