NREL / ssc

SAM Simulation Core (SSC) contains the underlying performance and financial models for SAM
BSD 3-Clause "New" or "Revised" License
79 stars 85 forks source link

arm64 failing ssc tests #1026

Closed sjanzou closed 2 months ago

sjanzou commented 1 year ago

Four failing ssc tests on arm64 architecture (all 725 run fine on amd64 Windows, Linux and Mac)

  1. lib_battery_test.runTestCycleAt3C

  2. csp_common.TesSubcomponentCmod.Default

  3. CMPvsamv1BatteryIntegration_cmod_pvsamv1.ResidentialDCBatteryModelPriceSignalDispatch

  4. CMBattwatts_cmod_battwatts.NoPV

Detailed value difference for each test below

[0;32m[ RUN ] lib_battery_test.runTestCycleAt3C /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_battery_capacity_test.h:45: Failure The difference between tested_state.q0 and expected_state.q0 is 0.025430198576842145, which exceeds tol, where tested_state.q0 evaluates to 47.115430198576846, expected_state.q0 evaluates to 47.090000000000003, and tol evaluates to 0.01. runTest: 3 /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_battery_test.h:142: Failure The difference between tested_state.V and expected_state.batt_voltage is 0.02420895181734295, which exceeds 0.01, where tested_state.V evaluates to 467.11420895181732, expected_state.batt_voltage evaluates to 467.08999999999997, and 0.01 evaluates to 0.01. runTest: 3 [ FAILED ] lib_battery_test.runTestCycleAt3C (373 ms) [----------] 1 test from lib_battery_test (373 ms total)



[0;32m[----------] 1 test from csp_common.TesSubcomponentCmod [ RUN ] csp_common.TesSubcomponentCmod.Default /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_csp_tes_test.cpp:411: Failure The difference between csp_subcomponent.GetOutputVector("T_sink_in")[idx] and T_sink_in_expected[idx] is 0.11852206011218414, which exceeds 0.1, where csp_subcomponent.GetOutputVector("T_sink_in")[idx] evaluates to 383.69852206011217, T_sink_in_expected[idx] evaluates to 383.57999999999998, and 0.1 evaluates to 0.10000000000000001. at index: 18 /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_csp_tes_test.cpp:411: Failure The difference between csp_subcomponent.GetOutputVector("T_sink_in")[idx] and T_sink_in_expected[idx] is 0.12582487030903167, which exceeds 0.1, where csp_subcomponent.GetOutputVector("T_sink_in")[idx] evaluates to 383.66582487030905, T_sink_in_expected[idx] evaluates to 383.54000000000002, and 0.1 evaluates to 0.10000000000000001. at index: 19 /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_csp_tes_test.cpp:411: Failure The difference between csp_subcomponent.GetOutputVector("T_sink_in")[idx] and T_sink_in_expected[idx] is 0.1270992339990471, which exceeds 0.1, where csp_subcomponent.GetOutputVector("T_sink_in")[idx] evaluates to 383.62709923399905, T_sink_in_expected[idx] evaluates to 383.5, and 0.1 evaluates to 0.10000000000000001. at index: 20 /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_csp_tes_test.cpp:411: Failure The difference between csp_subcomponent.GetOutputVector("T_sink_in")[idx] and T_sink_in_expected[idx] is 0.11959526859203606, which exceeds 0.1, where csp_subcomponent.GetOutputVector("T_sink_in")[idx] evaluates to 383.57959526859202, T_sink_in_expected[idx] evaluates to 383.45999999999998, and 0.1 evaluates to 0.10000000000000001. at index: 21 /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_csp_tes_test.cpp:411: Failure The difference between csp_subcomponent.GetOutputVector("T_sink_in")[idx] and T_sink_in_expected[idx] is 0.11808927687548021, which exceeds 0.1, where csp_subcomponent.GetOutputVector("T_sink_in")[idx] evaluates to 383.51808927687546, T_sink_in_expected[idx] evaluates to 383.39999999999998, and 0.1 evaluates to 0.10000000000000001. at index: 22 /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/shared_test/lib_csp_tes_test.cpp:411: Failure The difference between csp_subcomponent.GetOutputVector("T_sink_in")[idx] and T_sink_in_expected[idx] is 0.1205129327666441, which exceeds 0.1, where csp_subcomponent.GetOutputVector("T_sink_in")[idx] evaluates to 383.43051293276665, T_sink_in_expected[idx] evaluates to 383.31, and 0.1 evaluates to 0.10000000000000001. at index: 23 [ FAILED ] csp_common.TesSubcomponentCmod.Default (1 ms) [----------] 1 test from csp_common.TesSubcomponentCmod (1 ms total)

[----------] 1 test from CMPvsamv1BatteryIntegration_cmod_pvsamv1 [ RUN ] CMPvsamv1BatteryIntegration_cmod_pvsamv1.ResidentialDCBatteryModelPriceSignalDispatch /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/ssc_test/cmod_battery_pvsamv1_test.cpp:951: Failure The difference between batt_q_rel.back() and 98.029 is 0.048913833456722955, which exceeds 2e-2, where batt_q_rel.back() evaluates to 97.980086166543273, 98.029 evaluates to 98.028999999999996, and 2e-2 evaluates to 0.02. [ FAILED ] CMPvsamv1BatteryIntegration_cmod_pvsamv1.ResidentialDCBatteryModelPriceSignalDispatch (33889 ms) [----------] 1 test from CMPvsamv1BatteryIntegration_cmod_pvsamv1 (33889 ms total)

[----------] 1 test from CMBattwatts_cmod_battwatts [ RUN ] CMBattwatts_cmod_battwatts.NoPV /Users/imacuser/Public/Projects/GitHub/NREL/ssc/test/ssc_test/cmod_battwatts_test.cpp:183: Failure The difference between maxCycles and 520 is 1, which exceeds 0.1, where maxCycles evaluates to 519, 520 evaluates to 520, and 0.1 evaluates to 0.10000000000000001. [ FAILED ] CMBattwatts_cmod_battwatts.NoPV (271 ms) [----------] 1 test from CMBattwatts_cmod_battwatts (271 ms total)

sjanzou commented 1 year ago

Note that GitHub currently has only x86_64 hosted runners, so we would need to self-host an arm64 runner for CI support or do manually before releases and patches.

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners

brtietz commented 1 year ago

Price signals dispatch test is a known issue filed here: https://github.com/NREL/ssc/issues/614, I don't see a reason to delay the patch for this one.

Do the other two tests include https://github.com/NREL/ssc/pull/1025/files?

sjanzou commented 1 year ago

@brtietz , by constraining the capacity_t::update_SOC to the min and max SOC, arm64 and x86_64 agree with 514 cycles for CMBattwatts_cmod_battwatts.NoPV.

image

The "Fix" was found running Windows and Mac side by side and stepping through and Windows was always getting an SOC of 14.9999999996 and the min SOC was 15. The calculated Mac value was 15. Thus, in the lib_battery_dispatch.cpp line 847, the Windows current was 6.29e-16 and was triggering a cycle and the Mac current was 0 and no cycle was triggered.

However, this does not fix the q0 difference in lib_battery_test.runTestCycleAt3C. The issue appears to be in the calculation of q0 which is dependent on qmax_thermal and qmax_lifetime.

brtietz commented 1 year ago

Great job finding the issue! I worry this solution could cause inconsistencies withing other variables. For example, if the battery is below the minimum SOC, that means the current was too high now or in a previous step.

Is it possible that some of the tolerances in https://github.com/NREL/ssc/blob/3d900eaa0b2c2addce6433fa01e0fb5d5a901005/shared/lib_battery_capacity.cpp#L142 are too large? That function already has the infrastructure to adjust the current for this type of issue.

sjanzou commented 1 year ago

Great job finding the issue! I worry this solution could cause inconsistencies withing other variables. For example, if the battery is below the minimum SOC, that means the current was too high now or in a previous step.

Is it possible that some of the tolerances in

https://github.com/NREL/ssc/blob/3d900eaa0b2c2addce6433fa01e0fb5d5a901005/shared/lib_battery_capacity.cpp#L142

are too large? That function already has the infrastructure to adjust the current for this type of issue.

Agreed! I want to fix the q0 calculation upstream using qmax_thermal and qmax_lifetime and remove the SOC bound checking in capacity_t::update_SOC()

dguittet commented 5 months ago

Didn't see this issue before I pushed #1160

The latest Mac CI runner is now arm64, so we'd have to use macos-12 in order to test Intel Mac. The two tests that were still failing were

  1. CMBattwatts_cmod_battwatts.NoPV: I didn't get down to the reason for the changes in number of cycles. I could potentially reopen the issue and try to track down the reason.

  2. lib_battery_test.runTestCycleAt3C: was due to minor changes in the values so I just updated the values and/or tolerance

sjanzou commented 5 months ago

Didn't see this issue before I pushed #1160

The latest Mac CI runner is now arm64, so we'd have to use macos-12 in order to test Intel Mac. The two tests that were still failing were

  1. CMBattwatts_cmod_battwatts.NoPV: I didn't get down to the reason for the changes in number of cycles. I could potentially reopen the issue and try to track down the reason.
  2. lib_battery_test.runTestCycleAt3C: was due to minor changes in the values so I just updated the values and/or tolerance

@brtietz, @dguittet - I would suggest that we get both Intel and Apple Silicon mac runners going for the CI testing so that we can make sure there are no architecture issues. (Along with everyone running the ssc tests on Windows occasionally).

What do you think?

brtietz commented 5 months ago

It looks like macos-12 will be deprecated in just over a month: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners

I'm having to read between the lines of GitHub's documentation for this a little. https://github.com/actions/runner-images/blob/main/images/macos/macos-14-arm64-Readme.md is the best resource. It looks like if we run with macos-14-arm64 and macos-14 we'll be able to do it. Do you both agree with that interpretation?

dguittet commented 5 months ago

I think this table is probably the best reference:

https://github.com/actions/runner-images?tab=readme-ov-file#available-images

I’ll try out adding the intel based mac runner to the CI workflow.

dguittet commented 5 months ago

@sjanzou @brtietz And why don't we have a Windows CI again? there are windows runners available

brtietz commented 5 months ago

@dguittet I haven't had time to set one up. If you have time I think it be good to have one.

brtietz commented 5 months ago

I think this table is probably the best reference:

https://github.com/actions/runner-images?tab=readme-ov-file#available-images

I’ll try out adding the intel based mac runner to the CI workflow.

Does NREL have one of the plans that enables the large runners? https://docs.github.com/en/actions/using-github-hosted-runners/about-larger-runners/about-larger-runners

dguittet commented 5 months ago

I was able to use it just now for pysam

sjanzou commented 5 months ago

I was able to use it just now for pysam

Awesome! Can we setup all runners for all repos? Maybe issues for each repo and we can divide the implementation?

dguittet commented 5 months ago

I've finished setting up Macosx intel runners for SSC, working on finishing up the Windows one, then I will create a PR.

Opened an issue: https://github.com/NREL/SAM/issues/1769

sjanzou commented 2 months ago

Based on the arm64 and x86_64 Mac CI runners and the Ubuntu and Windows runners passing and checking out and running all 758 ssc tests on MacOS 14.5 with Apple silicon (arm64 M1 Max chip)

image

Going to close this issue as complete with latest patch branches