dynamics-of-stellar-systems / dynamite

dynamics, age and metallicity indicators tracing evolution
MIT License
14 stars 1 forks source link

Fix crash when using Python nnls and #coefficients in GH file does not match config file's number_GH #325

Closed maindlt closed 11 months ago

maindlt commented 1 year ago

Bugfix for: DYNAMITE crashes when using the Python nnls weight solver and the GH kinematics file had a larger number of coefficients than indicated in the config file (in GaussHermite.get_observed_values_and_uncertainties() allocation was based on the config file's number, two loops were based on the kinematics file column count).

To discuss: In kinematics.py, the attribute GaussHermite.max_gh_order is calculated based on the number of columns in gauss_hermite_kins.ecsv. Wherever the maximum GH order is referenced in the code though, the value number_GH from the config file is used. Shall we perhaps remove the attribute GaussHermite.max_gh_order to reduce confusion?

Tested with:

Please have a look at the code changes (only two lines) and merge if satisfied ;-)

prashjet commented 11 months ago

One thing to check before merging: it should be possible for the number_GH from the config file to be bigger than the max_gh_order as inferred from the kinematics file.

maindlt commented 11 months ago

Hi @prashjet,

to ensure both ways work (number_GH from config <, =, > number of gh components in the data files), DYNAMITE now compares the two after reading both the kinematics files and the config.weight_solver_settings. This happens in Configuration.validate() If the numbers of gh coefficients differ, the kinematics.data astropy table is adapted: either all-zero columns are added or columns are removed (method GaussHermite.adjust_gh_data_to_coefficient_number()). This should have minimal side effects on the code. Does this make sense?

Tested with (the usual) test_nnls.py with number_GH set to 3, 4, and 5.

Cheers, Thomas

prashjet commented 11 months ago

Hey Thomas. Regarding my last comment,

One thing to check before merging: it should be possible for the number_GH from the config file to be bigger than the max_gh_order as inferred from the kinematics file.

the point I wanted to make was that this functionality was already in-place. This was always the desired behaviour. I just wanted to check that this PR wouldn't break this existing behaviour. I think your new method GaussHermite.adjust_gh_data_to_coefficient_number is a correct way to implement the same behaviour when it comes to weight solving, but I'm worried about unexpected consequences elsewhere.

Could you run a few more tests to rule out unexpected consequences? Run tutorial notebook #3 in the scenarios:

maindlt commented 11 months ago

Hi Prash,

hm, I am afraid the (number_GH) > (number in data file) functionality broke already in one of the earlier PRs:

When I run 3_model_iterations_and_plots.ipynb in the present (Oct 24, 2023) master branch, I get an error and a crash when setting number_GH=5 (number in data file is 4). It seems to have to do with implementing chi2_kinmap, see the error message below. Both the legacy and scipy weight solvers behave the same.

Running the same tutorial in this branch with number_GH=4 and number_GH=5 succeeds for both legacy and scipy weight solvers (except for the beta plots which only work with the legacy weight solver). The attached plots_compare.zip has the output plots of the tutorial notebook 3 for number_GH 4 and 5 and both weight solvers. They look the same: plots_compare.zip

For me it is not 100% clear whether it is ok to add hx=dhx=0 columns for x=(number in data file+1)...number_GH, but what else to assume if (number_GH) > (number in data file)?

I'll be happy to discuss :-)

Cheers, Thomas

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/maindl/.local/lib/python3.9/site-packages/multiprocess-0.70.14-py3.9.egg/multiprocess/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/maindl/.local/lib/python3.9/site-packages/multiprocess-0.70.14-py3.9.egg/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/model_iterator.py", line 437, in create_and_run_model
    weight_solver = mod.get_weights(orblib)
  File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/model.py", line 1014, in get_weights
    weights, chi2_tot, chi2_kin, chi2_kinmap = weight_solver.solve(orblib)
  File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/weight_solvers.py", line 318, in solve
    chi2_kinmap = self.chi2_kinmap(weights)
  File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/weight_solvers.py", line 112, in chi2_kinmap
    obs_val = np.array(kinematics_data[coef])
  File "/Users/maindl/.local/lib/python3.9/site-packages/astropy-5.3.1-py3.9-macosx-10.9-x86_64.egg/astropy/table/table.py", line 2055, in __getitem__
    return self.columns[item]
  File "/Users/maindl/.local/lib/python3.9/site-packages/astropy-5.3.1-py3.9-macosx-10.9-x86_64.egg/astropy/table/table.py", line 264, in __getitem__
    return OrderedDict.__getitem__(self, item)
KeyError: 'h5'
"""
prashjet commented 11 months ago

hm, I am afraid the (number_GH) > (number in data file) functionality broke already in one of the earlier PRs:

I see! OK, well, so maybe it's the perfect time to fix it :)

For me it is not 100% clear whether it is ok to add hx=dhx=0 columns for x=(number in data file+1)...number_GH, but what else to assume if (number_GH) > (number in data file)?

For (number_GH) > (number in data file), we do want hx=0, but the most sensible choice for dhx would be to set it to the systematic error specified in the configuration file, i.e. under

weight_solver_settings --> GH_sys_err

which we define in the docs as

a string of length 2 + number_GH floats, the systematic error applied to V, sigma, h3, …, hN

One complication, however, is that these systematic errors are already being used internally e.g. here, with a similar thing happening in all fortran weight solving scripts. If instead, at initialisation we added the systematic errors to the data table, then we would have to adjust elsewhere so that the systematic errors are not added twice.

I do think that having the systematic errors added to the data table at initialisation makes sense. It seems like the most streamlined was to take them into account downstream e.g. for plotting.

maindlt commented 11 months ago

Hi Prash,

great, many thanks for your comments! I'll try to have it done (add systematic errors at initialization and take care of the specialties in calling the different weight solvers) before our Friday meeting, most probably tomorrow.

A nice side effect of this whole thing is that because the kinematics coefficient numbers are checked upon initialization we won't need to check for equal numbers of gh coefs (in case of multiple kinsets) in other parts of DYNAMITE.

I do think that having the systematic errors added to the data table at initialisation makes sense. It seems like the most streamlined was to take them into account downstream e.g. for plotting.

To clarify: right now plotting the kinematic maps uses the dhx columns as they are in the kinematics file (i.e., without adding the systematic error). When we add those errors at initialization they need to be subtracted when plotting (so the kin maps look the same as they do now), right?

Many thanks and cheers, Thomas

maindlt commented 11 months ago

Hi Prash,

looking at the uses of the Kinematics.data object in the LegacyWeightSolver and Plotter, I think it is worthwhile to keep the 'raw' version (without the systematic errors). So I created a second Kinemtics.data_sys_err Astropy table which has the systematic errors applied and also the dv and dsigma from vdMarel + Franx 93, which also goes into the scipy weight solver via the updated GaussHermite.get_observed_values_and_uncertainties(). That table is created by the config reader after reading both the kinematics data and the weight solver settings. Right now it's in config's validate method (because that picks out all the individual kinematics already), but we can of course put it more prominently at the end of config's init method...

Cheers, Thomas

maindlt commented 11 months ago

Hi Prash,

the last commit implements as discussed the following regarding Gauss Hermite kinematics:

Tested with:

Issues:

Cheers, Thomas

FCC047 test runs, LegacyWeightSolver

master branch number_GH=4:

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5 424493.15420013765 65666.78023535715 113735.67888008912
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5  429197.1525016983 67539.82992857283  87613.20983329182
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5  429339.9158990041 71949.22739494297 164348.79223217713

this branch number_GH=3

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2            kinchi2           kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ----------------- ------------------ ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5 416425.2884538298 60359.941437665395 115153.38302447945
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5 418670.8642388978  59810.58616600247  88574.86547229892
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5 421946.4129168541  66917.66017005831 165940.24584393486

this branch number_GH=4

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5 424493.15420013765 65666.78023535715 113735.67888008912
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5  429197.1525016983 67539.82992857283  87613.20983329182
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5  429339.9158990041 71949.22739494297 164348.79223217713

this branch number_GH=5

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5  424501.6272540993 65672.14007516083 113730.16386600249
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5 429210.03923869145 67547.53606716642  87602.64374101332
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5 429346.53733657196 71953.90520141821 164345.39048523517

FCC047 test runs, Python NNLS

master branch number_GH=4:

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5 424493.15420013946 65666.78023530384 113735.67911982398
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5  429197.1525017018 67539.82992860806  87613.20968621966
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5 429339.91589900514 71949.22739492588 164348.79271744133

this branch number_GH=3

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5  416425.2884538317 60359.94143765696 2091496226.6670666
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5 418670.86423890106 59810.58616606319 1540989111.4372597
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5 421946.41291685525  66917.6601700391  3151248756.601821

this branch number_GH=4

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5 424493.15420013946 65666.78023530384 113735.67911982398
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5  429197.1525017018 67539.82992860806  87613.20968621966
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5 429339.91589900514 71949.22739492588 164348.79271744133

this branch number_GH=5

       m-bh         a-bh c-dh   f-dh  q-stars p-stars u-stars  ml        chi2             kinchi2          kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 5.5  424501.6272541011 65672.14007511585 113730.16423063162
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 4.5 429210.03923869494 67547.53606718627  87602.64352996505
3162277.6601683795 0.001 10.0 1.25893     0.2     0.7  0.9999 6.5   429346.537336573 71953.90520141224 164345.39016516917
prashjet commented 11 months ago

Thanks Thomas. I think this all addresses the number_GH question satisfactorily. I'd say let's open three issues for the points you raise above:

An exception is the case with number_GH=3 and the Python NNLS where the kinmapchi2 values are orders of magnitude larger, don't know what happens here... Similarly huge kinmapchi2 values occur when apply_systematic_error=True is set when calculating kinmapchi2...

While the GaussHermite.get_data(...) method is used throughout DYNAMITE to access GH kinematics data, DYNAMITE still accesses BayesLOSVD data via the class attribute .data in multiple places. Shall we change that to the getter method now or keep it for later?

Plotting kinematic maps fails if number_GH<4 and plots only four coefficients' values if number_GH>4. I'd suggest we deal with this when we fix kinematic map plotting for arbitrary numbers of GH coefficients.

... where I think the first point is more urgent, while the second two we can address as and when they becomes necessary. Worth making the issues now (and referencing this discussion) so we don't forget.

Otherwise, I will work on the docs and then we can merge this.