Closed maindlt closed 11 months ago
One thing to check before merging: it should be possible for the number_GH
from the config file to be bigger than the max_gh_order
as inferred from the kinematics file.
Hi @prashjet,
to ensure both ways work (number_GH from config <, =, > number of gh components in the data files), DYNAMITE now compares the two after reading both the kinematics files and the config.weight_solver_settings
. This happens in Configuration.validate()
If the numbers of gh coefficients differ, the kinematics.data
astropy table is adapted: either all-zero columns are added or columns are removed (method GaussHermite.adjust_gh_data_to_coefficient_number()
). This should have minimal side effects on the code. Does this make sense?
Tested with (the usual) test_nnls.py with number_GH set to 3, 4, and 5.
Cheers, Thomas
Hey Thomas. Regarding my last comment,
One thing to check before merging: it should be possible for the number_GH from the config file to be bigger than the max_gh_order as inferred from the kinematics file.
the point I wanted to make was that this functionality was already in-place. This was always the desired behaviour. I just wanted to check that this PR wouldn't break this existing behaviour. I think your new method GaussHermite.adjust_gh_data_to_coefficient_number
is a correct way to implement the same behaviour when it comes to weight solving, but I'm worried about unexpected consequences elsewhere.
Could you run a few more tests to rule out unexpected consequences? Run tutorial notebook #3 in the scenarios:
Hi Prash,
hm, I am afraid the (number_GH) > (number in data file) functionality broke already in one of the earlier PRs:
When I run 3_model_iterations_and_plots.ipynb
in the present (Oct 24, 2023) master branch, I get an error and a crash when setting number_GH=5 (number in data file is 4). It seems to have to do with implementing chi2_kinmap, see the error message below. Both the legacy and scipy weight solvers behave the same.
Running the same tutorial in this branch with number_GH=4 and number_GH=5 succeeds for both legacy and scipy weight solvers (except for the beta plots which only work with the legacy weight solver). The attached plots_compare.zip
has the output plots of the tutorial notebook 3 for number_GH 4 and 5 and both weight solvers. They look the same: plots_compare.zip
For me it is not 100% clear whether it is ok to add hx=dhx=0 columns for x=(number in data file+1)...number_GH, but what else to assume if (number_GH) > (number in data file)?
I'll be happy to discuss :-)
Cheers, Thomas
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/maindl/.local/lib/python3.9/site-packages/multiprocess-0.70.14-py3.9.egg/multiprocess/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/Users/maindl/.local/lib/python3.9/site-packages/multiprocess-0.70.14-py3.9.egg/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/model_iterator.py", line 437, in create_and_run_model
weight_solver = mod.get_weights(orblib)
File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/model.py", line 1014, in get_weights
weights, chi2_tot, chi2_kin, chi2_kinmap = weight_solver.solve(orblib)
File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/weight_solvers.py", line 318, in solve
chi2_kinmap = self.chi2_kinmap(weights)
File "/Users/maindl/.local/lib/python3.9/site-packages/dynamite-4.0.0-py3.9.egg/dynamite/weight_solvers.py", line 112, in chi2_kinmap
obs_val = np.array(kinematics_data[coef])
File "/Users/maindl/.local/lib/python3.9/site-packages/astropy-5.3.1-py3.9-macosx-10.9-x86_64.egg/astropy/table/table.py", line 2055, in __getitem__
return self.columns[item]
File "/Users/maindl/.local/lib/python3.9/site-packages/astropy-5.3.1-py3.9-macosx-10.9-x86_64.egg/astropy/table/table.py", line 264, in __getitem__
return OrderedDict.__getitem__(self, item)
KeyError: 'h5'
"""
hm, I am afraid the (number_GH) > (number in data file) functionality broke already in one of the earlier PRs:
I see! OK, well, so maybe it's the perfect time to fix it :)
For me it is not 100% clear whether it is ok to add hx=dhx=0 columns for x=(number in data file+1)...number_GH, but what else to assume if (number_GH) > (number in data file)?
For (number_GH) > (number in data file), we do want hx=0
, but the most sensible choice for dhx
would be to set it to the systematic error specified in the configuration file, i.e. under
weight_solver_settings --> GH_sys_err
which we define in the docs as
a string of length 2 + number_GH floats, the systematic error applied to V, sigma, h3, …, hN
One complication, however, is that these systematic errors are already being used internally e.g. here, with a similar thing happening in all fortran weight solving scripts. If instead, at initialisation we added the systematic errors to the data table, then we would have to adjust elsewhere so that the systematic errors are not added twice.
I do think that having the systematic errors added to the data table at initialisation makes sense. It seems like the most streamlined was to take them into account downstream e.g. for plotting.
Hi Prash,
great, many thanks for your comments! I'll try to have it done (add systematic errors at initialization and take care of the specialties in calling the different weight solvers) before our Friday meeting, most probably tomorrow.
A nice side effect of this whole thing is that because the kinematics coefficient numbers are checked upon initialization we won't need to check for equal numbers of gh coefs (in case of multiple kinsets) in other parts of DYNAMITE.
I do think that having the systematic errors added to the data table at initialisation makes sense. It seems like the most streamlined was to take them into account downstream e.g. for plotting.
To clarify: right now plotting the kinematic maps uses the dhx
columns as they are in the kinematics file (i.e., without adding the systematic error). When we add those errors at initialization they need to be subtracted when plotting (so the kin maps look the same as they do now), right?
Many thanks and cheers, Thomas
Hi Prash,
looking at the uses of the Kinematics.data
object in the LegacyWeightSolver and Plotter, I think it is worthwhile to keep the 'raw' version (without the systematic errors). So I created a second Kinemtics.data_sys_err
Astropy table which has the systematic errors applied and also the dv
and dsigma
from vdMarel + Franx 93, which also goes into the scipy weight solver via the updated GaussHermite.get_observed_values_and_uncertainties()
. That table is created by the config reader after reading both the kinematics data and the weight solver settings. Right now it's in config's validate method (because that picks out all the individual kinematics already), but we can of course put it more prominently at the end of config's init method...
Cheers, Thomas
Hi Prash,
the last commit implements as discussed the following regarding Gauss Hermite kinematics:
GaussHermite.data
(previous behavior).GaussHermite.get_data(...)
returns the coefficients aligned with the weight solver setting number_GH
, the calculated data is cached and there is a parameter that determines whether the systematic errors are applied or not.kinmapchi2
considers number_GH
coefficients but never more coefficients than there are in the kinematics data file.GaussHermite.get_data(...)
and WeightSolver.chi2_kinmap()
help a bit with the docs... ;-)Tested with:
test_nnls.py
: same chi2
/kinchi2
/kinmapchi2
values as in the master branch for number_GH=3,4,5
. This is a bit weak because the kinematics input hash3=h4=0
.number_GH=3,4,5
, and both weight solvers (data below, matches well and exactly for number_GH=4
). An exception is the case with number_GH=3
and the Python NNLS where the kinmapchi2
values are orders of magnitude larger, don't know what happens here... Similarly huge kinmapchi2
values occur when apply_systematic_error=True
is set when calculating kinmapchi2
...4_BayesLOSVD.ipynb
to make sure BayesLOSVD still works (test passed).Issues:
GaussHermite.get_data(...)
method is used throughout DYNAMITE to access GH kinematics data, DYNAMITE still accesses BayesLOSVD data via the class attribute .data
in multiple places. Shall we change that to the getter method now or keep it for later?number_GH<4
and plots only four coefficients' values if number_GH>4
. I'd suggest we deal with this when we fix kinematic map plotting for arbitrary numbers of GH coefficients.Cheers, Thomas
FCC047 test runs, LegacyWeightSolver
master branch number_GH=4:
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 424493.15420013765 65666.78023535715 113735.67888008912
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 429197.1525016983 67539.82992857283 87613.20983329182
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 429339.9158990041 71949.22739494297 164348.79223217713
this branch number_GH=3
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ----------------- ------------------ ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 416425.2884538298 60359.941437665395 115153.38302447945
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 418670.8642388978 59810.58616600247 88574.86547229892
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 421946.4129168541 66917.66017005831 165940.24584393486
this branch number_GH=4
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 424493.15420013765 65666.78023535715 113735.67888008912
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 429197.1525016983 67539.82992857283 87613.20983329182
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 429339.9158990041 71949.22739494297 164348.79223217713
this branch number_GH=5
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 424501.6272540993 65672.14007516083 113730.16386600249
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 429210.03923869145 67547.53606716642 87602.64374101332
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 429346.53733657196 71953.90520141821 164345.39048523517
FCC047 test runs, Python NNLS
master branch number_GH=4:
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 424493.15420013946 65666.78023530384 113735.67911982398
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 429197.1525017018 67539.82992860806 87613.20968621966
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 429339.91589900514 71949.22739492588 164348.79271744133
this branch number_GH=3
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 416425.2884538317 60359.94143765696 2091496226.6670666
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 418670.86423890106 59810.58616606319 1540989111.4372597
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 421946.41291685525 66917.6601700391 3151248756.601821
this branch number_GH=4
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 424493.15420013946 65666.78023530384 113735.67911982398
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 429197.1525017018 67539.82992860806 87613.20968621966
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 429339.91589900514 71949.22739492588 164348.79271744133
this branch number_GH=5
m-bh a-bh c-dh f-dh q-stars p-stars u-stars ml chi2 kinchi2 kinmapchi2
------------------ ----- ---- ------- ------- ------- ------- --- ------------------ ----------------- ------------------
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 5.5 424501.6272541011 65672.14007511585 113730.16423063162
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 4.5 429210.03923869494 67547.53606718627 87602.64352996505
3162277.6601683795 0.001 10.0 1.25893 0.2 0.7 0.9999 6.5 429346.537336573 71953.90520141224 164345.39016516917
Thanks Thomas. I think this all addresses the number_GH
question satisfactorily. I'd say let's open three issues for the points you raise above:
An exception is the case with number_GH=3 and the Python NNLS where the kinmapchi2 values are orders of magnitude larger, don't know what happens here... Similarly huge kinmapchi2 values occur when apply_systematic_error=True is set when calculating kinmapchi2...
While the GaussHermite.get_data(...) method is used throughout DYNAMITE to access GH kinematics data, DYNAMITE still accesses BayesLOSVD data via the class attribute .data in multiple places. Shall we change that to the getter method now or keep it for later?
Plotting kinematic maps fails if number_GH<4 and plots only four coefficients' values if number_GH>4. I'd suggest we deal with this when we fix kinematic map plotting for arbitrary numbers of GH coefficients.
... where I think the first point is more urgent, while the second two we can address as and when they becomes necessary. Worth making the issues now (and referencing this discussion) so we don't forget.
Otherwise, I will work on the docs and then we can merge this.
Bugfix for: DYNAMITE crashes when using the Python nnls weight solver and the GH kinematics file had a larger number of coefficients than indicated in the config file (in
GaussHermite.get_observed_values_and_uncertainties()
allocation was based on the config file's number, two loops were based on the kinematics file column count).To discuss: In
kinematics.py
, the attributeGaussHermite.max_gh_order
is calculated based on the number of columns ingauss_hermite_kins.ecsv
. Wherever the maximum GH order is referenced in the code though, the valuenumber_GH
from the config file is used. Shall we perhaps remove the attributeGaussHermite.max_gh_order
to reduce confusion?Tested with:
test_nnls.py
(#GH coefficients ingauss_hermite_kins.ecsv
= config file'snumber_GH
= 4)/home/fs71474/tmaindl/iris20230928/run_dyna_small.py
(#GH coefficients ingauss_hermite_kins.ecsv
= 6, config file'snumber_GH
= 4). Note that the second test crashes after weight solving becausePlotter.beta_plot()
only works with theLegacyWeightSolver
(expects the filenn_intrinsic_moments.out
to exist). This does not affect the validity of this PR though ;-)Please have a look at the code changes (only two lines) and merge if satisfied ;-)