cherab / core

The core source repository for the Cherab project.
https://www.cherab.info

Fallback value for atomic data requests outside of data range? #367

Open · Mateasek opened this issue 2 years ago

Mateasek commented 2 years ago

Currently there are two ways in which requests for atomic data outside of the data range are handled.

  1. An error is returned when the interpolator is requested to extrapolate
  2. Interpolator extrapolates

Yesterday I discovered that in some cases (in my case a SOLPS simulation) it would be useful to have a third option, which would give the user a chance to set a fallback value. In my case that would be zero. What I encountered is that the PEC rate I was using was provided up to Te = 10 eV, but I could clearly see radiation from regions with Te > 15 eV. In this case the observed spectra will have questionable accuracy. This can be worked around by modifying the density of the radiating species, but wouldn't it be better to solve it on the atomic data side?

vsnever commented 2 years ago

Hi @Mateasek, this will be a useful option. I also agree that this should be implemented at the atomic data level.

Each atomic rate must have a dedicated fallback value, and all of these fallback values must be provided when AtomicData is initialised. I think it would be clearer to specify the fallback values as separate arguments, namely:

class OpenADAS(AtomicData):
    ...
    def __init__(self, data_path=None, permit_extrapolation=False, missing_rates_return_null=False,
                 wavelength_element_fallback=False,
                 ionisation_rate_fallback_value=None,
                 recombination_rate_fallback_value=None,
                 thermal_cx_rate_fallback_value=None,
                 beam_cx_pec_fallback_value=None,
                 beam_stopping_rate_fallback_value=None,
                 beam_population_rate_fallback_value=None,
                 beam_emission_pec_fallback_value=None,
                 impact_excitation_pec_fallback_value=None,
                 recombination_pec_fallback_value=None,
                 line_radiated_power_rate_fallback_value=None,
                 continuum_radiated_power_rate_fallback_value=None,
                 cx_radiated_power_rate_fallback_value=None):

It's unlikely that the user will need to specify more than a couple of these values in a single simulation, so I don't think we should worry about the number of arguments.

Also, I think the fallback should take precedence over extrapolation. Namely, if extrapolation is enabled but fallback values are set for some atomic rates, then the fallback value should be used for those rates instead of extrapolation, while all other rates are extrapolated.

Mateasek commented 2 years ago

Yes, this is one option. We could also add attributes with default values, to limit the number of parameters. As you said, users will rarely use fallback for many rates at once.

About the implementation, I thought I would add a set of functions called OutofRangeFallbackxD. They would work in a similar way to the clamp function Cherab imports from Raysect, but instead of returning the min/max value they would return the fallback value. These would then be used to wrap the rate interpolators when the fallback is enabled.
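
A minimal sketch of what such a wrapper might look like (the class name follows the proposal above, but the signature and internals are hypothetical, not existing Cherab code):

```python
# Hypothetical sketch of the proposed OutofRangeFallback1D wrapper.
# The signature and internals are illustrative, not existing Cherab code.

class OutofRangeFallback1D:
    """Wraps a 1D rate interpolator and returns a fixed fallback value
    whenever the requested point lies outside the data domain."""

    def __init__(self, interpolator, xmin, xmax, fallback=0.0):
        self._interpolator = interpolator
        self._xmin = xmin
        self._xmax = xmax
        self._fallback = fallback

    def __call__(self, x):
        # Inside the data domain, defer to the wrapped interpolator.
        if self._xmin <= x <= self._xmax:
            return self._interpolator(x)
        # Outside the domain, return the fallback value instead of
        # raising an error or extrapolating.
        return self._fallback
```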

CnlPepper commented 2 years ago

Hey chaps, I'm not sure this is really the best approach to solving this problem. The issue here is that the data doesn't extend to the ranges the simulation is being applied to. This should really be fixed at the data end, not by kludging in options on the atomic data object, especially when that kludge affects all the rates of one class!

The solution here should be to address the source data, not the code. The approach would be to add zeros (or whatever value) to the upper range of the rate... which is effectively what this kludge does. This would be best implemented in the rate install methods if anywhere.

btw the reason we added the extrapolation at all was for very short range extrapolations, mainly for the lower range of the rates (where they tend to head to zero), not extending the range higher.

CnlPepper commented 2 years ago

> Yes, this is one option. We could also add attributes with default values, to limit the number of parameters. As you said, users will rarely use fallback for many rates at once.
>
> About the implementation, I thought I would add a set of functions called OutofRangeFallbackxD. They would work in a similar way to the clamp function Cherab imports from Raysect, but instead of returning the min/max value they would return the fallback value. These would then be used to wrap the rate interpolators when the fallback is enabled.

FallbackxD would be a bit more concise. Essentially it calls function A; if that throws an exception, it calls function B.
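
For illustration, a minimal hypothetical sketch of that pattern (assuming the wrapped interpolator raises ValueError when asked to evaluate outside its domain with extrapolation disabled):

```python
# Hypothetical sketch of the FallbackxD idea: call the primary function,
# and fall back to a secondary one if it raises. Assumes the interpolator
# raises ValueError when asked to evaluate outside its domain.

class Fallback1D:
    def __init__(self, primary, fallback):
        self._primary = primary        # e.g. the rate interpolator
        self._fallback = fallback      # e.g. lambda x: 0.0

    def __call__(self, x):
        try:
            return self._primary(x)
        except ValueError:
            return self._fallback(x)
```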

CnlPepper commented 2 years ago

Kludging core code rather than fixing the data is in general a really bad approach, and you need to not do it or you'll end up with an unstable, unmaintainable nightmare. Keep the data handling and repair at the outermost shell, where the data is injected into the code, not where it is used.

Mateasek commented 2 years ago

I don't think changing the data is the better option here. The atomic data were generated as a "consistent" table, and if we extend it with zeros, then the interpolators won't correctly return errors when asked to extrapolate without permission. This will lead to an even larger nightmare.

What I did is in PRs #368 and #369. Nothing in core.atomic was touched. All the fallback options and handling were added to the OpenADAS part. The OutofRangeFallbackxD functions were added to the core.

CnlPepper commented 2 years ago

If the data is at fault it is the data that should be fixed. Adding endless kludges to the code to address data not being available is not scalable. If I were to redesign Cherab's atomic data system I would entirely remove the extrapolation as a global setting too. I'd move it to an option at the point of rate install into the cherab repo - basically adding a flag to the JSON representation that says, yes, this rate is safe to extrapolate. This would then allow per-rate decisions.

For now, simply adding a few zeros onto the end of the rate will prevent the extrapolation from returning anything but zero above the last point. It is a kludge too, but one that is local to one set of data that is known to extrapolate badly.

CnlPepper commented 2 years ago

OpenADAS is essentially "core" too, given it is used by practically every user. Core code here means code with a wide-ranging impact on users (if you introduce a bug/logic change).

vsnever commented 2 years ago

Just realised that, for example, a user can have different impact excitation PECs in the same simulation, and if the fallback value is given at the AtomicData level, it applies to all of them. But what if different PECs in this simulation need different fallback values? At the same time, the same data needs the same fallback values in different simulations, so manually reinstalling the faulty data on the user side may indeed be a better option.

Mateasek commented 2 years ago

Sorry but I just don't see this as a better option.

What if an interpolator is asked to provide a value for an input between the last value of the original data domain and the first added zero? This will clearly be a wrong value. Yes, this can be limited by putting the extended values close to the last point of the data domain, but can't that cause problems with interpolators oscillating and overshooting, as we saw with the new interpolators before @vsnever changed them to work in log scale? And again returning clearly wrong values?

Applying fallback at the atomic data level is imo fine, because you can repeat the observation with different fallback values.

I agree with @CnlPepper that the extrapolation option is not good, but the fallback value gives users a chance to use the data inside the original data domain and to control what they get outside of it (for example nothing, if you use 0). I can't really imagine users having to generate atomic data repositories for single simulations, which is what you suggest, because the way you add the zeros will definitely influence your results.

Mateasek commented 2 years ago

If adding a fallback option is not the way to go, then we have to come up with a new one, but I don't think extending the data domain is it.

vsnever commented 2 years ago

> Applying fallback at the atomic data level is imo fine, because you can repeat the observation with different fallback values.

I'm not against the fallback values, but to provide each dataset with a dedicated fallback value, the AtomicData should accept not just single values but also dictionaries, e.g.:

ionisation_rate_fallback_value={(carbon, 1): value1, (hydrogen, 0): value2}
beam_cx_pec_fallback_value={(hydrogen, carbon, 6, (4, 3)): value3}

> What if an interpolator is asked to provide a value for an input between the last value of the original data domain and the first added zero? This will clearly be a wrong value. Yes, this can be limited by putting the extended values close to the last point of the data domain, but can't that cause problems with interpolators oscillating and overshooting, as we saw with the new interpolators before @vsnever changed them to work in log scale? And again returning clearly wrong values?

I think we can avoid implementing any universal solution here. The user can always get the data with repository.get*, modify it to solve the specific problem with this particular data, and put the modified data back with repository.update*. Thus, if the user needs to extend the domain, this can be done in a way appropriate to the specific data using the repository methods that already exist.
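
As a rough sketch of that workflow (the exact repository.get*/update* function names and the stored data layout depend on the rate type, so the names below are placeholders):

```python
# Illustrative sketch only: the real repository.get*/update* names and the
# exact structure of the stored rate data depend on the rate type.

def extend_rate_domain(rate, te_new, fill_value=0.0):
    """Append one temperature point to a PEC-like rate dictionary, padding
    the rate values with a fixed fill value.

    Assumes the rate is stored as {'te': [...], 'ne': [...], 'rate': [[...], ...]}
    with one row of rate values per density point (an assumption, not the
    actual repository schema)."""
    rate['te'] = list(rate['te']) + [te_new]
    rate['rate'] = [list(row) + [fill_value] for row in rate['rate']]
    return rate

# Usage (hypothetical):
# rate = repository.get_...(...)                 # the appropriate repository.get* call
# rate = extend_rate_domain(rate, te_new=100.0)  # pad the data to the needed range
# repository.update_...(...)                     # the appropriate repository.update* call
```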

Mateasek commented 2 years ago

My main motivation for fallback was to avoid extrapolation of atomic data in simulations, while still being able to perform them. I didn't want to introduce any complicated options or a tool giving users the possibility to "extrapolate" atomic data in a different way. Cherab defaults to zero contribution whenever an input value does not make sense (density, temperature, ...). I wanted to add this option to atomic data too. I would be completely happy with having a zero_fallback as a True/False option which would return zero rates when turned on, instead of having to extrapolate, which in my opinion is worse.

Modifying the data is, in my opinion, as bad an idea from the data point of view as adding options to the OpenADAS class is from the programming point of view.

CnlPepper commented 2 years ago

I entirely understand your motivation, but it is not an appropriate solution. It is not general; it is a specific patch for a specific scenario.

There are two elements to my reply: 1) a recommendation for a better architecture, and 2) a temporary solution to quickly address the current issue without modifying the codebase.

Addressing 2) quickly: temporarily tacking zeros onto the rate data set to cover the range you need is entirely fine, as it is precisely what the fallback option does. So you have essentially already chosen to "modify" the rate (how you do it is entirely irrelevant to the outcome of the simulation). In the same way, extrapolation is an explicit choice about how to "modify" the data outside of its explicitly defined domain.

Let's take a step back. The point of the cherab atomic data object having the ability to access multiple repositories is exactly so you can have different sets of atomic data - so you can customise them for a simulation. In an ideal world each script would start with the construction of an atomic data repository to ensure every simulation has clear provenance of rate data, and decisions made how to interpret that data. It's only stored on disk as a cache/convenience.

The correct point to introduce algorithms that handle rate data is at the point of generating the atomic data repo (which is really the atomic data configuration for the script). At that point the handling of each and every rate can be customised. Extrapolation or fallback values can be addressed on a per-rate basis. All manipulation of the rate handling is listed in a single place in the script.

Moving extrapolation and fallback to the rate install functions and storing that alongside the data in the rate repo is a relatively minor code change and should be done:

  1. Remove the extrapolate parameter from the atomic data object.
  2. Add extrapolate and fallback parameters to the install methods in openadas.
  3. Add a "processing" section to the JSON representation of each rate class in the cherab repo that records the choice of extrapolate/fallback.
  4. Update the rate objects to read the extrapolation/fallback state from the JSON rather than taking it from the atomic data object.

The atomic data is just another bit of configuration, if it wasn't entirely inefficient I'd have insisted that the atomic data object be populated separately in every script (having a cache risks the environment changing under a script and being invisible to users).
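
To make steps 2-3 concrete, here is a hypothetical sketch of an install method writing a per-rate "processing" record alongside the data (function and field names are illustrative, not the actual repository schema):

```python
# Hypothetical sketch of steps 2-3 above: the install method records how
# out-of-range requests should be handled alongside the rate data itself.
# All function and field names are illustrative, not the actual schema.
import json

def install_rate(rate_data, path, extrapolate=False, fallback=None):
    record = {
        'te': rate_data['te'],
        'ne': rate_data['ne'],
        'rate': rate_data['rate'],
        # Per-rate processing choices stored with the data, so the rate
        # objects can read them back instead of relying on a global flag
        # on the atomic data object.
        'processing': {
            'extrapolate': extrapolate,
            'fallback': fallback,
        },
    }
    with open(path, 'w') as f:
        json.dump(record, f, indent=2)
```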

CnlPepper commented 2 years ago

> My main motivation for fallback was to avoid extrapolation of atomic data in simulations, while still being able to perform them. I didn't want to introduce any complicated options or a tool giving users the possibility to "extrapolate" atomic data in a different way. Cherab defaults to zero contribution whenever an input value does not make sense (density, temperature, ...). I wanted to add this option to atomic data too. I would be completely happy with having a zero_fallback as a True/False option which would return zero rates when turned on, instead of having to extrapolate, which in my opinion is worse.
>
> Modifying the data is, in my opinion, as bad an idea from the data point of view as adding options to the OpenADAS class is from the programming point of view.

Adding a global clamp to zero option for all rates at the atomic data object level, as an alternative to global extrapolation, would be significantly better than the current proposal.

There should be no precedence order. If both extrapolation and clamp are enabled, an exception should be raised.

Ultimately the project should endeavour to move these options to a per-rate basis rather than keeping them global.

Mateasek commented 2 years ago

> In an ideal world each script would start with the construction of an atomic data repository to ensure every simulation has clear provenance of rate data, and decisions made how to interpret that data. It's only stored on disk as a cache/convenience.

Thanks for the explanation @CnlPepper. I didn't understand that the repository is only a "database" cache used for convenience, and that the best option would be not to rely on a stored repository at all but to build everything before every run.

> 1. Remove the extrapolate parameter from the atomic data object.
> 2. Add extrapolate and fallback parameters to the install methods in openadas.
> 3. Add a "processing" section to the JSON representation of each rate class in the cherab repo that records the choice of extrapolate/fallback.
> 4. Update the rate objects to read the extrapolation/fallback state from the JSON rather than taking it from the atomic data object.

I will follow this guideline. Just to be sure, and to avoid any more redundant coding on my side: the rate objects will use

  1. interpolators, throwing errors when asked to extrapolate
  2. interpolators, extrapolating given the parameters (nearest, linear, etc.)
  3. interpolators wrapped in the OutofRangexD wrappers (the fallback option)

I have one question. In the end, fallback and extrapolation handle the same event (returning a value outside of the data domain). Wouldn't it be better to have a single parameter that decides between the three possible options, instead of checking that only one option is enabled? Something like extrapolation: 'False', extrapolation: 'True', extrapolation: 'fallback', and then having additional parameters, like extrapolation_type and fallback_value, that specify the option further?

CnlPepper commented 2 years ago

The three code paths look appropriate to me. Though I think the third option should be to simply return zero out of range. A fallback value that covers all rate data is not viable as it will only ever be sensible for a small subset of the data.

Regarding the atomic data interface, it would probably be cleaner to adjust how the extrapolate parameter is interpreted. This would have to be well documented as it would alter a well-used interface (the cherab demos would probably need updating too).

I'd keep the label "extrapolation", default it to None. Going forward it takes a string or None:

  1. None: no extrapolation.
  2. "polynomial": enables the rate extrapolation calculations for out-of-range data.
  3. "zero": rates return zero when out-of-range data are requested.
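
A minimal sketch of how a rate object could interpret such a parameter (hypothetical code, assuming the plain interpolator raises ValueError outside its domain and that an interpolator built with extrapolation enabled is supplied separately):

```python
# Hypothetical sketch of the proposed extrapolation modes. Assumes the plain
# interpolator raises ValueError outside its domain, and that an interpolator
# constructed with extrapolation enabled is supplied separately.

def make_rate(interpolator, extrapolating_interpolator, extrapolation=None):
    if extrapolation is None:
        # No extrapolation: out-of-range requests propagate the error.
        return interpolator
    if extrapolation == 'polynomial':
        # Use the interpolator constructed with extrapolation enabled.
        return extrapolating_interpolator
    if extrapolation == 'zero':
        # Rates return zero when out-of-range data are requested.
        def clamped(x):
            try:
                return interpolator(x)
            except ValueError:
                return 0.0
        return clamped
    raise ValueError("extrapolation must be None, 'polynomial' or 'zero'.")
```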