ariesteam / aries

http://www.ariesonline.org
GNU General Public License v3.0

big new scenario problems #31

Closed · kbagstad closed this issue 12 years ago

kbagstad commented 12 years ago

@fvilla, @lambdatronic

kbagstad commented 12 years ago

OK, here goes... problems with the San Pedro water & carbon scenarios, but let's stick with carbon for instructional purposes. First, pull my latest version from github. Then run the baseline & scenarios:

model -o base.nc core.models.carbon-san-pedro/identification-carbon core.contexts.beta/san_pedro_us512
model -o open.nc -s core.models.carbon-san-pedro/open-development-carbon core.models.carbon-san-pedro/identification-carbon core.contexts.beta/san_pedro_us512
model -o constrained.nc -s core.models.carbon-san-pedro/constrained-development-carbon core.models.carbon-san-pedro/identification-carbon core.contexts.beta/san_pedro_us512
model -o constrained2.nc -s core.models.carbon-san-pedro/constrained-development-carbon2 core.models.carbon-san-pedro/identification-carbon core.contexts.beta/san_pedro_us512

(The constrained2 run is a second constrained scenario that Gary suggested I set up to test the behavior of the :as statements.)

Baseline results look beautiful. In the scenarios, you'll note that they're supposed to: 1) change percent vegetation cover to very low in all developed areas; 2) change fire frequency to no fire frequency in all developed areas; 3) in areas where there's an urban growth data layer (the Steinitz30Class open & constrained layers), swap those reclassified data in for the original carbon vegetation type data, and in areas where the Steinitz data have no coverage, use the original carbon vegetation type layer. Ignore 4) greenhouse gas emissions, which is only used in the flow models. Gary's looked at all the code and assures me it looks good from our end.
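For concreteness, here's a sketch of what transformation 1) looks like as a scenario model, following the same pattern as our others (the concept names sanPedro:DevelopedOpen and carbonService:VeryLowVegetationCover are placeholders for illustration, not necessarily the exact ones in the repo):

(defmodel percent-vegetation-cover-developed PercentVegetationCoverClass
  ;; in developed cells, force percent vegetation cover to the lowest class;
  ;; everywhere else, pass the original value through untouched
  (classification PercentVegetationCoverClass
    :context [open-development-scenario :as od percent-vegetation-cover :as pvc]
    :state   #(if (is? (:od %) (conc 'sanPedro:DevelopedOpen))
                (conc 'carbonService:VeryLowVegetationCover)
                (:pvc %))))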

Running those 4 models produces myriad problems. First, if you compare the baseline and constrained scenario source results, you see no differences (if you look at the BNs, the carbon vegetation types and % vegetation cover should indeed change, propagating through to the results). If you look at the sink results, you do see differences. BUT if you subtract the constrained sink from the baseline sink, you see differences everywhere, even beyond the Steinitz layers (you're not supposed to see any change outside of those; you can see their boundary on the geoserver layer preview or elsewhere, but they don't extend across the whole spatial context, since only that area has relevant urban growth modeling data). That's one source of weirdness.

Next, look at the percent vegetation cover data in the baseline, constrained, and constrained2 results. You should see no difference between constrained and baseline (bad: there should be some changes here in developed areas; overlay the urban land (class 7) from "carbon vegetation type" in the constrained scenario to see places where this should change but doesn't). BUT if you compare constrained2 against constrained, there IS a change, and the only difference in the code was eliminating the :as statements in constrained2. Weird, huh?

Next, look at soil & vegetation carbon storage in either constrained output and compare to the baseline. You'll see no change in veg carbon storage (bad: changes to carbon-veg-type and percent-vegetation-cover should cause changes in the outputs), but a big and weird change in soil carbon storage: changes appear to have been made within the boundary of the Steinitz dataset, but outside that area it looks like the data were ignored, and the results have changed greatly versus the baseline.

So a ton of unpredictable behavior is going on here. To summarize: there are inconsistencies between the constrained and constrained2 outputs, meaning the :as statements are behaving unpredictably, and there may be problems with reusing carbon vegetation type data in the scenarios, which would be bad news, because the way the models are written is exactly the behavior we want out of the system.

Enlightenment appreciated as always...

fvilla commented 12 years ago

OK, I don't know if this fixes all the problems, but I committed a small fix to the scenario definition that does make the results look very different. The problem is in the perverse dependencies of this definition:

(model sanPedro:CarbonVegetationType
  (classification sanPedro:CarbonVegetationType
    :context [vegetation-type-open :as vto vegetation-type :as vt]
    :state   #(if (no-data? (:vto %)) (:vt %) (:vto %))))

Now, as we all know, the :as clauses are required to ensure that the match between names and models survives application to arbitrary models (so there is no need for the constrained2 thing, which is just wrong, and would do no good to our self-esteem even if it were, by chance, producing what you want). BUT, as you also know, the concepts are the actual identity of the data, and the alternative vegetation type model you put there contains TWO different dependencies with the SAME concept! I did make it possible to substitute a model of X with another that contains X as a dependency, not because it's right but because it's the only intuitive way to do it (the right way, in the next language, is to call it a transformation, but currently transformations cannot have dependencies). But putting two different models of a concept as dependencies in a model of that same concept is a bit too much, really.

So: as you can see, the model hasn't changed, but I did create a VegetationTypeOpen concept (in the namespace-ontology at the top of the file, so the ontologies don't need to change) and used that for the vegetation-type-open model. This removes the ambiguity. Please have a look and make sure you don't have the same kind of logic in other models, because that's guaranteed to screw things up. One thing I need to address is semantic validation of models before they are run, which would catch conditions like this, but it would also catch many things we currently expect to work, and it's a big job.
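For reference, the disambiguated structure looks roughly like this (a sketch: the body of the vegetation-type-open model is elided, and only its observable has changed):

(defmodel vegetation-type-open sanPedro:VegetationTypeOpen
  ;; same classification body as before, but observing the new
  ;; sanPedro:VegetationTypeOpen concept instead of CarbonVegetationType
  ...)

(model sanPedro:CarbonVegetationType
  (classification sanPedro:CarbonVegetationType
    ;; the two dependencies now carry distinct concepts, so they no longer
    ;; collide in the one-slot-per-concept computation
    :context [vegetation-type-open :as vto vegetation-type :as vt]
    :state   #(if (no-data? (:vto %)) (:vt %) (:vto %))))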

kbagstad commented 12 years ago

This is closer, but we're still not getting consistent results. Pull the new file from github, which incorporates into the constrained development scenario the same change you put into open. Then run the open-development-carbon and constrained-development-carbon scenarios mentioned above.

In each set of results, open the "open development" and "constrained development" layers in the .nc file, then remove the "2" value (this shows the developed areas that should switch values according to the scenario commands). If you look at the PercentVegetationCoverClass results for the open scenario, you'll see all percent veg cover values as "1" where they overlap with development, but this is NOT the case for the constrained percent veg cover layer (you'll see places where constrained development overlaps it but is not successfully transformed to a value of 1).

On to fire frequency: these values are not being transformed correctly now for either open or constrained development pixels. Problematic. At least the carbon vegetation type maps look correct now, which is good.

OK, that's the input data; on to the source & sink model results. Source value results look pretty good: viewing them and turning the open development layer on and off, you can see the low carbon sequestration values on developed parts of the landscape. Things get strange quickly in the carbon sink model: in the open development scenario, soil carbon storage results seem to be (correctly) lower in developed areas while veg carbon storage shows no change (!), and in the constrained development scenario the exact opposite occurs (!!). Finally, because fire frequency is not correctly transformed in the scenarios, the sink outputs are incorrectly high in developed areas.

So, the upshot - scenarios still somehow don't seem to be behaving consistently...

fvilla commented 12 years ago

OK, new strategy which should put an end to this pain with no changes to the models.

Models and scenarios have been essentially bug-free for quite a while now, which is reassuring. But building scenarios, particularly with this pattern of transforming values of a concept into other values of the same concept, leads to pervasive and nasty issues, and before the new version there will be no way of using transformations with dependencies, which is the only right way to do this. Basically, during computation of a model (with or without scenarios) there is one slot per concept, and different values at the same point/time for the same concept cannot appear. If a concept appears more than once, a mess results. I previously made this possible and ensured that the highest-level definition is the one that gets picked, but any complex dependency structure can still screw it up, and that's what happened here.

So what I did is check each model after a scenario is applied, walking each observable from the top level down, and turn any concept that has already appeared but reappears as a dependency into a new concept named xxx:yyy__n, where n is an increasing number. Mind you: this ONLY happens after a scenario is applied, and it's meant to ease the pain of having to worry about concept duplication. So, for example, a defmodel like

(defmodel fff FireFrequency
  (classification FireFrequency
    :context [constrained-development-scenario :as cd fire-frequency :as ff]
    :state   #(if (is? (:cd %) (conc 'sanPedro:DevelopedConstrained))
                (conc 'carbonService:NoFireFrequency)
                (:ff %))))

where FireFrequency appears both as the main observable and as a dependency, will still punish the modeler appropriately by producing wrong results when no scenario is being used. But if that model appears in a scenario, the new code will turn the innermost FireFrequency into FireFrequency__1, which will also appear in the outputs. This should help debugging, and it allows simulating a transformation (one that doesn't change semantics) with a model, as is done all the time in our scenarios, until I can implement "transform" instructions that take dependencies and let me turn such semantic inconsistencies into errors.

In other words: MODELS THAT TRANSFORM THE RESULTS OF ANOTHER MODEL OF THE SAME CONCEPT ARE NOW SUPPORTED, BUT ONLY TO BE USED IN SCENARIOS unless alternative conceptualizations are used to prevent illogical dependency structures.

The __n suffix is removed from the concept when generating the keys for values inside :state closures, so the usual code keeps working even when the concept has been modified and no :as clause is given in the models (but do give :as in scenarios, as discussed).
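In plain Clojure terms, the naming convention works like this (an illustration of the rule only, not the actual thinklab implementation):

(require '[clojure.string :as str])

(defn rename-duplicate
  "Give the nth duplicate occurrence of a concept a distinct name, e.g.
   \"carbonService:FireFrequency\" -> \"carbonService:FireFrequency__1\"."
  [concept n]
  (str concept "__" n))

(defn state-key-name
  "Strip the __n suffix before building :state keys, so closures like
   #(... (:ff %) ...) keep working on renamed concepts."
  [concept]
  (str/replace concept #"__\d+$" ""))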

Also remember that cases where you had not one but TWO or more dependencies on the same concept are still perverse and illogical, so they need to be fixed as discussed.

Results of the sanPedro models now look different from yesterday's, even where they already looked different, but they all show clear traces of the development scenario, so I assume today's results are the right ones anyway. Please check. Hopefully this concludes the problem and allows us to move on.

kbagstad commented 12 years ago

OK, I'm finally getting a look at this (Huginn was down for much of Friday while Gary was working on the tape drive system). It looks like we're indeed getting the correct masking behavior from the scenarios now - woohoo! I feel like I've done a really poor job of explaining why this type of scenario might be useful in the future, though, and why we might want to enable it even if, for some reason that's still unclear to me, it seems "impure" from a modeling perspective. Hopefully the new modeling language will support this type of thing - it may be worth a quick chat on the next call about why that should be the case.

Thanks, Ken


fvilla commented 12 years ago

OK - so I can close this (you can too).

That type of scenario is an absolute necessity, and there will be support for it. The problem is just that it currently has to be implemented as a model when it is not one, and that clashes with the (proper) semantic assumptions in thinklab. Those things are not models but transformations of models: basically you say "take the output of the original model and change it like this". Using a model for it is a contradiction: you first describe an apple, then you say that an apple is that apple, but different. All it takes is a transformation with dependencies, already available in the new language, which removes the problem, but that's very hard to implement now without the trick I committed last week...
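To make the idea concrete, a transformation with dependencies might look something like this (purely hypothetical syntax, sketched for discussion; neither the transform form nor the :self placeholder exists in the current language):

;; a hypothetical "transform" form: it takes the output of the existing
;; FireFrequency model (here bound to the invented :self placeholder) plus
;; extra dependencies, and rewrites it without pretending to be a new model
(transform carbonService:FireFrequency
  :context [constrained-development-scenario :as cd]
  :state   #(if (is? (:cd %) (conc 'sanPedro:DevelopedConstrained))
              (conc 'carbonService:NoFireFrequency)
              (:self %)))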