ariesteam / aries

http://www.ariesonline.org
GNU General Public License v3.0

Fix training so that it doesn't automatically look for service:Concept and can look for locally specific input or output data #51

Closed: kbagstad closed this issue 12 years ago

kbagstad commented 12 years ago

@fvilla

Per my email 1/24/12

fvilla commented 12 years ago

Not sure without a precise example, but if the model requires apples, I can only feed it evidence in apples, not in oranges. The evidence must be in exactly the same form (classifications etc.). So if we have data for habitat:SoilInfiltrationClass and we're operating in a namespace that uses the waterSupplyService ontology, so that the concept for SoilInfiltrationClass in a BN ends up being waterSupplyService:SoilInfiltrationClass, the system cannot know whether the concept is represented the same way (e.g. same discretization) in the two ontologies or not. So I cannot make it look up anything else: the very fact of using concepts in different ontologies means that something must be different. Even if that's only because we're using a namespace and the BN gets its concepts from it, it is still not right to connect to another ontology, because there is no guarantee that the concept represented in one classification will be the same as in another. And even if it were possible, the machinery to tell the system which ontology to use would be very cumbersome, and I'm 100% sure you wouldn't like it.

So the solution is: if you know you want to use habitat:SoilInfiltrationClass as waterSupplyService:SoilInfiltrationClass, do the right thing and just SAY it. You do that by creating, in the same namespace you will be training in, a model for waterSupplyService:SoilInfiltrationClass that uses the data for habitat:SoilInfiltrationClass. The system will find it and use it.
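
Schematically, something like this (off the top of my head; the unit, breakpoints and state names below are placeholders, only the two SoilInfiltrationClass concepts are the real ones):

    ;; Sketch only: observe waterSupplyService:SoilInfiltrationClass (what the BN
    ;; needs) from data tagged habitat:SoilInfiltrationClass, in the same namespace
    ;; the training will run in. Breakpoints and state names are made up.
    (defmodel soil-infiltration waterSupplyService:SoilInfiltrationClass
      (probabilistic-measurement habitat:SoilInfiltrationClass "mm"
        [60 120] waterSupplyService:HighInfiltration
        [10  60] waterSupplyService:ModerateInfiltration
        [ 0  10] waterSupplyService:LowInfiltration))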

Is there any other issue that this action (not a fix, simply the right thing to do) cannot fix?

kbagstad commented 12 years ago

The issue is that for different models, we discretized the same concept differently. For instance, the San Pedro and Colorado models used 3 discrete states for infiltration while the La Antigua and Tanzania ones used 5. They thus used local ontology concepts for essentially the same thing (e.g., sanPedro:SoilInfiltrationClass), which doesn't get looked up and used properly by the training system. It was always my understanding that if you used a different number of discrete states for essentially the same thing in model runs in different parts of the world, you created a local ontology instance of it with the different discretization, rather than just creating a 5-class discretization and only using 3 of the classes (which we could do, but my understanding was that this was incorrect).

There's a second example here - we agreed on a call a few weeks ago to use sanPedro:CarbonVegetationType, southernCalifornia:CarbonVegetationType, sanPedro:EvapotranspirationVegetationType, colorado:EvapotranspirationVegetationType, etc. since discrete state names differ quite a bit between models while the broader concepts remain the same. This seemed to me like a relatively elegant solution, but the system is only picking up (and training on) carbonService:CarbonVegetationType, waterSupplyService:EvapotranspirationVegetationType, etc. - so important evidence is being ignored.

Thoughts? Ken

kbagstad commented 12 years ago

OK, one more shot at a clearer explanation of what we need here - try:

train -id sanpedroinfiltration core.models.water-san-pedro/infiltration-sink core.contexts.beta/san_pedro_us160

The system won't recognize the data that's there for sanPedro:SoilInfiltrationClass because it's looking for waterSupplyService:SoilInfiltrationClass, which isn't used in the model (the two concepts differ because one uses 3 discrete states and one uses 5).

Similarly, try:

train -id sanpedrocsource core.models.carbon-san-pedro/source core.contexts.beta/san_pedro_us160

The system looks for, and doesn't find, carbonService:CarbonVegetationType, because in the model we use sanPedro:CarbonVegetationType (these are differentiated because service-relevant LULC categorization differs by spatial context). Thanks!

fvilla commented 12 years ago

Right. So: the Bayesian network model contains nodes that are automatically matched to concepts. Because it is messy to add the ontology to those node names, it requires all concepts to exist in the same ontology as the main observable (the one you put after (bayesian ...)). Using your first example, the ontology for the whole model namespace (set in the namespace-ontology form at the beginning) is waterSupplyService, so if you don't qualify the concepts with an ontology in the (bayesian) statement, that's the ontology they will be assumed to belong to.
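
For reference, the relevant bit of the namespace header looks roughly like this (everything else in the header is omitted; the namespace name is the one from your train command):

    ;; Sketch only: the namespace-ontology form at the top of the model file makes
    ;; waterSupplyService the default ontology, so an unqualified BN node name like
    ;; SoilInfiltrationClass resolves to waterSupplyService:SoilInfiltrationClass.
    (ns core.models.water-san-pedro)   ; other :refer / :require clauses omitted
    (namespace-ontology waterSupplyService)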

If you want to produce and train results that are sanPedro-specific, you either make sanPedro the namespace ontology at the top (which at this stage may require lots of adaptation in other models) or you just use sanPedro for the bayesian statement:

    (defmodel infiltration-sink sanPedro:SoilInfiltration
      (bayesian sanPedro:SoilInfiltration
        :import  "aries.core::SurfaceWaterSinkSanPedro.xdsl"
        :context [stream-channel mountain-front]
        :keep    [sanPedro:SoilInfiltrationClass]
        :result  infiltration))

This way, all the concepts in the BN are assumed to belong to sanPedro, and the training will look for models in the same namespace that can be used to observe evidence for them. Note that the model you had (before I pushed the fix I'm discussing here) was what we previously called the "undiscretizer", which looked for data tagged as sanPedro:... That obviously doesn't find anything, because the data you have are measurements that are (correctly) tagged under waterSupplyService. This modified model is therefore what you wanted:

    (defmodel infiltration sanPedro:SoilInfiltrationClass
      (probabilistic-measurement waterSupplyService:SoilInfiltrationClass "mm"
        [60 120] sanPedro:HighInfiltration
        [10  60] sanPedro:ModerateInfiltration
        [ 0  10] sanPedro:LowInfiltration))

Changing these two allows training to work just fine. Repeat the same reasoning for all other similar situations.
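
For the carbon example, purely as a sketch (the .xdsl file name, :context dependencies and the CarbonSourceValue observable below are placeholders; only sanPedro:CarbonVegetationType comes from your example), the same change would look something like:

    ;; Hypothetical sketch: qualify the bayesian observable with sanPedro so the
    ;; trainer looks for sanPedro:CarbonVegetationType evidence in this namespace.
    ;; All names other than sanPedro:CarbonVegetationType are placeholders.
    (defmodel source sanPedro:CarbonSourceValue
      (bayesian sanPedro:CarbonSourceValue
        :import  "aries.core::CarbonSourceSanPedro.xdsl"
        :context [vegetation-type percent-canopy-cover]
        :keep    [sanPedro:CarbonVegetationType]))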

fvilla commented 12 years ago

Note post-close: the "fix" I discuss is just a rehash of my first comment, i.e. do it right and it works. There's nothing to fix except getting the model semantics to make sense. If you ask for waterSupplyService:SoilInfiltrationClass, the system will look for that. If you ask for sanPedro:SoilInfiltrationClass, it will look for that instead. Just ask for what you know how to compute (i.e. what your namespace contains models for) and it will be used. That's it.

kbagstad commented 12 years ago

OK... I've been going through all the BNs that will undergo training and have a single case study-specific concept, and changing all the concepts in those models to case study-specific ones. As you can imagine this is a lot of refactoring of ontologies, models, storylines, and the colormaps.properties file, but no problem if it's the right thing to do.

However, I've got another strange case that may take a bit of thought. In many cases there was actually a good reason why some concepts remained service-specific and others remained case study-specific, even within the same model (though evidently this means evidence cannot be considered for training). The best example is the Colorado water supply and sediment regulation models. In all our water supply models we discretize percent tree canopy cover into 5 classes, while we discretize it into 3 classes in the sediment regulation models (there are good reasons for this, based on the dynamics of these two services). So in the past I'd used a 3-class concept, soilRetentionService:PercentTreeCanopyCover, and a 5-class concept, waterSupplyService:PercentTreeCanopyCover.

Now you're telling me that if we train both models to Colorado and they have some CO concepts in them, they all need to be switched to colorado concepts. But of course that means the sediment model would now use only 3 of the 5 classes of percent tree canopy cover, which I understood to be a bad thing (i.e., there's some semantic ambiguity here). I'd welcome your thoughts on how to do this right, so we get it right in any other, similar cases.
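
To illustrate the conflict, the two discretizations have roughly this shape (the breakpoints and the habitat:PercentTreeCanopyCover data concept here are placeholders, not our real definitions):

    ;; Placeholder sketch (class breakpoints invented): the same quantity,
    ;; percent tree canopy cover, discretized two ways for two services.
    ;; 3-class version used by the sediment regulation models:
    (defmodel canopy-cover-sediment soilRetentionService:PercentTreeCanopyCover
      (classification (measurement habitat:PercentTreeCanopyCover "%")
        [80 100 :inclusive] soilRetentionService:HighCanopyCover
        [20  80]            soilRetentionService:ModerateCanopyCover
        [ 0  20]            soilRetentionService:LowCanopyCover))

    ;; 5-class version used by the water supply models:
    (defmodel canopy-cover-water waterSupplyService:PercentTreeCanopyCover
      (classification (measurement habitat:PercentTreeCanopyCover "%")
        [80 100 :inclusive] waterSupplyService:VeryHighCanopyCover
        [60  80]            waterSupplyService:HighCanopyCover
        [40  60]            waterSupplyService:ModerateCanopyCover
        [20  40]            waterSupplyService:LowCanopyCover
        [ 0  20]            waterSupplyService:VeryLowCanopyCover))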

Reopening until I hear back on the right thing to do, then we can close this one for good.