ariesteam / aries

http://www.ariesonline.org
GNU General Public License v3.0

possible to reduce complexity of netCDF output concepts? #24

Closed kbagstad closed 12 years ago

kbagstad commented 12 years ago

@fvilla

kbagstad commented 12 years ago

When I was originally working with netCDF files, the variables contained in each file would just contain each relevant concept from the ontology (e.g., "AestheticViewProvision" when running the viewshed model). At some point (mid-July or later), they started reporting the ontology namespace and concept (aestheticService_AestheticViewProvision). This is annoying in drop-down menus where sometimes the end of the concept gets cut off and it becomes harder to work with. If it's possible to restore to earlier conditions this would be beneficial. For an example, see the files /raid/nc_outputs/viewbase929.nc (a newer file with the longer netCDF variable names) versus /raid/nc_outputs/SPViewBase.nc (with the older, cleaner variable names).

fvilla commented 12 years ago

Yup, this kind of thing doesn't happen magically; it was a choice on my part, because many models have results with the same concept name in different ontologies. Because variable names must be unique, the only way to have both is to add the ontology. The duplication of concept names in different ontologies is the correct idiom and shouldn't be changed; basically it means the interpretation of the same concept in a different concept space, which is right.

So I don't think that should be changed as it's difficult to establish which concept should have the priority when there's a choice, and the final context serialized to the NetCDF should be complete. What I can do is to name the variables with the concept name first, so you see the most relevant information first. That's easy. Otherwise we need to invent a naming convention but it's messier and requires preprocessing of the context to be implemented. Let me know what you prefer.

kbagstad commented 12 years ago

Good deal. Concept name first would be easier. Do you have any sense of how many examples there are of the same concept name appearing in different ontologies within the same model? It might be easier just to clean those up so there's truly only one of each concept within the same model.

(Sent by email in reply to Ferdinando Villa's comment of Fri, Sep 30, 2011, 11:37 PM.)

fvilla commented 12 years ago

Ehm, no, that wouldn't be a fix but it would place the burden of uniqueness on the modeler, leaving potential for ambiguity and data loss. I changed the order to concept_ontology which remains unambiguous and hopefully should make things clearer.
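A minimal Python sketch of the trade-off described here; it is not ARIES code. Only `aestheticService` and `AestheticViewProvision` come from this thread; `otherOntology` and the data values are made up to force a clash. It shows why concept-only names can silently lose data, while `concept_ontology` names stay unique.

```python
# Hypothetical model results keyed by (ontology, concept).
# "otherOntology" and the numbers are invented for illustration.
results = {
    ("aestheticService", "AestheticViewProvision"): [0.2, 0.7],
    ("otherOntology", "AestheticViewProvision"): [1.0, 3.0],
}


def by_concept_only(results):
    """Name variables by concept alone; a repeated concept overwrites the earlier one."""
    return {concept: data for (_ontology, concept), data in results.items()}


def by_concept_then_ontology(results):
    """Name variables as concept_ontology, which stays unique per (ontology, concept)."""
    return {
        f"{concept}_{ontology}": data
        for (ontology, concept), data in results.items()
    }


short_names = by_concept_only(results)
full_names = by_concept_then_ontology(results)

print(sorted(short_names))  # one name survives, the other result is lost
print(sorted(full_names))   # both results are kept
```

With concept-only names the modeler would have to guarantee uniqueness by hand, which is the burden this comment argues against.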

kbagstad commented 12 years ago

Thanks for that fix. After talking over outputs with my students and USGS colleagues, I think we still have some thought to put in about how to make the netCDF output list sensible. First, do we intend for the "average" user to open up the list and view it in a GIS program (most likely arc)? If so it's just a jumbled mess of inputs & outputs - no order (alphabetical or logical, i.e., input data vs. BN model outputs vs. flow model outputs). I'm still not convinced that we should be using the concept from the ontology rather than the concept that directly follows the defmodel statement (i.e., if it reads "defmodel source AestheticProximityProvision", it makes sense to me to have it read "source" rather than "AestheticProximityProvision_AestheticService" in the drop-down menu when reading netCDF outputs). And what about cases where we have 2 input layers for the same concept - is it your intent that users see (1) the raw input data or (2) the reclassified data as it enters the model? I'm assuming #2 here. If it is #2, then I think that clears up some of your concern about repeated concepts in the netCDF output list.

Second, on the storylines, what do you think of the idea of using the "group" label to read "Input data to source models," "Output of source models," "Output from flow models", etc, and having them appear right below the layer name? I think this kind of organization will strongly help users grasp what input and output data are where, and why.

Reopening the issue until we can have a conversation that moves us forward - possibly involving the rest of the team or at least Brian, who, like me, is closer to the user end of things...

fvilla commented 12 years ago

@lambdatronic @bvoigt

...which is what you do when you want a conversation with the rest of the team.

As far as I'm concerned, the notion that the "NetCDF output list" is something that depends on the NetCDF and nothing else is wrong. The nc file is just a container for data, and whatever output list comes from it depends on the program you use to open it. It is possible that the order of creation of the variables in the file is kept, but spending any time in beautifying the list of names in what is not meant to have any GUI implication is silly to the point of immorality.

This said, the input/output sorting you have in mind is much less easy than you think. Not all variables can be classified unambiguously according to how they enter the models; many enter more than one model. The thing that could be done is a topological sorting of the tree of dependencies and storing according to that (look it up). That would be, again, overkill for almost no purpose.
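For readers who do look it up, here is a minimal sketch of a topological sort using Python's standard-library `graphlib` (Python 3.9+). It is not ARIES code, and the variable names and dependency graph are hypothetical. Note that `LandCover` feeds two models, so it cannot be filed under a single "input to X" heading, which is the ambiguity mentioned above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each variable maps to the set of
# variables it depends on. Names are illustrative, not from ARIES.
dependencies = {
    "Elevation": set(),
    "LandCover": set(),
    "source": {"Elevation", "LandCover"},
    "sink": {"LandCover"},
    "flow": {"source", "sink"},
}


def storage_order(dependencies):
    """Return the variables ordered so that each one follows everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


order = storage_order(dependencies)
print(order)  # inputs come first, "flow" comes last
```

Several valid orders usually exist for the same graph, so a sort like this only guarantees that dependencies precede dependents, not any particular grouping.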

The group labels are exactly intended for what you mention - any logical organization is acceptable and is completely yours to decide. The only thing it will be used for is to sort the output list in the GUI (the idea is to have differently shaded backgrounds and a label for each class). Of course for readability the classes should be broad and few. This is of course another issue.
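A rough Python sketch of how a viewer might use such group labels to order a list. This is an assumption about one possible GUI-side approach, not the storyline explorer's implementation. The three labels are the ones proposed earlier in this thread; the variable names are hypothetical.

```python
from itertools import groupby

# Group labels proposed earlier in the thread, in display order.
GROUP_ORDER = [
    "Input data to source models",
    "Output of source models",
    "Output from flow models",
]

# Hypothetical (variable name, group label) pairs in arbitrary file order.
variables = [
    ("flow", "Output from flow models"),
    ("Elevation", "Input data to source models"),
    ("source", "Output of source models"),
    ("LandCover", "Input data to source models"),
]


def grouped(variables, group_order):
    """Sort variables by group, then by name, and return (label, [names]) pairs."""
    rank = {label: i for i, label in enumerate(group_order)}
    ordered = sorted(variables, key=lambda v: (rank[v[1]], v[0]))
    return [
        (label, [name for name, _ in items])
        for label, items in groupby(ordered, key=lambda v: v[1])
    ]


sections = grouped(variables, GROUP_ORDER)
for label, names in sections:
    print(label, names)
```

Keeping the label list short and explicit matches the point that the classes should be broad and few.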

Waiting for comments I guess, I don't really know why I should leave this one open but here goes.

kbagstad commented 12 years ago

Fair enough re: spending time on the aesthetics of the .nc outputs. I think it's strongly likely that 90% of our users will open files in arc and that having a jumble of layer names will be confusing, but I understand your reluctance to spend time making the list prettier - which hopefully will become less relevant once the storyline explorer is fully enabled.

Re: group labels I'm thinking "source/sink/use/flow model inputs," "source/sink/use/flow model outputs" - small number of classes but would organize the inputs/outputs in a much clearer way, though I realize that some data do enter the models in >1 place. An older version of the storyline explorer had a checklist of "base data" and "service data" - while the names didn't make sense I think the organization was very clear and intuitive - something we should strive for. I'm happy to relabel these once the storyline files are complete.

Lastly I didn't see a response to my question about "I'm still not convinced that we should be using the concept from the ontology rather than the concept that directly follows the defmodel statement (i.e., if it reads "defmodel source AestheticProximityProvision", it makes sense to me to have it read "source" rather than "AestheticProximityProvision_AestheticService" in the drop-down menu when reading netCDF outputs). And what about in cases where we have 2 input layers for the same concept - is it your intent that users see the raw input data or the reclassified data as it enters the model (I'm assuming #2 here). If it is #2, then I think that clears up some of your concern about repeated concepts in the netCDF output list." - I would like to see more discussion on this in particular.

Brian, Gary - thoughts? Far from a waste of time I think how easily 3rd parties can pick up interpretation of our data and results will be a big factor in whether they choose to use the system or not, and anything we can do to improve that is time well spent.

lambdatronic commented 12 years ago

Good points on both sides. I agree with Ken that any efforts we make towards improving the usability of this system are a plus. I see Ferd's point that Arc* is not his target output viewer. My question then is: what is the target output viewer, and how can we ensure its usability for future ARIES users? This is not so much a rhetorical question as a practical one. We are in a place now where we need to start training new people to not just understand ARIES outputs but to actually create, run, and debug ARIES models. We have this need, and we lack this capability. So let's get the expected user interface question resolved first, make it usable second, and train some new people third. That is all.

kbagstad commented 12 years ago

Another plea for readability in the netCDF outputs... I've worked through a number of new models today and they continue to be nightmarish to read and interpret. One noncontroversial fix: right now the .nc outputs for uncertainty read "concept_ontologyUncertainty". Switching them to "conceptUncertainty_ontology" would be an easier read.
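The renaming proposed above could be sketched in Python roughly as follows. This is not ARIES code, and it assumes the concept part contains no underscore, so the first underscore separates concept from ontology. The function name is hypothetical.

```python
SUFFIX = "Uncertainty"


def move_uncertainty_suffix(name):
    """Turn concept_ontologyUncertainty into conceptUncertainty_ontology.

    Assumes the concept itself contains no underscore. Names without the
    suffix, or without an underscore, are returned unchanged.
    """
    if not name.endswith(SUFFIX) or "_" not in name:
        return name
    stem = name[: -len(SUFFIX)]              # drop the trailing "Uncertainty"
    concept, ontology = stem.split("_", 1)   # split at the first underscore
    return f"{concept}{SUFFIX}_{ontology}"


renamed = move_uncertainty_suffix("AestheticViewProvision_aestheticServiceUncertainty")
print(renamed)
```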

A more controversial fix: strongly consider getting rid of the ontology name. It makes things really cumbersome for the user (I suspect I'll have others echoing this once I get students looking at the models very soon).

A second controversial fix: abandon the complex service-specific flow terms and just go back to "theoretical source," "blocked flow," etc. I continue to question what we gain from the complex terms when even those who developed them find them cumbersome.

lambdatronic commented 12 years ago

FWIW, I'm with K-Bag on this one. It's not an Arc* issue either. gdalinfo reports all those wacky names when run on a NetCDF file, and of course, I see the same ones when importing layers into GRASS. The original (and much simpler) theoretical/inaccessible/possible/blocked/actual source/sink/use/flow is much easier to wrap your head around when interpreting the results of our models.

kbagstad commented 12 years ago

Closing as the flow concepts have been refactored to their original and more readable names.