Open davidhassell opened 3 years ago
Initial, not-too deeply thought-about UML suggestions:
1)
2)
I think we can just use numpy.dtype for the Datatype, can't we? Or do we need something more?
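For reference, a minimal sketch of what a numpy.dtype already records (kind of value, item size, optional byte order). Whether that is sufficient for the Datatype is the open question above; the variable names here are only illustrative.

```python
import numpy as np

# A numpy.dtype records the kind of value and its size in bytes.
dt = np.dtype("float64")
print(dt.name, dt.kind, dt.itemsize)  # float64 f 8

# Byte order can be stated explicitly, e.g. big-endian 32-bit integers.
be_int = np.dtype(">i4")
print(be_int.kind, be_int.itemsize)

# Two dtypes compare equal when they describe the same type.
same = np.dtype("float64") == np.dtype(np.float64)
```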
Hi,
Coming at it from an encoding-independent view point, how about (source code at end):
I think something along these lines is what we want for this document, and then Neil's UML would be the data model of the implementation, rather than the pared-down logical connections.
Not 100% convinced about my arrow heads and tails, as ever!
Thanks, David
# ====================================================================
# Source code. Create with:
#
# $ dot -T png file.gv -o file.png
# ====================================================================
digraph {splines=ortho nodesep="+0.25"
node [
style="filled,bold"
shape=rectangle
fillcolor="#FFA533"
width=1.5
height=0.7
fontname="Arial"
]
# --------------------------------------------------------------------
# CF data model constructs
# --------------------------------------------------------------------
AggregationVariable [
label="AggregationVariable"
]
AggregationInstructions [
label="AggregationInstructions"
]
AggregatedDimension [
label="AggregatedDimension"
]
FragmentDimension [
label="FragmentDimension"
]
Fragment [
label="Fragment"
]
AggregatedData [
label="AggregatedData"
]
edge [dir=both
arrowsize=1.0
fontname="Arial"
labelfontsize=11.0
]
AggregationVariable -> AggregationInstructions [arrowhead=diamond arrowtail=vee]
AggregationVariable -> AggregatedDimension [arrowhead=odiamond arrowtail=vee taillabel="0..* "]
{rank=same; AggregationInstructions, AggregatedDimension}
{rank=same; FragmentDimension, Fragment}
AggregationInstructions -> Fragment [arrowhead=odiamond arrowtail=vee taillabel="0..* "]
FragmentDimension -> Fragment [arrowhead=none arrowtail=vee taillabel=" 0..* " ]
FragmentDimension -> AggregatedDimension [arrowhead=vee arrowtail=none]
AggregatedData -> AggregationInstructions [arrowhead=odiamond arrowtail=vee]
AggregationVariable -> AggregatedData [arrowhead=diamond arrowtail=vee]
}
... also haven't worked out yet if "AggregationInstructions" is a logical entity, or not ...
I’ve compared this to my UML, and it does seem like a distillation of what I have, plus the “Aggregation Instructions”. Which is good, as it shows we have a similar idea as to what the classes should be! :)
I think we can try to make the pared-down logical connections and the data model as close as possible. I'd like the data model to be a superset of the pared-down model, rather than distinct from it. I think, from your diagram, that we can work toward that.
We can try to think through what the "AggregationInstructions" mean, and what form they should take. I'll have a look through the document again and have a think.
Okay, it took a bit of thinking (although I am slow in my between holidays week!), but I'm happy with this:
I think it works nicely separating the AggregationInstructions and AggregatedData, from a parsing point of view.
I've been quiet here whilst the diagrams were being formulated, but thought I would jump in at this point to say that both the data model UML (from Neil) and the pared-down logical connection schematic (from David) are looking like very useful and clear condensations of the concepts and to agree with Neil:
I’ve compared this to my UML, and it does seem like a distillation of what I have, plus the “Aggregation Instructions”
the two (current) diagrams seem consistent, also, as far as I can tell.
So it would be great to get both diagrams included in the document as soon as you are both happy with them and Bryan has looked over them and is also satisfied. Great stuff!
Hi Neil,
Thanks. This is getting interesting! I don't think we're quite there, yet, though ....
I think we can try to make the pared-down logical connections and the data model as close as possible.
This is where we differ. I think the pared-down logical connection view is the CFA data model. The data model should be the starting point of any software implementation, and allow for different encoding of CFA datasets.
Be assured, I'm absolutely not claiming my UML is already all there! I found it useful to compare the difference between the two views (all this is written in good spirits and reflects my current thinking, which is certainly plastic!):
There is a fundamental difference between the two in that in the logical model the Fragment can exist without reference to the AggregationVariable, but that is not the case in the implementation model. The fact that a fragment exists without requiring anything associated with the AggregationVariable is, I think, key to these conventions. These differences manifest themselves in the implementation data model as:
- [...] Fragment (I think they are only components of the AggregationInstructions)
- Fragment is an aggregation of FragmentDimension. (I think it is merely associated.)
Also:
- Surely the AggregationVariable needs to be composed of the AggregationInstructions, as well?
- I'm not sure that the AggregatedData can exist independently of AggregationVariable. In a normal netCDF variable, the variable is composed of its data, so when the AggregatedData is created it seems right that the same connection should apply in our model.
- Similarly, perhaps DataType is only a feature of the AggregationVariable and a Fragment, and not the AggregatedData.
More generally, I don't think we should replicate elements of the netCDF data model in ours, such as "name" and "size" of a dimension. It is good to say, for instance, that in the netCDF encoding a FragmentDimension corresponds to a netCDF Dimension, but we don't need to (and shouldn't) hard wire in the netCDF encoding to the data.
Cheers, David
This is where we differ. I think the pared-down logical connection view is the CFA data model. The data model should be the starting point of any software implementation, and allow for different encoding of CFA datasets.
I think we agree, and I just worded it badly! :) The implementation model should be a specialisation of the data model.
I'm still getting my head around composition vs aggregation. Can I think of it as: in composition the object contains the other object (in a list or as a variable, for example) and in aggregation, the object contains a reference to the other object?
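One way to picture the distinction, as a rough sketch only: in UML terms the difference is about ownership and lifetime rather than about how the object is stored. A composed part belongs to exactly one whole and goes away with it; an aggregated part exists in its own right and can be shared. The class names below come from the diagrams in this thread; the constructor arguments and attribute names are assumptions made for illustration.

```python
class AggregatedData:
    def __init__(self, units):
        self.units = units


class AggregatedDimension:
    pass


class AggregationVariable:
    def __init__(self, name, units, dimensions):
        self.name = name
        # Composition: the variable creates its own data object, which
        # has no existence outside this variable.
        self.data = AggregatedData(units)
        # Aggregation: the dimensions were created elsewhere, outlive
        # this variable, and may be shared with other variables.
        self.dimensions = list(dimensions)


time = AggregatedDimension()
tas = AggregationVariable("tas", "K", [time])
pr = AggregationVariable("pr", "kg m-2 s-1", [time])

# Both variables refer to the same dimension, but each has its own data.
shared_dimension = tas.dimensions[0] is pr.dimensions[0]
separate_data = tas.data is not pr.data
```

In Python both attributes hold references, so the language itself does not enforce the difference; it is a statement about which object is responsible for creating and discarding the other.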
Surely the AggregationVariable need to be composed of the AggregationInstructions, as well
Returning to this today: absolutely!
I'm not sure that the AggregatedData can exist independently of AggregationVariable. In a normal netCDF variable, the variable is composed of its data, so when the AggregatedData is created it seems right that the same connection should apply in our model
Yes, confusion about composition and aggregation.
Similarly, perhaps DataType is only a feature of the AggregationVariable and a Fragment, and not the AggregatedData
I think it could be either, but I'm happy to move it.
PlantUML source:
@startuml
class DataType {
}
class Fragment {
+int location
+string file
+string format
+string address
+string units
}
class AggregatedData {
+string units
}
class AggregationInstructions {
+string location
+string file
+string format
+string address
}
class AggregatedDimension {
}
class FragmentDimension {
}
class AggregationVariable {
+string name
}
AggregationVariable "1" o--> "0..*" AggregatedDimension
AggregationVariable "1" *--> "1" AggregatedData
AggregatedData "1" *--> "0..*" Fragment
Fragment "1" o--> "0..*" FragmentDimension
AggregatedDimension "1" o--o "1" FragmentDimension : ordered
AggregationVariable "1" *--> "1" AggregationInstructions
AggregationVariable "1" o--> "1" DataType
Fragment "1" o--> "1" DataType
@enduml
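As a rough illustration of how the PlantUML above might map onto code, here is a sketch using Python dataclasses. The class names and the typed attributes (location, file, format, address, units, name) are taken from the diagram; the relationship field names (data, instructions, fragments, dimensions, data_type) and all example values are assumptions, and the ordered AggregatedDimension/FragmentDimension link is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataType:
    pass


@dataclass
class FragmentDimension:
    pass


@dataclass
class AggregatedDimension:
    pass


@dataclass
class Fragment:
    location: int
    file: str
    format: str
    address: str
    units: str
    data_type: DataType  # Fragment o--> DataType
    dimensions: List[FragmentDimension] = field(default_factory=list)


@dataclass
class AggregatedData:
    units: str
    fragments: List[Fragment] = field(default_factory=list)  # 0..*


@dataclass
class AggregationInstructions:
    location: str
    file: str
    format: str
    address: str


@dataclass
class AggregationVariable:
    name: str
    data: AggregatedData  # exactly one
    instructions: AggregationInstructions  # exactly one
    data_type: DataType  # exactly one
    dimensions: List[AggregatedDimension] = field(default_factory=list)


# Hypothetical values, purely to show the pieces fitting together.
fragment = Fragment(0, "fragment.nc", "nc", "tas", "K", DataType())
variable = AggregationVariable(
    name="tas",
    data=AggregatedData(units="K", fragments=[fragment]),
    instructions=AggregationInstructions("loc", "file", "fmt", "addr"),
    data_type=DataType(),
    dimensions=[AggregatedDimension()],
)
```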
I think some of the confusion between the views and the composition/aggregation is around the difference between the Fragment as a variable in the CFA definition which defines something about a Fragment which is a file containing that data. Since in most cases the fragment (file) contains only fragment (data) which is pointed to by the fragment (variable in the CFA master file) ... we can and do get lazy about which is which. Can we come up with a clearer nomenclature for these three usages?
Hi Bryan,
Could you elaborate on what you mean by "a variable in the CFA definition"?
The CFA Fragment is "An independent, possibly self-describing, array that defines a contiguous part of the aggregated data. The aggregated data is composed from a multi-dimensional orthogonal array of fragments." (https://github.com/NCAS-CMS/cfa-conventions/blob/master/source/cfa.md#Terminology). Whether or not a Fragment is a variable in the CFA-netCDF file, or is somehow stored in another file (with or without other data) is neither here nor there.
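To make the quoted definition concrete, here is a small sketch for the two-dimensional case, using plain nested lists in place of real arrays. The function name assemble and the example values are hypothetical; the point is only that the fragments are independent arrays and that nothing here depends on where each one is stored.

```python
def assemble(fragments):
    """Join a 2-d orthogonal grid of 2-d fragments into one 2-d array.

    fragments[i][j] is the fragment at grid position (i, j), given as a
    list of rows. Fragments in the same grid row must have the same
    number of rows.
    """
    aggregated = []
    for fragment_row in fragments:
        # Take the k-th row of every fragment in this grid row and
        # concatenate them into one row of the aggregated data.
        for rows in zip(*fragment_row):
            aggregated.append([value for row in rows for value in row])
    return aggregated


# Four independent fragments arranged on a 2 x 2 grid.
f00 = [[1, 2], [5, 6]]
f01 = [[3, 4], [7, 8]]
f10 = [[9, 10]]
f11 = [[11, 12]]

aggregated_data = assemble([[f00, f01], [f10, f11]])
```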
Apologies if I've not sensed the point of your post!
I think the sense of my point is that whether something is composed or aggregated depends on whether one is thinking about it as "an array" and "part of something described inside the current scope" (e.g. the CFA Fragment usage) or the thing that is pointed to in the content (attributes) of that array. So we have CFA Fragments and Fragments ... the former is composed and the latter is aggregated ... I think.
- So from a UML point of view, is the UML describing the information model held in the file, or is it the information model for the things described by the file?
For me, it should be the latter
Hi @davidhassell, @nmassey001, @bnlawrence: please can we revive this? At this point we have a stand-alone v0.6 Conventions document with several examples outlined, giving a comprehensive overview, but ultimately the entire document is still pure text, which is quite intimidating. For that reason, and because we have some definite ideas fleshed out here already that may be ready for use or near enough, we should try to add in a schematic or two soon, I think.
Both Neil's and David's ideas, as covered above, look really useful. From my reading of the above thread, the latest concrete formulations are David's diagram as covered in https://github.com/NCAS-CMS/cfa-conventions/issues/21#issuecomment-895910161 and Neil's diagram as covered in https://github.com/NCAS-CMS/cfa-conventions/issues/21#issuecomment-897732276 (image) and https://github.com/NCAS-CMS/cfa-conventions/issues/21#issuecomment-897735752 (source). The state of agreement/review with regards to each is that both diagrams are generally found to be consistent, with perhaps a few requested tweaks to one or both. But, as a side issue perhaps, Bryan (again as I understand it, though I could be misinterpreting) wants (see https://github.com/NCAS-CMS/cfa-conventions/issues/21#issuecomment-901012400) terminology to be made clearer with regards to fragments (potentially in the text, not just the diagrams?):
Since in most cases the fragment (file) contains only fragment (data) which is pointed to by the fragment (variable in the CFA master file) ... we can and do get lazy about which is which. Can we come up with a clearer nomenclature for these three usages?
So as far as I can see, to go forwards we need to agree on:
And then we can put in a PR (or two, if the first issue is a wider aspect rather than a sub-issue of this relating to the diagrams, which isn't clear to me from the above comments).
@sadielbartholomew, whilst reviewing PR #20, had the excellent idea of one or more diagrams that help explain the text:
From https://github.com/NCAS-CMS/cfa-conventions/pull/20#pullrequestreview-691015428:
This was discussed briefly in PR #20 but then moved to this issue so that it didn't hold up that PR being accepted.