FamilySearch / gedcomx

An open data model and an open serialization format for exchanging genealogical data.
http://www.gedcomx.org
Apache License 2.0

create canonical UML and specification text #114

Closed jralls closed 12 years ago

jralls commented 12 years ago

Having the code under version control (especially git!) is great, but usually in standards and specifications the code is not canonical, even when it's a reference implementation.

If you're not ready to publish the draft standard in English, that's OK (though it will have to be eventually), but at least get the UML fully articulated and into the repository so that we have a canonical reference to frame suggestions and criticism. I've found ArgoUML to be an excellent modelling tool, and it can read and write XMI files, which are quite amenable to version control.

stoicflame commented 12 years ago

Actually, the UML is currently under version control, although that's not documented very well at the moment. The UML is also not considered canonical; it exists mainly for documentation purposes. The UML projects are managed using NClass, and the files are located under each module:

I agree that the final cut of version 1.0 should include canonical UML diagrams and formal specification text. For now, we're maintaining the canonical version in code until we get closer to cutting that formal spec.

stoicflame commented 12 years ago

I'm updating the title of this issue to better reflect the intent to use this issue to track the work for solidifying the UML and spec text.

jralls commented 12 years ago

NClass appears to me to be a poor choice of tool for two reasons: First, it doesn't write XMI, so one is locked in to using that one tool. Second, it uses .Net/mono, which is a serious PITA for non-M$Windows users.

Keeping the canonical version (in fact, given the seriously incomplete documentation, _the only version_) in code locks out the folks you most need to contribute: The professional genealogists who aren't also programmers.

Consider: Robert Charles Anderson and John Wylie, two of the members of the GENTECH Genealogical Data Model project and both prominent genealogists, were in your second lecture on Thursday. They have input for you, but they're not going to be able to help you very much if you're expressing yourself in code instead of English.

stoicflame commented 12 years ago

it doesn't write XMI, so one is locked in to using that one tool.

Fair enough. That does seem interesting. What other programs read/write XMI?

Second, it uses .Net/mono, which is a serious PITA for non-M$Windows users.

Actually, I found NClass very easy to use. I'm on Ubuntu. It runs great with mono.

Keeping the canonical version (in fact, given the seriously incomplete documentation, the only version) in code locks out the folks you most need to contribute: The professional genealogists who aren't also programmers.

Agreed. We need their input, and we need to make this project more accessible to them.

jralls commented 12 years ago

it doesn't write XMI, so one is locked in to using that one tool.

Fair enough. That does seem interesting. What other programs read/write XMI?

See http://en.wikipedia.org/wiki/List_of_UML_tools#Features

stanm commented 12 years ago

I would also like to see a UML design document.
Another (commercial) tool that generates XMI is Enterprise Architect (http://www.sparxsystems.com/). I have used it for generating web documentation: http://gdmuml.hostingsiteforfree.com/GEDCOM-UML/index.htm

jralls commented 12 years ago

Actually, I found NClass very easy to use. I'm on Ubuntu. It runs great with mono.

Install is the issue, but after a couple of tries and some edits I got it going in a Debian VM.

What a waste. You might as well use Dia. That's not a UML modeler; it's a half-baked diagram editor. To see what I mean, get argo (or if you like KDE, Umbrello is part of KDevelop), fire it up, and import the gedcomx sources (change the model name before you start importing or a bunch of stuff will get named "untitledModel_Foo"). You'll want to set the java import settings to multiplicity, otherwise aggregate member variables are created as untyped lists. It will make a bunch of class diagrams for you (which you'll need to clean up), but the important part is that all of your classes (both the ones you created and the imported dependencies) are characterized in the model.

So, for example, on the Record class diagram as created there's no GenealogicalEntity or GenealogicalResource, but you want to see them, right? So in the tree diagram expand "common" and you'll find those two classes. Drag them onto the diagram and the generalization and association lines appear immediately because that's already in the model: The diagrams (and you can have as many as you like) are just views showing parts of the model. Each one can have as little or as much detail as you need to illustrate a point or to guide your coding.

The catch is that only class diagrams are auto-generated from the code. You'll have to map out collaborations, sequences, and activities manually.

jralls commented 12 years ago

You'll want to set the java import settings to multiplicity, otherwise aggregate member variables are created as untyped lists

I need to amend that: If you do set the settings to multiplicity, it creates the associations to some List object (which is undoubtedly in the model somewhere...) instead of recognizing that List (or, presumably, any other collection type) as an association to class Foo with multiplicity 1..*. You have to fix those from the model explorer and then add them to the diagrams.
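For what it's worth, the element type the importer drops is still recorded in the compiled Java signature, so it can be recovered by reflection. What follows is only a minimal sketch under stated assumptions: `Record` and `Persona` here are hypothetical stand-ins (not the actual gedcomx classes), and `describe` is an illustrative helper, not anything ArgoUML provides. A collection-typed field is reported as a to-many association to its element type; the lower bound (0 vs. 1) cannot be deduced from the Java type, so it is written as `*`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Collection;
import java.util.List;

public class AssociationSketch {
    // Hypothetical model classes, for illustration only.
    static class Persona {}

    static class Record {
        List<Persona> personas; // should read as an association to Persona, many
        Persona principal;      // should read as an association to Persona, at most one
    }

    /** Describes a field as a UML-style association end: "name : Type [multiplicity]". */
    static String describe(Field field) {
        Type type = field.getGenericType();
        // A parameterized collection: report the element type, not the collection class.
        if (Collection.class.isAssignableFrom(field.getType())
                && type instanceof ParameterizedType) {
            Type element = ((ParameterizedType) type).getActualTypeArguments()[0];
            if (element instanceof Class) {
                return field.getName() + " : " + ((Class<?>) element).getSimpleName() + " [*]";
            }
        }
        // Anything else (including raw collections) is reported with its declared type.
        return field.getName() + " : " + field.getType().getSimpleName() + " [0..1]";
    }

    public static void main(String[] args) throws NoSuchFieldException {
        System.out.println(describe(Record.class.getDeclaredField("personas")));
        System.out.println(describe(Record.class.getDeclaredField("principal")));
    }
}
```

A field declared as a raw `List` carries no element type at all, which is the case where the association has to be fixed by hand.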

stoicflame commented 12 years ago

Wow. I must admit that's pretty powerful. I think you're right; argo does seem like the way to go. We'll continue to leave this issue open to track that work.

stoicflame commented 12 years ago

Let me make a suggestion. When you're looking at the canonical Java code, don't think "Java" code. Think "XML Schema", because that's really what JAXB objects are.
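As a rough illustration of that idea (a toy sketch only, not JAXB, Enunciate, or the gedcomx build), a plain Java data class already carries enough structure to derive a schema type from it. The `Name` class, the `xsdType` mapping, and `toComplexType` below are all hypothetical names invented for this example; real JAXB classes use annotations to control element names, ordering, and namespaces, none of which is modelled here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class SchemaSketch {
    // Hypothetical data class, for illustration only; not a gedcomx type.
    static class Name {
        String givenName;
        String surname;
        int rank;
    }

    /** Maps a few Java types to XML Schema built-ins; anything else is assumed to be a named complex type. */
    static String xsdType(Class<?> javaType) {
        if (javaType == String.class) return "xs:string";
        if (javaType == int.class || javaType == Integer.class) return "xs:int";
        if (javaType == boolean.class || javaType == Boolean.class) return "xs:boolean";
        return javaType.getSimpleName();
    }

    /** Renders the instance fields of a class as an xs:complexType with one element per field. */
    static String toComplexType(Class<?> cls) {
        StringBuilder xsd = new StringBuilder();
        xsd.append("<xs:complexType name=\"").append(cls.getSimpleName()).append("\">\n");
        xsd.append("  <xs:sequence>\n");
        for (Field field : cls.getDeclaredFields()) {
            // Skip compiler-generated and static fields; only instance data is part of the type.
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) continue;
            xsd.append("    <xs:element name=\"").append(field.getName())
               .append("\" type=\"").append(xsdType(field.getType())).append("\"/>\n");
        }
        xsd.append("  </xs:sequence>\n");
        xsd.append("</xs:complexType>\n");
        return xsd.toString();
    }

    public static void main(String[] args) {
        System.out.print(toComplexType(Name.class));
    }
}
```

The point of the sketch is only that the class definition and the schema type describe the same structure; which of the two is treated as canonical is a separate question.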

I guess I'm just too attached to my tools. The thing is, because we're using Java to define the schema, the tools that we use can not only generate XML Schema but also client-side code stubs for various languages, rich documentation for the model, and eventually even the UML diagrams and graphic syntax diagrams (see #133).

I guess I was hoping the community could get on board and collaborate in terms of the Java schema code, but alas maybe that's just too much to ask...

jralls commented 12 years ago

Let me make a suggestion. When you're looking at the canonical Java code, don't think "Java" code. Think "XML Schema", because that's really what JAXB objects are.

I'm mostly looking at UML imported from the Java code by argo, referring to the Java to clear up ambiguities (as I mentioned somewhere else, argo doesn't read what in C++ we would call "template arguments", so that List just comes across as an unqualified List). That doesn't help at all with the sort of ambiguities we've been discussing in #146.

I'll go study JAXB a bit more, but I'm put off by the bit in the introduction to the Oracle manual which says that it's simplified and that for complex situations DOM and SAX interfaces may be more appropriate.

The thing is, because we're using Java to define the schema, the tools that we use can not only generate XML Schema

OK, how? Is there an XML Schema document that can be linked to the wiki? Is it linked to the wiki and I just haven't found it yet?

also client-side code stubs for various languages

No, not really. At least the C and Objective-C examples I've looked at are not functional.

rich documentation for the model

Really? Isn't it just javadoc-processing the comments in the code? So far, anyway, that's not very rich and not at all useful as a specification.

You are giving the impression that you are designing in code. Since that's a practice generally beaten out of CS students by their sophomore year I'm sure that you're not doing that -- but you're not sharing the design part of your work online, and so not communicating well with the online part of "the community".

I guess I was hoping the community could get on board and collaborate in terms of the Java schema code, but alas maybe that's just too much to ask...

Since by and large the domain experts (i.e. genealogists) don't understand programming and can't read Java, that's not a very realistic hope. Even UML diagrams (or ERD diagrams, for that matter) aren't going to work communicating with them.

jralls commented 12 years ago

Let me make a suggestion. When you're looking at the canonical Java code, don't think "Java" code. Think "XML Schema", because that's really what JAXB objects are.

I actually built gedcomx this afternoon after setting up Maven to run schemagen and found that Enunciate does its own schema generation. It would be really nice if you could wrap some CSS around those XSD files and link to them off of the Developer's Guide, e.g. GenealogicalResource.

stoicflame commented 12 years ago

You are giving the impression that you are designing in code.

Just an impression, huh? Let me be more explicit, then. I am unabashedly designing in code. I must have been sick that day in my CS class when they were teaching that's a Bad Thing. :-)

Since by and large the domain experts (i.e. genealogists) don't understand programming and can't read Java, that's not a very realistic hope. Even UML diagrams (or ERD diagrams, for that matter) aren't going to work communicating with them.

Indeed.

So I'm bumping this up to priority 1. We need some kind of canonical document that is not code that we can use to focus attention and broaden the audience for collaboration.

tfmorris commented 12 years ago

I think having a domain-expert-reviewable model & documentation as well as having a machine-readable model are both excellent goals. Making the model reviewable by domain experts only when it's done basically tells them that you don't care about their input.

I'm the project lead for ArgoEclipse and a longtime ArgoUML committer, so I'm loath to talk anyone out of using ArgoUML, but there are some dirty secrets associated with UML that you should be aware of:

On the plus side, ArgoUML is free, fully open source, supports code generation in a bunch of languages and reverse engineering in the more popular ones (modulo things like Java generics). ArgoEclipse & ArgoUML's UML 2.x support is based on the Eclipse UML2 plug-in, which should improve interchange with other tools based on it (TOPCASED, Papyrus, Rational, etc).

If anyone has questions about Argo or UML tools in general, feel free to msg me. I think I've also got an Argo version of the GDM lying around somewhere that I did ages ago for the GeneaPro project: http://geneapro.sourceforge.net/

stoicflame commented 12 years ago

@tfmorris thanks for the tips. Great input.

stoicflame commented 12 years ago

Finally, FINALLY, we got the means to move this. Check out the Specs, Diagrams, and Illustrations blog post.