frizbog / gedcom4j

Java library for reading/writing genealogy files in GEDCOM format
http://gedcom4j.org
55 stars 37 forks source link

Conformity with common programming standards #79

Closed jtroester closed 9 years ago

jtroester commented 9 years ago

During the last few days I did some testing with the Ahnentafel example and a gedcom file covering about 17.000 individuals. Performance of the parser is quite fine. For further use of the library i'd like to suggest some technical enhancements. Are there already any considerations about enhancing conformity with common programming standards and patterns? E.g. fields should be hidden and the API should be exposed on public getter and setter methods according to the JavaBeans pattern, List/ArrayList types should be replaced with Set/HashSet types wherever duplicates may occur and therefore the API relevant types have to be casted to Collection types in order to hide implementation details in order to avoid explicitly checking if new element is already contained in the collection. This approach requires ordering of elements being out of scope. If contribution is appreciated please let me know...

frizbog commented 9 years ago

Javabeans There was a lot of consideration given to converting to JavaBeans a couple years back. There was also a discussion (if you can call it that) on Google+ here: https://plus.google.com/+Gedcom4jOrg/posts/f7FYpaZ88Aa . The conclusion made at that time was to leave things as it was since there was little compelling draw to converting, real downsides to making the switch, and the only people expressing an opinion preferring leaving things as they were. I personally don't like the Javabeans spec in general (see http://www.javapractices.com/topic/TopicAction.do?Id=84) and Joshua Bloch long ago convinced me to avoid using it unless needed. I'm certainly open to discussing it though, and if it is justifiable, I will switch things.

Sets vs Lists We definitely should always use the right collection type. If you see situations where we should be using a Set instead of a List because the GEDCOM spec does not allow duplicates and they should be absolutely prohibited by the library, then please let me know or log a defect. Similarly, if you see anywhere that a Set is being used but duplicates need to be allowable, or where duplicates are not to be allowed but order is not being preserved, please log those as well so we can switch to Lists or LinkedHashSets as needed.

I'm not sure I'm understanding your comment though about casting to Collection types. Can you elaborate please?

Thanks for the feedback and suggestions. I'd like to know what you think.

jtroester commented 9 years ago

Javabeans Let me quote from SonarQube static code analysis report:

Public class variable fields do not respect the encapsulation principle and has two main disadvantages:

  • Additional behavior such as validation cannot be added.
  • The internal representation is exposed, and cannot be changed afterwards.

I assume applying the JavaBean pattern is a common coding paradigm in the IT industry and should also be best practice for open source software like gedcom4j. The JavaBean pattern additionally simplifies the integration with other frameworks.

Sets vs Lists The Collection type is the common superinterface of List and Set respectively. Exposing the Collection type ensures that implementation details are hidden from the API.

Another aspect of using Set/HashSet is to introduce idempotence - repeatedly adding the same element to a collection results in the element actually being added once. Assuming an appropriate business logic this approach generally improves robustness and usability of a framework.

E.g. adding a distinct Individual to a Family more than once should be prevented as duplicates may cause unexpected behaviour of the application. It's an aspect of the framework´s error handling concept how to respond to adding a duplicate: throwing an exception or ignoring the failed attempts silently are the most common approaches.

ToppleTheNun commented 9 years ago

Unfortunately, changing from List to Set is a backwards compatibility issue. Implementing the Java Bean pattern is also a backwards compatibility issue.

One solution being discussed is a List implementation that does not allow duplicate entries in order to maintain the current API and backwards compatibility.

Design aspects, theories, idioms, and religions aside, backwards compatibility is one of the main goals of this library. Considering that the project uses Semantic Versioning in all but name, it would be able to break backwards compatibility if the major version number is incremented. Without incrementing the major version number, it should not break backwards compatibility.

frizbog commented 9 years ago

I think that an inner List implementation that prevents duplicates is probably the most API-neutral way to accomplish the goal of preventing duplicates, although to jtroester's point, it doesn't do much in terms of information hiding.

In general, I'm not really very worried about consumers of the API having visibility to its internals, to be honest, so that is the main reason the project hasn't done more to hide its internals as it has grown organically over the years. This is really the first time anyone has seriously suggested it in the five years since I first released the project; but then, a library for reading/writing/manipulating GEDCOMs without providing any user interface is a pretty niche product, and as far as I know, the number of users has yet to make it into triple digits.

I also think it is clearly debatable whether the JavaBeans spec is advisable here. It's easy to find cogent arguments on both sides by respected leaders in the field. If information hiding does become an important design goal of the project -- and I am open to that -- then JavaBeans would be the way to go, along with changing the API to use Collection rather than List/Set, etc. Such a shift would be a general style change, where protecting and hiding internals would become an overall project design goal, and we'd look at how best to do that across the entire project.

Arguments to bring information hiding into gedcom4j will be more persuasive if they are based on specific problems caused by not hiding information better. For example, "technology X is incompatible with gedcom4j because it only works with JavaBeans but cannot work with public fields of public classes". Such arguments would then be evaluated against specific issues that would definitely be caused by API changes. Arguments based on general principles will be considered as well, but will not carry as much weight as identifying specific ways that the library fails to work, and will have to be compelling enough so that the choice of upgrading and changing their code to remain compatible seems worthwhile to current users.

TL;DR: I'm going to leave this issue open so that others can weigh in. I am open to these and other suggestions for future releases, but for the short term I won't be making any changes that would require my existing clients to change their code. If that day comes, it would be a major release change (3.x.x).

jtroester commented 9 years ago

Thanks for all your comments. The gedcom4j project gives me the impression to be a solid foundation for handling a variety of gedcom file flavours and to be suffiently stable for integrating gedcom4j with my work.

When working with frameworks I expect some standards to be fulfilled to ease integration and maintainability. To deal with a mix-up of individual API styles complicates effective work. I recognize that from a project owner's point of view there are good reasons to garantuee backward compatibility to all current customers.

It's in the nature of things that sooner or later new requirements arise and hence the project's goals have to be balanced out.

In order that everyone may benefit from it these are my suggestions that I would like you to consider:

frizbog commented 9 years ago

Created new Milestone for v3 of gedcom4j, for some future date. Will add other roadmap items to that milestone in addition to the two main items discussed here. Closing this issue in favor of the other two.