frizbog / gedcom4j

Java library for reading/writing genealogy files in GEDCOM format
http://gedcom4j.org
55 stars 37 forks source link

Generalize interface types where possible and prudent #84

Closed frizbog closed 8 years ago

frizbog commented 9 years ago

(See Issue #79)

There are a number of places where we could use less-specific interface types than we use currently. Scenarios to evaluate on parameters and return types:

Of course, this should only be done when it makes sense.

frizbog commented 8 years ago

I've decided to continue return Strings as return types on the API instead of CharSequences. People generally expect to receive a String back, and expect it to be immutable and have defined behavior for equals() and hashCode(), none of which are guaranteed for CharSequences.

I may still accept CharSequences as parameters into the API from calling code to allow more flexibility for callers.

frizbog commented 8 years ago

Upon further experimentation and consideration, I will not be switching to HashSets in the object model to avoid duplicates.

The lightest Set implementation that ships with Java is a HashSet, and it has an overhead of 32 bytes per entry compared to ArrayList's 4 bytes. When I'm working this hard to reduce the memory footprint (especially for large files), I'm not inclined to add that much memory overhead just to keep users from adding an item to a collection more than once.

An argument could also be made that if someone is reading a GEDCOM file that has duplicated entries in it, that duplication should be preserved in the object model. There's nothing in the GEDCOM spec that specifically forbids it or considers it an error, although I don't know anyone who is arguing that support for duplication is actually desirable.

I am opening another issue, though, for a compromise: The validators could look for duplicates, and if autorepair is enabled in the validators, the duplicates could be automatically removed.

For release 3.0.0, though, Collections are staying Lists backed by ArrayLists, and Maps are remaining HashMaps.

frizbog commented 8 years ago

Decided against using CharSequences anywhere. In order to safely work with the data from a CharSequence, I'd need to make defensive String copies anyway, which further bloats things.