google-code-export / uimafit

Automatically exported from code.google.com/p/uimafit
2 stars 1 forks source link

Consider changing return type of several methods from Collection to List #113

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We have a number of methods that return Collection instead of List even though 
internally a List instance is created and filled. It might be more convenient 
in some cases if these methods actually returned List.

Original issue reported on code.google.com by richard.eckart on 8 Jan 2012 at 3:25

GoogleCodeExporter commented 9 years ago
This should be a compatible change since List is a subtype of Collection.

Original comment by richard.eckart on 8 Jan 2012 at 3:25

GoogleCodeExporter commented 9 years ago
This is a great inconvenience - for the outside user, order is not guaranteed 
(even though you say order exists in practice, this is not guaranteed by the 
interface), so we cannot rely on the order, causing various problems - 
inconsistency in debugging, inability in comparing CAS dumps (XMI files), etc. 
(These are only potential problems, but they may occur with current 
definitions).

Original comment by ofe...@gmail.com on 6 Dec 2012 at 4:24

GoogleCodeExporter commented 9 years ago
Actually, even if a List is returned, it does not per-se define that the order 
remains stable over each invocation of the method. Thus, somebody could 
complain that with List, the same problems regarding inconsistencies exist.

The main point (and we probably should add this to the documentation) is what 
defines the order of the data returned by each method - and there may be 
differences here. E.g. select(FSArray, Class) returns the annotations in the 
order they are defined in the array. select(JCas, Class) returns the 
annotations in the order defined by the index for that type - per default the 
UIMA built-in index on Annotation, but that could be overwritten by a user 
specifying a custom index.

If the type you select does not inherit from Annotation (but e.g. from TOP) and 
you did not configure a custom index for your type, the order in which the data 
is returned is in fact undefined and would be even if we change the method 
signature to return List.

Original comment by richard.eckart on 6 Dec 2012 at 4:30

GoogleCodeExporter commented 9 years ago
I agree with all of that, but note that this does not make returning a List 
redundant - the specification you gave should in fact be in the documentation, 
but it is only valid if List is returned. Otherwise you cannot guarantee that 
the above said will hold. so I say documentation + returning a List would 
suffice.

Original comment by ofe...@gmail.com on 6 Dec 2012 at 5:23

GoogleCodeExporter commented 9 years ago
The only difference between returning a List and returning a Collection is that 
the List interface provides some more methods that we need to implement, but 
none of them is guaranteeing a particular order. Everything else is convention 
and not enforced by any part of the programming language. So documenting the 
behavior is equally valid, no matter if the returned type is List or 
Collection. 

The only Java type that immediately comes to my mind which in fact does enforce 
an order is the TreeSet.

Original comment by richard.eckart on 6 Dec 2012 at 5:30

GoogleCodeExporter commented 9 years ago
For most of the select* methods, the reason that they return Collection instead 
of List is, that the internally work on an UIMA FSIndex which supports 
iterator() and size() (like Collection), but it does not support random access 
like List goes with its get() method. So we would have to implement additional 
logic to support random access. Alternatively we could thrown an 
UnsupportedOperationException the random-access methods of List - but then we 
can also just implement only the Collection interfaces, which is what we did.

Original comment by richard.eckart on 6 Dec 2012 at 7:31

GoogleCodeExporter commented 9 years ago
I still think that using a list (plus the aforementioned documentation) gives a 
much clearer and more appropriate interface, even in the cost of throwing 
UnsupportedOperationException (to avoid implementing new logic).

Original comment by ofe...@gmail.com on 6 Dec 2012 at 7:48

GoogleCodeExporter commented 9 years ago
Let me just chime in here to say that I agree with Richard. I don't think the 
Java documentation agrees with the conceptual interpretation you have of 
Collection and List:

Collection — the root of the collection hierarchy. A collection represents a 
group of objects known as its elements. The Collection interface is the least 
common denominator that all collections implement and is used to pass 
collections around and to manipulate them when maximum generality is desired. 
Some types of collections allow duplicate elements, and others do not. Some are 
ordered and others are unordered...
List — an ordered collection (sometimes called a sequence). Lists can contain 
duplicate elements. The user of a List generally has precise control over where 
in the list each element is inserted and can access elements by their integer 
index (position). If you've used Vector, you're familiar with the general 
flavor of List. Also see The List Interface section.
http://docs.oracle.com/javase/tutorial/collections/interfaces/index.html

Specifically, I think the sentence "The user of a List generally has precise 
control over where in the list each element is inserted and can access elements 
by their integer index (position)" is something that Lists are expected to 
support, but doesn't generally make sense for the UIMA-index backed collections 
returned by uimaFIT.

My point here is that List is *one* type of ordered collection, but not the 
*only* type of ordered collection. And I believe the Java documentation 
supports me on that.

Original comment by steven.b...@gmail.com on 7 Dec 2012 at 9:58

GoogleCodeExporter commented 9 years ago
That is true, but still the returned output IS ordered. And let's remember that 
all other select* methods (except for select() itself) do return a List, and to 
me that makes a lot of sense. If you have another suggestion of what type 
should be returned that guarantees that the Collection has a defined order that 
the user could rely on - then it's worth mentioning. To me it sounds that a 
List is a good candidate for that and better than the current situation (and 
more compatible with the rest of the select* methods), even if not the best 
idea. By the way, the Lists returned by the other select* methods - how do they 
handle the attempt for random access?

Original comment by ofe...@gmail.com on 7 Dec 2012 at 7:27

GoogleCodeExporter commented 9 years ago
Well, have a look at the code if you are interested in the dirty details. The 
select* methods that do return lists internally create an ArrayList, fill it 
and return it. 

Again, there is no type at all that would guarantee a particular order, except 
TreeSet - and the order in it is defined via a comparator or by a compare() 
method that every item has to implement. Order is provided by underlying UIMA 
mechanisms, not by uimaFIT nor by the List or Collection interfaces. uimaFIT 
preserves whatever order is defined in these underlying mechanism.

More information can be found here: 
http://uima.apache.org/d/uimaj-2.4.0/tools.html#ugr.tools.cde.indexes

In particular this statement:

"The built-in Annotation Index is always present. It is based on the built-in 
type uima.tcas.Annotation and has keys begin (Ascending), end (Descending) and 
TYPE_PRIORITY. There are no built-in type priorities, so this last sort item 
does not play a role in the index unless type priorities are specified."

Original comment by richard.eckart on 7 Dec 2012 at 7:37

GoogleCodeExporter commented 9 years ago
As a side note: the most if not all of the select* methods that currently 
return a Collection, return a *live* object. That is, the content of the 
collection is directly connected to the CAS and if the CAS changes, the content 
of the collection may change. The selectCovered() returns a *disconnected* 
List, one that is not directly backed by the CAS. E.g. you could savely iterate 
over the result of selectCovered() and add/remove annotations, while if you did 
the same iterating over the result of the other select* methods would likely 
cause some strange exceptions because the underlying UIMA annotation index 
iterators may become invalid.

Original comment by richard.eckart on 7 Dec 2012 at 7:48

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 7 Jan 2013 at 4:51