Open GoogleCodeExporter opened 8 years ago
This should be a compatible change since List is a subtype of Collection.
Original comment by richard.eckart
on 8 Jan 2012 at 3:25
This is a great inconvenience - for the outside user, order is not guaranteed
(even though you say order exists in practice, this is not guaranteed by the
interface), so we cannot rely on the order, causing various problems -
inconsistency in debugging, inability in comparing CAS dumps (XMI files), etc.
(These are only potential problems, but they may occur with current
definitions).
Original comment by ofe...@gmail.com
on 6 Dec 2012 at 4:24
Actually, even if a List is returned, it does not per-se define that the order
remains stable over each invocation of the method. Thus, somebody could
complain that with List, the same problems regarding inconsistencies exist.
The main point (and we probably should add this to the documentation) is what
defines the order of the data returned by each method - and there may be
differences here. E.g. select(FSArray, Class) returns the annotations in the
order they are defined in the array. select(JCas, Class) returns the
annotations in the order defined by the index for that type - per default the
UIMA built-in index on Annotation, but that could be overwritten by a user
specifying a custom index.
If the type you select does not inherit from Annotation (but e.g. from TOP) and
you did not configure a custom index for your type, the order in which the data
is returned is in fact undefined and would be even if we change the method
signature to return List.
Original comment by richard.eckart
on 6 Dec 2012 at 4:30
I agree with all of that, but note that this does not make returning a List
redundant - the specification you gave should in fact be in the documentation,
but it is only valid if List is returned. Otherwise you cannot guarantee that
the above said will hold. so I say documentation + returning a List would
suffice.
Original comment by ofe...@gmail.com
on 6 Dec 2012 at 5:23
The only difference between returning a List and returning a Collection is that
the List interface provides some more methods that we need to implement, but
none of them is guaranteeing a particular order. Everything else is convention
and not enforced by any part of the programming language. So documenting the
behavior is equally valid, no matter if the returned type is List or
Collection.
The only Java type that immediately comes to my mind which in fact does enforce
an order is the TreeSet.
Original comment by richard.eckart
on 6 Dec 2012 at 5:30
For most of the select* methods, the reason that they return Collection instead
of List is, that the internally work on an UIMA FSIndex which supports
iterator() and size() (like Collection), but it does not support random access
like List goes with its get() method. So we would have to implement additional
logic to support random access. Alternatively we could thrown an
UnsupportedOperationException the random-access methods of List - but then we
can also just implement only the Collection interfaces, which is what we did.
Original comment by richard.eckart
on 6 Dec 2012 at 7:31
I still think that using a list (plus the aforementioned documentation) gives a
much clearer and more appropriate interface, even in the cost of throwing
UnsupportedOperationException (to avoid implementing new logic).
Original comment by ofe...@gmail.com
on 6 Dec 2012 at 7:48
Let me just chime in here to say that I agree with Richard. I don't think the
Java documentation agrees with the conceptual interpretation you have of
Collection and List:
Collection — the root of the collection hierarchy. A collection represents a
group of objects known as its elements. The Collection interface is the least
common denominator that all collections implement and is used to pass
collections around and to manipulate them when maximum generality is desired.
Some types of collections allow duplicate elements, and others do not. Some are
ordered and others are unordered...
List — an ordered collection (sometimes called a sequence). Lists can contain
duplicate elements. The user of a List generally has precise control over where
in the list each element is inserted and can access elements by their integer
index (position). If you've used Vector, you're familiar with the general
flavor of List. Also see The List Interface section.
http://docs.oracle.com/javase/tutorial/collections/interfaces/index.html
Specifically, I think the sentence "The user of a List generally has precise
control over where in the list each element is inserted and can access elements
by their integer index (position)" is something that Lists are expected to
support, but doesn't generally make sense for the UIMA-index backed collections
returned by uimaFIT.
My point here is that List is *one* type of ordered collection, but not the
*only* type of ordered collection. And I believe the Java documentation
supports me on that.
Original comment by steven.b...@gmail.com
on 7 Dec 2012 at 9:58
That is true, but still the returned output IS ordered. And let's remember that
all other select* methods (except for select() itself) do return a List, and to
me that makes a lot of sense. If you have another suggestion of what type
should be returned that guarantees that the Collection has a defined order that
the user could rely on - then it's worth mentioning. To me it sounds that a
List is a good candidate for that and better than the current situation (and
more compatible with the rest of the select* methods), even if not the best
idea. By the way, the Lists returned by the other select* methods - how do they
handle the attempt for random access?
Original comment by ofe...@gmail.com
on 7 Dec 2012 at 7:27
Well, have a look at the code if you are interested in the dirty details. The
select* methods that do return lists internally create an ArrayList, fill it
and return it.
Again, there is no type at all that would guarantee a particular order, except
TreeSet - and the order in it is defined via a comparator or by a compare()
method that every item has to implement. Order is provided by underlying UIMA
mechanisms, not by uimaFIT nor by the List or Collection interfaces. uimaFIT
preserves whatever order is defined in these underlying mechanism.
More information can be found here:
http://uima.apache.org/d/uimaj-2.4.0/tools.html#ugr.tools.cde.indexes
In particular this statement:
"The built-in Annotation Index is always present. It is based on the built-in
type uima.tcas.Annotation and has keys begin (Ascending), end (Descending) and
TYPE_PRIORITY. There are no built-in type priorities, so this last sort item
does not play a role in the index unless type priorities are specified."
Original comment by richard.eckart
on 7 Dec 2012 at 7:37
As a side note: the most if not all of the select* methods that currently
return a Collection, return a *live* object. That is, the content of the
collection is directly connected to the CAS and if the CAS changes, the content
of the collection may change. The selectCovered() returns a *disconnected*
List, one that is not directly backed by the CAS. E.g. you could savely iterate
over the result of selectCovered() and add/remove annotations, while if you did
the same iterating over the result of the other select* methods would likely
cause some strange exceptions because the underlying UIMA annotation index
iterators may become invalid.
Original comment by richard.eckart
on 7 Dec 2012 at 7:48
Original comment by richard.eckart
on 7 Jan 2013 at 4:51
Original issue reported on code.google.com by
richard.eckart
on 8 Jan 2012 at 3:25