Anton87 / uimafit

Automatically exported from code.google.com/p/uimafit
0 stars 0 forks source link

create initialize package #20

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I think it makes sense to rename the classes InitializeUtil and 
ExternalResourceConfigurator and put them together in an "injection" package.

My suggestion is:

org.uimafit.injection.ConfigurationParameterInjection
org.uimafit.injection.ExternalResourceInjection

I'll admit that I am not perfectly clear on all the nuance of the word 
"injection" and what it means to all developer communities.  But, I think we've 
decided what it means to us and I think its ok to run with the it.  

Any thoughts?

Original issue reported on code.google.com by pvogren@gmail.com on 10 Jun 2010 at 4:16

GoogleCodeExporter commented 8 years ago
+1 for moving them into a common package. How about moving them into the 
component package. Thinking a bit that seems the most logical place for me and 
it is where they are mainly used. 

As for renaming - I like to think that both are conceptually quite similar to 
Spring's BeanPostProcessors. For that reason I would suggest to rename them to 
ConfigurationParameterProcessor and ExternalResourceProcessor. 

Original comment by richard.eckart on 10 Jun 2010 at 4:28

GoogleCodeExporter commented 8 years ago
I like processor better since its a nominalization.  One reason I like an 
"injection" package is because its the I in FIT.  But I could go either way on 
this one.  

Original comment by pvogren@gmail.com on 10 Jun 2010 at 4:43

GoogleCodeExporter commented 8 years ago
I think its fine to leave dependency injection as a concept on the 
methodological level and have "processor" as a name for a concept in the 
concrete implementation. But I suppose I mainly like it because it makes it 
easier to draw parallels to the nomenclature world set up in the Spring 
framework - makes it easier to mentally map between the two.

Original comment by richard.eckart on 10 Jun 2010 at 4:48

GoogleCodeExporter commented 8 years ago
I have no such connection with Spring and absent that I think "processor" is a 
pretty meaningless word.  Again, I'm not against it and the working suggestions 
(ConfigurationParameterProcessor and ExternalResourceProcessor) are much 
clearer than we have now.  

One reason I created this ticket is because Chris couldn't find the 
functionality provided by them.  I am concerned that if they are called 
processor and mixed in with the factories, then they may be harder to find.  
So, I would vote to put them in a separate package if we can come up with a 
name for it ("injection"?)  Another reason another package might be appropriate 
is because I actually had a different intuition about which existing package 
they might go in - component.  

I will cc Chris and see if he has an opinion.

Original comment by pvogren@gmail.com on 10 Jun 2010 at 7:51

GoogleCodeExporter commented 8 years ago
I wouldn't put them with the factories either, but rather with the components 
in the components package or possible in a processor sub-package. 

We could also call them "initializers", but I am not sure if that gives a 
conceptual clash with the new initializable interface - maybe not. After all, 
the main use case for those classes is to be called from within a components 
initialize() method.

Original comment by richard.eckart on 10 Jun 2010 at 8:03

GoogleCodeExporter commented 8 years ago
Thanks Philip. I think a combination of a bit more patience on my part and a 
roadmap to uimaFIT would help make it easier to find things.

Regarding the name, I don't know that I have much to add that doesn't slide 
down the path of defining what OO is, what it means to naming and why I loathe 
classes full of static functions with names that end in "util", "manager" or 
"processor". It shows a functional, not OO view.

I like that InitializeUtil is used to extend the JCasAnnotator_ImplBase. 
(though I would prefer a different name since I don't always scan imports) 
These two classes are about setting up the metadata for an annotator, and in my 
mind, in an OO setting, I look for functions, packages and their names in the 
taxonomy of things...of the data, rather than the functions.  Often this means 
extending existing classes and in Java we don't have either multiple or private 
inheritance, so you create mixins like InitializeUtil. They're mostly 
functions, so what are you gonna do? If you lengthened the name to answer the 
question "Initialize *what*?" you might name it InitializeParametersUtil. 
ExternalResourceUtil is vague on the functional 
end....InitializeExternalResourceUtil?

from the code: 
     * Configure a component from the given context.
    public static <T> void configure(UimaContext aContext, T object)
Why isn't configure() a function on the component T class?

Original comment by croeder6000 on 10 Jun 2010 at 8:38

GoogleCodeExporter commented 8 years ago
Regarding the code snipped: that is because Java does support neither mixins 
nor multiple inheritance. We'd have to re-implement a bunch of things that 
happen on the way from the UIMA component interfaces way down to when they 
become e.g. JCasAnnotator_ImplBase in order to have a common base class from 
which we could inherit a configure() method.

The current approach resembles a minimal invasive scenario to bring in new 
functionality - I believe none of us could come up with a more elegant 
solution, so we arrived at the same idea when looking at the issue.

Taking on the issue from a slightly different point of view (I like to think of 
it as the Spring point of view) is this: like Spring, UIMA is a component 
framework. A UIMA component is a bit like a Java bean (used by Spring as 
components), except that it cannot be configured using getters and setters (the 
traditional attack vector for dependency injection), but needs to be configured 
via the initialize() method due to how UIMA is constructed. This can be 
compared to constructor-based dependency injection. Now, when Spring sets up a 
context, it initializes the Java beans defined in the context. After they are 
initialized, all registered BeanPostProcessors are applied to them in order to 
do dependency injection, evaluate annotations, apply aspect and the like. 
Transferring the concept to uimaFIT, we currently have two such processors - 
InitializeUtil and ExternalResourceConfigurator- and we do not (like Spring) 
provide a framework for adding more of them. So in this point of view the 
BeanProcessors are a mechanism provided by the IOC container responsible for 
the dependency injection.

In our case UIMA is actually the IOC container. However, it does not provide a 
lifecycle callback mechanism which would allow us to hook into the component 
creation lifecycle and execute our post processors after the component has been 
instantiated - the only way to do this is to hook into the initialize() method 
of components. This is why I have created the JCasAnnotator_ImplBase class and 
friends. Since our processors are stateless, they are realized as static 
methods. We could also have them as proper classes and call e.g. new 
ExternalResourceProcessor().process(this). Mixing them into the base classes is 
not possible in Java and in most cases not desirable, since we'd only add new 
methods to the component classes that would never be used outside the 
initialize() method - we'd only pollute the auto-completion feature of IDEs.

One of the charms of uimaFIT is that is full of static methods. It makes using 
it very straight forward in the majority of cases. It kind of provide those 
constructors, that should have already been provided by UIMA and constructors 
are static - as well as factory methods may be. Static methods may also prove a 
draw back once we wish to work more with the UimaContext class - I am not sure 
of this yet. The new InitializableFactory may address this issue.

Maybe this clears up a bit why things are as they are. I suppose I would prefer 
if we did not have only static methods, but rather a set of factories. But if'd 
spread things across multiple factories - as we do now - we'd end up in a very 
entangled place. We'd probably have to put everything into one factory. But 
anyway, we can't really refactor everything to be non-static anymore. That 
option is long gone. 

Original comment by richard.eckart on 10 Jun 2010 at 9:11

GoogleCodeExporter commented 8 years ago
thanks for a clear explanation. As someone who spends as little as 1/3 of his 
time on UIMA, having a good base class to start with make life simple. Much as 
I'd like to, I don't have the time to read more UIMA and uimaFIT code. While I 
see the value in XML descriptor files, I loathe writing them as much as Philip. 
I'm working towards writing or finding an annotator base that lets me:

- use annotated parameters and not have to write code to  "mine" the context to 
initialize them as in our paper from last summer (done in 
uimafit.component.JCasAnnotator_ImpleBase.initailize())

- use those same annotated parameters to have code that creates the XML 
descriptor file.  I create a main() in the annotator's code that creates the 
file when run.
public void generateXMLDescriptorFile()    throws IOException, SAXException {
       String className = myClass.getName();
       className = className.substring(className.lastIndexOf(".") + 1) + ".xml";
       File file = new File(className);
       AnalysisEngineDescription aed = getPrimitiveDescription();

       aed.toXML(new java.io.FileWriter(file));
   }
public AnalysisEngineDescription getPrimitiveDescription()  {   
        AnalysisEngineDescription aed  =null;
        TypeSystemDescription typeSystem =
            TypeSystemDescriptionFactory.createTypeSystemDescription(
                   "typeSystem.CCPTypeSystem");
        ConfigurationData configData = 
           ConfigurationParameterFactory.createConfigurationData(myClass);
        try {
            aed = AnalysisEngineFactory.createPrimitiveDescription(
                    myClass, typeSystem,  configData);

- create an annotator for use in a uimaFIT pipeline, getting it's default 
values from the annotations and specific, overriding values from the function 
call
    AnalysisEngine createAnalysisEngineWithDefaultValues(           
            List<String> resourceNames,
            List<String> resourceValues)  
    throws ResourceInitializationException, InvalidXMLException {
        AnalysisEngineDescription aed =  
            getPrimitiveDescription();
        addExternalResourcesToAED(aed, resourceNames, resourceValues);
        AnalysisEngine testAE = 
            AnalysisEngineFactory.createPrimitive(aed);
        return testAE;

- include resources in that initialization
    void addExternalResourcesToAED(AnalysisEngineDescription aed, 
            List<String> resourceNames,
            List<String> resourceValues) 
    throws InvalidXMLException {
        assert(resourceNames.size() == resourceValues.size());
        Collection<ExternalResourceDependency> deps = new ArrayList<ExternalResourceDependency>();
        for (String s : resourceNames) {
            ExternalResourceDependency localFileDependency = 
            ExternalResourceFactory.createExternalResourceDependency(
                    s, Resource.class, false);
            deps.add(localFileDependency);
        }
        aed.setExternalResourceDependencies(deps.toArray(new ExternalResourceDependency[deps.size()]));
        for (int i=0; i< resourceNames.size(); i++) {
            ExternalResourceFactory.bindResource(aed, resourceNames.get(i), resourceValues.get(i));
        }
    }

Is there room for this much in  uimafit.component.JCasAnnotator_ImpleBase?

Original comment by croeder6000 on 10 Jun 2010 at 10:31

GoogleCodeExporter commented 8 years ago
It's unforunate that this message board does not support syntax highlightning. 
I'll reply to this in parts.

--

- use those same annotated parameters to have code that creates the XML 
descriptor file.  I create a main() in the annotator's code that creates the 
file when run.

public void generateXMLDescriptorFile()    throws IOException, SAXException {
       String className = myClass.getName();
       className = className.substring(className.lastIndexOf(".") + 1) + ".xml";
       File file = new File(className);
       AnalysisEngineDescription aed = getPrimitiveDescription();

       aed.toXML(new java.io.FileWriter(file));
   }

public AnalysisEngineDescription getPrimitiveDescription()  {   
        AnalysisEngineDescription aed  =null;
        TypeSystemDescription typeSystem =
            TypeSystemDescriptionFactory.createTypeSystemDescription(
                   "typeSystem.CCPTypeSystem");
        ConfigurationData configData = 
           ConfigurationParameterFactory.createConfigurationData(myClass);
        try {
            aed = AnalysisEngineFactory.createPrimitiveDescription(
                    myClass, typeSystem,  configData);

--

You should use static imports such as:

import static ...uimafit.factory.AnalysisEngineFactory.*;

This will make the static methods "appear" immediately in your class - almost 
if it was a mixin!

Your two methods can be shorter. I even wonder if you really want a method to 
generate a descriptor file - maybe for properly closing the writer - not sure 
if toXML does that. 

public void generateXMLDescriptorFile(Class aClass)    throws IOException, 
SAXException {
       aed.toXML(new java.io.FileWriter(aClass.getSimpleName()+".xml));
   }

public AnalysisEngineDescription getPrimitiveDescription()  {
    TypeSystemDescription tsd = createTypeSystemDescription("typeSystem.CCPTypeSystem");
    return createPrimitiveDescription(Annotator.class, tsd, 
        Annotator.PARAM_X, "value X", 
        Annotator.PARAM_Y, "value Y", 
        ...);
}

Anyway, I would suggest to have that as a utility method and not inside every 
component. The annotations in the component should contain all the necessary 
code and all the default values to derive a default descriptor. So my 
suggestion would be to write something like this when you need it - not inside 
the Annotator:

Class aClass = Annotator.class;
createPrimitiveDescription(aClass, 
createTypeSystemDescription("typeSystem.CCPTypeSystem")).
    toXML(new FileWriter(aClass.getSimpleName()+".xml"));

Did I overlook anything?

Original comment by richard.eckart on 10 Jun 2010 at 10:50

GoogleCodeExporter commented 8 years ago
- create an annotator for use in a uimaFIT pipeline, getting it's default 
values from the annotations and specific, overriding values from the function 
call
    AnalysisEngine createAnalysisEngineWithDefaultValues(           
            List<String> resourceNames,
            List<String> resourceValues)  
    throws ResourceInitializationException, InvalidXMLException {
        AnalysisEngineDescription aed =  
            getPrimitiveDescription();
        addExternalResourcesToAED(aed, resourceNames, resourceValues);
        AnalysisEngine testAE = 
            AnalysisEngineFactory.createPrimitive(aed);
        return testAE;

--

It seems to me that your code is trying to work around the fact that 
ExternalResourceFactory.bindResource() does not work unless you either prepare 
the AED using ExternalResourceFactory.createExternalResourceDependency() OR the 
component as @ExternalResource annotations. In your case, you don't seem to 
have the annotations in the component. I think the best way would be to add 
them - if the component source code is under your control. It should make your 
code much simpler. When you get an AED from a properly annotated component 
using e.g. createPrimitiveDescription() all you should need to do is call 
bindResource() to get your external resources bound.

The code should look like this:

TypeSystemDescription tsd = ...;
AnalysisEngineDescription aed = createPrimitiveDescription(Annotator.class, 
tsd, PARAMETERS...);

If you code your external resources and components in the right way, something 
like this should do to bind an external resourcem e.g. a hypothetical 
dictionary resource:

bindResource(aed, DictionaryResource.class, DictionaryResource.PARAM_FILE, 
"path/to/some/file");

In your UIMA annotator component you'd have such lines:

@ExternalResource
DictionaryResource dictionary;

The more advanced functions of the ExternalResourceFactory and the 
@ExternalResource annotations are only needed when you want to have multiple 
resources of the same type in your component or if you need to work with UIMA 
components that do not use the @ExternalResource annotation.

Original comment by richard.eckart on 10 Jun 2010 at 11:10

GoogleCodeExporter commented 8 years ago
All in all, I would put nothing of your suggested code into an annotator class 
and also not into one or all of the base classes. The UIMA component classes 
(readers, annotators) should contain no code at all, other than what is needed 
for doing the actual processing.

Original comment by richard.eckart on 10 Jun 2010 at 11:14

GoogleCodeExporter commented 8 years ago
One more: if you are using Eclipse, I would like to point out a very useful 
feature:

1) Open the preferences and navigate to Java/Editor/Content Assist/Favorites
2) press "new type" and browse to "org.uimafit.factory.AnalysisEngineFactory"
3) ok the browse and the new-type dialog

The static methods from the AnalysisEngineFactory will from now on be available 
in the auto-completion feature (pressing CTRL-space in the source code editor). 

You can add any classes with static methods here. You can also choose to add 
individual methods using "new member". Eclipse will automatically generate the 
static imports when you use the auto-complete feature.

This saves me a lot of writing and makes static imports really neat to use. 
It's kind of a poor-mans workaround to mixins ;) I usually have a couple of 
Apache commons classes and uimaFIT classes in that list.

Original comment by richard.eckart on 10 Jun 2010 at 11:28

GoogleCodeExporter commented 8 years ago
Interesting discussion - I don't have anything substantive to add.  Chris, I 
was only hoping for a little help with naming.  It's too bad this discussion 
will get buried in an unrelated issue - please consider posting to 
uimafit-users list next time.  

I like the suggestion of using "initializer".  How about the following:

org.uimafit.component.initialize.ConfigurationParameterInitializer
org.uimafit.component.initialize.ExternalResourceInitializer

There may be a little overlap with the Initializable but I think confusion can 
be avoided by adding a little more javadoc to that interface.

Original comment by pvogren@gmail.com on 11 Jun 2010 at 4:34

GoogleCodeExporter commented 8 years ago
Sounds good to me

Original comment by richard.eckart on 11 Jun 2010 at 8:22

GoogleCodeExporter commented 8 years ago
I made the name changes.  I called the methods 
initializeConfigurationParameters and initializeExternalResources.  I also 
cleaned up the javadoc for initializeConfigurationParameters which was a little 
out of date.  

Original comment by pvogren@gmail.com on 11 Jun 2010 at 9:10