dkpro / dkpro-tc

UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.
https://dkpro.github.io/dkpro-tc/
Other

Update to DKPro Core 1.9.0 #423

Closed Horsmann closed 6 years ago

Horsmann commented 6 years ago

@reckart Updating to 1.9.0-SNAPSHOT led to a new error.

We write overall reports as Excel and CSV files. When writing the CSV file, I now get this error:

org.dkpro.lab.engine.LifeCycleException: org.springframework.dao.DataAccessResourceFailureException: The maximum length of cell contents (text) is 32,767 characters; nested exception is java.lang.IllegalArgumentException: The maximum length of cell contents (text) is 32,767 characters
    at org.dkpro.tc.examples.regression.pair.SemanticTextSimilarityDemoTest.testTrainTest(SemanticTextSimilarityDemoTest.java:69)
Caused by: org.springframework.dao.DataAccessResourceFailureException: The maximum length of cell contents (text) is 32,767 characters; nested exception is java.lang.IllegalArgumentException: The maximum length of cell contents (text) is 32,767 characters
    at org.dkpro.tc.examples.regression.pair.SemanticTextSimilarityDemoTest.testTrainTest(SemanticTextSimilarityDemoTest.java:69)
Caused by: java.lang.IllegalArgumentException: The maximum length of cell contents (text) is 32,767 characters
    at org.dkpro.tc.examples.regression.pair.SemanticTextSimilarityDemoTest.testTrainTest(SemanticTextSimilarityDemoTest.java:69)

This did not happen with 1.8.0, and I can't really tell why. The easy solution would be to stop creating the CSV file, but I wonder what actually changed. Any ideas?

reckart commented 6 years ago

Well, what is the content of the overlong cell?

Horsmann commented 6 years ago

The parameter configurations and results, e.g.:

{org.dkpro.tc.core.task.InitTask|featureMode=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|labelTransformationMethod=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|featureMode=java.lang.Object@796ed904, org.dkpro.tc.core.task.MetaInfoTask|featureSet=java.lang.Object@796ed904, org.dkpro.tc.core.task.MetaInfoTask|recordContext=java.lang.Object@796ed904, MeanAbsoluteError=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|filesRoot=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|attributeEvaluator=java.lang.Object@796ed904, org.dkpro.tc.core.task.OutcomeCollectionTask|readerTest=java.lang.Object@796ed904, org.dkpro.tc.core.task.InitTask|featureSet=java.lang.Object@796ed904, SpearmanCorrelation=java.lang.Object@796ed904, org.dkpro.tc.core.task.InitTask|readerTrain=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|featureSet=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|featureMode=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|featureSearcher=java.lang.Object@796ed904, org.dkpro.tc.core.task.InitTask|developerMode=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|learningMode=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|threshold=java.lang.Object@796ed904, org.dkpro.tc.core.task.InitTask|learningMode=java.lang.Object@796ed904, org.dkpro.tc.core.task.MetaInfoTask|filesRoot=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|developerMode=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|featureFilters=java.lang.Object@796ed904, PearsonCorrelation=java.lang.Object@796ed904, RootMeanSquaredError=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|numLabelsToKeep=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|classificationArguments=java.lang.Object@796ed904, 
org.dkpro.tc.core.task.MetaInfoTask|featureMode=java.lang.Object@796ed904, org.dkpro.tc.core.task.InitTask|threshold=java.lang.Object@796ed904, org.dkpro.tc.core.task.InitTask|readerTest=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|learningMode=java.lang.Object@796ed904, org.dkpro.tc.core.task.ExtractFeaturesTask|applyWeighting=java.lang.Object@796ed904, org.dkpro.tc.ml.weka.task.WekaTestTask|applySelection=java.lang.Object@796ed904}

I don't doubt that this may be longer than 32k characters, but this test passed before the update. Is the maximum cell length configurable?
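If the goal is to keep writing the file instead of dropping it, one option is to shorten overlong values before they reach a cell. The 32,767 figure in the exception matches Excel's fixed per-cell maximum, so it is probably not something a configuration switch can raise. The following is a minimal sketch under that assumption; `CellLimits`, `truncateForCell`, `truncateRow`, and the truncation marker are hypothetical names, not part of DKPro TC or DKPro Lab.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of DKPro TC/Lab): shortens cell values so they
 * stay within the 32,767-character limit reported in the exception.
 */
final class CellLimits {
    /** Maximum cell text length, taken from the error message. */
    static final int MAX_CELL_LENGTH = 32_767;

    /** Marker appended to values that had to be cut. */
    static final String TRUNCATION_MARKER = "...[truncated]";

    private CellLimits() {
    }

    /** Returns the value unchanged if it fits, otherwise a truncated copy of exactly MAX_CELL_LENGTH chars. */
    static String truncateForCell(String value) {
        if (value == null || value.length() <= MAX_CELL_LENGTH) {
            return value;
        }
        // Leave room for the marker so the result still fits in one cell
        int keep = MAX_CELL_LENGTH - TRUNCATION_MARKER.length();
        return value.substring(0, keep) + TRUNCATION_MARKER;
    }

    /** Applies truncateForCell to every value of a row, preserving column order. */
    static Map<String, String> truncateRow(Map<String, String> row) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            result.put(e.getKey(), truncateForCell(e.getValue()));
        }
        return result;
    }
}
```

Truncation silently loses information, and `substring` may split a surrogate pair at the cut point; for report metadata such as parameter dumps this is usually acceptable, but filtering the offending values entirely can be the cleaner choice.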

reckart commented 6 years ago

Does this file include a string version of the type system? The 1.9.0 type system is a bit larger than the 1.8.0 type system.

reckart commented 6 years ago

Why are all the values like java.lang.Object@xxxxx? Is that really how it should be?

Horsmann commented 6 years ago

I copied that from the debugger; these values are resolved to strings when the file is actually written. They do not contain the type system, but all kinds of string values that are used in the Lab dimensions,

e.g. org.dkpro.tc.core.task.deep.InitTaskDeep|dimEmbedding .../Downloads/embeds/poly_a/en.polyglot.txt and so on.

That still does not explain why this problem shows up now. I worked around it by no longer writing these .csv files, which are redundant anyway, but I am still wondering what has changed.

reckart commented 6 years ago

Hard to tell without knowing exactly which value is longer than 32k characters... Another possibility is that a change in the Maven dependencies triggers the problem... no idea.
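To find out which value is actually too long, a small diagnostic can scan the key/value pairs before they are written and report every key whose text exceeds the limit. This is a sketch assuming the report context is available as a `Map`; `OverlongValueFinder` and `findOverlong` are hypothetical names, not DKPro Lab API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical diagnostic (not part of DKPro Lab): lists the keys whose
 * string rendering is longer than a given limit.
 */
final class OverlongValueFinder {
    private OverlongValueFinder() {
    }

    /** Returns "key (N chars)" for every value whose text is longer than maxLength. */
    static List<String> findOverlong(Map<String, ?> values, int maxLength) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, ?> e : values.entrySet()) {
            // String.valueOf approximates how the value is rendered as cell text
            String text = String.valueOf(e.getValue());
            if (text.length() > maxLength) {
                hits.add(e.getKey() + " (" + text.length() + " chars)");
            }
        }
        return hits;
    }
}
```

Calling something like `findOverlong(context, 32_767)` just before the report is written would name the offending entry directly, instead of only surfacing the length error from the writer.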

Horsmann commented 6 years ago

It prints the collection reader description via toString() (9900 characters), e.g.:

org.apache.uima.collection.impl.CollectionReaderDescription_impl: 
externalResourceDependencies = Array{}

frameworkImplementation = org.apache.uima.java
implementationName = org.dkpro.tc.examples.io.STSReader
metaData = org.apache.uima.resource.metadata.impl.ProcessingResourceMetaData_impl: 
UUID = NULL
asynchronousModeSupported = false
....

I can catch this string with a simple startsWith() call, but I would have to do this in the FlexTable class. This is extremely dirty, but I don't see any other way to catch this string. ReportUtils seems to write the whole context to the file.

reckart commented 6 years ago

You could override org.dkpro.lab.reporting.FlexTable.getValueAsString(String, String).

reckart commented 6 years ago

... I mean e.g. in an anonymous inner class where you create the table that is eventually passed to ReportUtils.
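The anonymous-inner-class approach can be sketched as follows. To keep the example self-contained, `ReportTable` is a hypothetical stand-in for `org.dkpro.lab.reporting.FlexTable` that only mimics the `getValueAsString(String, String)` method named above; the real class's constructors, other members, and method visibility are not reproduced, so treat this as an illustration of the override pattern, not working DKPro Lab code. The prefix check is the `startsWith()` test mentioned earlier, and `ReportTables`, `addToRow`, and the placeholder text are likewise invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for FlexTable: stores cell values per row and
 * renders them as text. Not the real DKPro Lab class.
 */
class ReportTable {
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    void addToRow(String rowId, String columnId, Object value) {
        rows.computeIfAbsent(rowId, k -> new HashMap<>()).put(columnId, value);
    }

    /** Renders one cell as text; subclasses may override this. */
    protected String getValueAsString(String rowId, String columnId) {
        Map<String, Object> row = rows.get(rowId);
        Object value = row == null ? null : row.get(columnId);
        return value == null ? "" : String.valueOf(value);
    }
}

final class ReportTables {
    /** Prefix of the collection reader description dump quoted in the thread. */
    static final String READER_DUMP_PREFIX =
            "org.apache.uima.collection.impl.CollectionReaderDescription_impl";

    private ReportTables() {
    }

    /** Creates a table whose rendered cells hide collection reader description dumps. */
    static ReportTable createFilteringTable() {
        // Anonymous inner class: only the rendering method is overridden
        return new ReportTable() {
            @Override
            protected String getValueAsString(String rowId, String columnId) {
                String text = super.getValueAsString(rowId, columnId);
                return text.startsWith(READER_DUMP_PREFIX) ? "<collection reader omitted>" : text;
            }
        };
    }
}
```

Because only the rendering is changed, the stored values stay intact and the filter applies wherever the table is written, which keeps the special case out of the shared table class itself.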

Horsmann commented 6 years ago

How do I find out why the build failed if Jenkins only tells me that an error occurred somewhere? https://basa.ltl.uni-due.de:33000/job/DKPro%20TC/625/console (The error does not occur in my local workspace.)

reckart commented 6 years ago

If there are Javadoc errors, search the log for ": error:", e.g.:

[ERROR] /var/lib/jenkins/workspace/DKPro TC/dkpro-tc-core/src/main/java/org/dkpro/tc/core/util/ReportUtils.java:211: error: reference not found
[ERROR] * Looks into the {@link FlexTable} and outputs general performance numbers if available

Horsmann commented 6 years ago

Thanks (merci)

Horsmann commented 6 years ago

@reckart the -core module of TC doesn't build anymore, but again I see no error message: https://basa.ltl.uni-due.de:33000/job/DKPro%20TC/639/console

Locally the build works :/

reckart commented 6 years ago

Eclipse generates different class files than the compiler used by Maven. When you do a local command-line build, do you run "clean install" or only "install"? In the first case, you should get the same results as with Jenkins. In the latter case, you'll most likely use the class files generated by Eclipse.

If you update to the latest Maven dependency plugin or to the latest DKPro parent POM, you'll also see that the usedDependencies sections need to be changed, as the dependency detection mechanism in the plugin seems to have been improved/changed.

reckart commented 6 years ago

Btw. here is the error message from your log:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.10:analyze-only (default-cli) on project dkpro-tc-core: Cannot analyze dependencies: Trying to force use of dependencies which are declared but already detected as used: [de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.tokit-asl] -> [Help 1]

Horsmann commented 6 years ago

Thanks. What is the issue with the UKP Jenkins? All tests fail o.0

reckart commented 6 years ago

My guess would be that Jenkins had some trouble downloading JARs and got them somehow corrupted... if this persists, I can try cleaning out the Maven cache.

Horsmann commented 6 years ago

@reckart could you do that? It doesn't look like the UKP Jenkins is recovering on its own.

reckart commented 6 years ago

We have had some trouble with servers here in general and are still recovering from that. Once they are back, I'll have a look at it. At present, I cannot even access Jenkins via the browser.

reckart commented 6 years ago

Looks like the build is accidentally trying to use uimaFIT 3.0.0-SNAPSHOT, probably via a bad DKPro Core 1.9.0-SNAPSHOT build. I'll look into it further.

reckart commented 6 years ago

@Horsmann the DKPro TC build now fails due to a compilation problem in the DL4J code. I guess this is related to your latest commit?

Horsmann commented 6 years ago

Yep, forgot to update the examples. Our Jenkins now fails entirely, saying it can't find 1.9.0.

https://basa.ltl.uni-due.de:33000/job/DKPro%20TC/664/console

reckart commented 6 years ago

Did you configure your Jenkins with a settings.xml file that forces Maven to use a particular mirror? I have added a repository declaration for the DKPro Core 1.9.0 staging repo, so it should find the artifacts there.

Horsmann commented 6 years ago

I removed the settings.xml again; everything used to run without it before. I saw the repo you added, but it seems to be the same issue I had before: the repo is not used.

reckart commented 6 years ago

Odd...

On our Jenkins, I temporarily replaced the Maven build arguments ("clean install blah blah") with just "dependency:purge-local-repository" and ran one build to clean up the locally cached dependencies. After that, I restored the usual build arguments. Might be worth a try?

Horsmann commented 6 years ago

I just did that, but it did not help either.

Horsmann commented 6 years ago

I assume this could be a firewall issue; it seems they forgot to allow port 443. Let's see if the UKP Jenkins builds; if so, we can close this issue and just wait until Core 1.9.0 is released.

Horsmann commented 6 years ago

@reckart seems to work. Can you give me a pointer when Core is released? I will then remove the repository from TC, and our Jenkins should build, too.

reckart commented 6 years ago

@Horsmann I have promoted the DKPro Core 1.9.0 release to Maven Central.