datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0

ExecuteJobWithoutAnalyzersDialog does not understand output datastreams #744

Closed ClaudiaPHI closed 8 years ago

ClaudiaPHI commented 9 years ago

Scenario:

  1. Start a job with orderDB as the datastore
  2. Add the Customers and Employees tables to the canvas
  3. Add a "Union" component and select all columns
  4. Press Execute button
  5. "Execute without analyzers" window appears.
  6. Choose "Write an Excel spreadsheet" or "Write to CSV"

Exception

The following error:

Component has input columns from multiple tables: ImmutableAnalyzerJob[name=null,analyzer=Create Excel spreadsheet]

java.lang.IllegalStateException: Component has input columns from multiple tables: ImmutableAnalyzerJob[name=null,analyzer=Create Excel spreadsheet]
    at org.datacleaner.job.runner.RowProcessingPublishers.getTables(RowProcessingPublishers.java:219)
    at org.datacleaner.job.runner.RowProcessingPublishers.registerRowProcessingPublishers(RowProcessingPublishers.java:233)
    at org.datacleaner.job.runner.RowProcessingPublishers.registerJob(RowProcessingPublishers.java:143)
    at org.datacleaner.job.runner.RowProcessingPublishers.registerAll(RowProcessingPublishers.java:126)
    at org.datacleaner.job.runner.RowProcessingPublishers.<init>(RowProcessingPublishers.java:122)
    at org.datacleaner.job.runner.AnalysisRunnerJobDelegate.run(AnalysisRunnerJobDelegate.java:103)
    at org.datacleaner.job.runner.AnalysisRunnerImpl.run(AnalysisRunnerImpl.java:90)
    at org.datacleaner.util.AnalysisRunnerSwingWorker.doInBackground(AnalysisRunnerSwingWorker.java:57)
    at org.datacleaner.util.AnalysisRunnerSwingWorker.doInBackground(AnalysisRunnerSwingWorker.java:38)
    at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at javax.swing.SwingWorker.run(SwingWorker.java:334)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

ClaudiaPHI commented 9 years ago

I believe we need to set the output data stream columns as the input columns.

LosD commented 9 years ago

Yeah, I guess it's the automated adder that knows nothing about either multiple tables or output data streams, so it just adds all available columns from the top job builder.
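To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It does not use the real DataCleaner API: `Column`, `tablesOf`, and `requireSingleTable` are hypothetical stand-ins, and the table and column names are illustrative only. It assumes the check behind the exception simply rejects a component whose input columns span more than one table, which is what the error message suggests.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MultiTableCheckSketch {

    /** Hypothetical stand-in for a column: the table (or output stream) it belongs to, plus its name. */
    static final class Column {
        final String table;
        final String name;

        Column(String table, String name) {
            this.table = table;
            this.name = name;
        }
    }

    /** Collects the distinct table names behind a set of input columns, in encounter order. */
    static Set<String> tablesOf(List<Column> inputColumns) {
        Set<String> tables = new LinkedHashSet<>();
        for (Column column : inputColumns) {
            tables.add(column.table);
        }
        return tables;
    }

    /** Assumed shape of the failing check: a component may only read from a single table. */
    static void requireSingleTable(String componentName, List<Column> inputColumns) {
        if (tablesOf(inputColumns).size() > 1) {
            throw new IllegalStateException(
                    "Component has input columns from multiple tables: " + componentName);
        }
    }

    public static void main(String[] args) {
        // What the automated adder appears to do: take every column from the top-level job.
        List<Column> allSourceColumns = Arrays.asList(
                new Column("CUSTOMERS", "name"),
                new Column("EMPLOYEES", "name"));

        // What the writer would need instead: only the columns of the Union's output stream.
        List<Column> unionOutputColumns = Arrays.asList(
                new Column("union output", "name"));

        requireSingleTable("writer", unionOutputColumns); // passes: one stream

        try {
            requireSingleTable("writer", allSourceColumns); // spans two tables
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, wiring the writer to the Union's output columns rather than to all source columns keeps it within a single stream, which matches the fix proposed above.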

LosD commented 9 years ago

Certainly not a duplicate, but it may be related closely enough that @kaspersorensen would like to look into it at the same time as #706?

ClaudiaPHI commented 9 years ago

Right on target @LosD

kaspersorensen commented 9 years ago

Hmm, I see the point. It's probably good to have on record, but I don't think this is the most dire issue right now.

ClaudiaPHI commented 9 years ago

ok. I will do more testing then.

LosD commented 8 years ago

Still doing weird stuff, now the error is just a repeated NPE instead:


java.lang.NullPointerException
    at org.datacleaner.components.fuse.FuseStreamsComponent.run(FuseStreamsComponent.java:119)
    at org.datacleaner.api.MultiStreamComponent.transform(MultiStreamComponent.java:38)
    at org.datacleaner.job.runner.TransformerConsumer.consumeInternal(TransformerConsumer.java:107)
    at org.datacleaner.job.runner.AbstractRowProcessingConsumer.consume(AbstractRowProcessingConsumer.java:159)
    at org.datacleaner.job.runner.ConsumeRowHandlerDelegate.consume(ConsumeRowHandlerDelegate.java:64)
    at org.datacleaner.job.runner.ConsumeRowHandler.consumeRow(ConsumeRowHandler.java:146)
    at org.datacleaner.job.tasks.ConsumeRowTask.execute(ConsumeRowTask.java:51)
    at org.datacleaner.job.concurrent.TaskRunnable.run(TaskRunnable.java:61)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Oh, and there are two CSV writers, but I guess that's not a surprise when it only knows about the default scope.

jhorcicka commented 8 years ago

This does not seem to be reproducible anymore. Confirmed by @ClaudiaPHI and me.