Closed: gpotts closed this issue 9 years ago.
Did a checkout and build of the latest Pentaho data-integration 5.4 and the build works. I did some tests that might indicate the problem has been fixed; I would need to hook it up to the AMQP Rabbit queue to see if it truly has been fixed. There are currently two problems. First, they are using TRUNK-SNAPSHOT dependencies, indicating this is still a development-type release. Second, they removed a class I was using for the Hadoop configuration for defining HDFS clusters, etc. I will have to refactor slightly before I can use the new code.
I will keep moving on the current baseline and then go through the upgrade/testing a little later, after the Geopackage cuts have been completed.
Looking at their site, it looks like they have now released version 5.4. I'll finish up the cutter and then work on upgrading to see if a number of the flow bugs have been resolved so I can close these out.
Just tried their latest 5.4 distribution. I originally thought the problem was resolved, but further testing shows the problem is still present. I'll have to keep this bug report open until I can patch their baseline.
I have issued a pull request against their latest release, 5.4, that will fix this issue. If it is not accepted I will have to patch our own 5.3, rebuild, and use that as the distribution. I'll close this out once I get feedback from the data-integration developers.
I might have to patch the 5.3 distribution anyway so that our current 5.3 data-integration works properly.
I went ahead and patched our 5.3 and built the kettle-engine for that version. Verified that it works. I hope they take the mod and patch their baseline, or I'll have to keep applying the patch on every release.
Within the data-integration environment there are bugs in the Job Executor and Transformation Executor steps that cause an off-by-one error for blocks of rows to be processed. If N is the number of rows to send to a transformation or a job, the block is not sent until either the (N+1)th row arrives or the previous step finishes.
This causes steps that continuously generate rows to not work properly, since a full block can sit in the buffer waiting for one more row that may not arrive for a long time. A sketch of the behavior is below.
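Here is a minimal, self-contained sketch of what the off-by-one looks like. This is not the actual Kettle executor-step code; the class, the `GROUP_SIZE` constant, and the `flushBlock` helper are hypothetical stand-ins for the real row-grouping logic, but the flush condition (`>` instead of `>=`) captures the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Kettle source) of the off-by-one in block buffering.
public class OffByOneSketch {

    static final int GROUP_SIZE = 3; // stand-in for "number of rows to send per block"

    // Buggy behavior: a block is only flushed once the buffer *exceeds* the group
    // size, i.e. when row N+1 shows up.
    static void buggyProcessRow(List<String> buffer, String row) {
        buffer.add(row);
        if (buffer.size() > GROUP_SIZE) {          // off-by-one: waits for row N+1
            List<String> block = new ArrayList<>(buffer.subList(0, GROUP_SIZE));
            buffer.subList(0, GROUP_SIZE).clear();
            flushBlock("buggy", block);
        }
    }

    // Fixed behavior: flush as soon as the block is full (the Nth row).
    static void fixedProcessRow(List<String> buffer, String row) {
        buffer.add(row);
        if (buffer.size() >= GROUP_SIZE) {
            flushBlock("fixed", new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    static void flushBlock(String label, List<String> block) {
        System.out.println(label + " sent block: " + block);
    }

    public static void main(String[] args) {
        List<String> buggy = new ArrayList<>();
        List<String> fixed = new ArrayList<>();
        // Simulate a continuously generating step feeding rows one at a time.
        for (int i = 1; i <= 6; i++) {
            String row = "row-" + i;
            buggyProcessRow(buggy, row);
            fixedProcessRow(fixed, row);
        }
        System.out.println("buggy rows still stuck in buffer: " + buggy);
        System.out.println("fixed rows still stuck in buffer: " + fixed);
    }
}
```

Running this, the buggy version only dispatches rows 1-3 after row 4 arrives and leaves rows 4-6 stuck in the buffer, while the fixed version dispatches each full block as soon as it is complete. With a slow or never-ending upstream step, that stuck block is exactly the failure seen here.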
This would have to be fixed in the data-integration codebase by forking their code, fixing the steps, and then issuing a pull request.