Hi Jan!
Well, in fact it already generates documentation only for
changed / added files (and it will delete docs for deleted files). It's
currently not yet set up for the target directories though - the
create-output-subdirs job is always executed for all subdirs, even if they
exist already.
From what I can see, in an unmodified run, attempting to create dirs that
already exist seems to take most of the time.
So, my proposal would be to fix that first - that way, the job will run
considerably faster when there is nothing to do.
Original comment by roland.bouman
on 20 Oct 2010 at 12:05
Practically all my code is organised in sub-directories. Guess that's why I
didn't notice the process is already incremental. Anyhow, I need to implement a
piece of code as described above for KFF. I need to make sure that back-up only
happens when code has changed :-) I guess we could use the same logic
(backward compatible to 3.2.x)
Original comment by jan.aertsen
on 20 Oct 2010 at 6:28
Jan, yes, absolutely :)
I fear the current transformation may not be clean enough to serve as a
generic, reusable transformation, but you can certainly get a head start by
copying process-files.ktr and throwing out what you don't need.
The logic is:
1) Use the "Get subfolder names" step to fetch directories. This step is
available in Kettle 3.2 and, unlike the "Get filenames" step, has an option to
recurse into subdirectories.
2) Have the subfolder output stream kick off a "Get filenames" step.
3) Do steps 1 and 2 for both the source dir and the target dir, and use a
"Merge diff" step to compare relative path and (short) filename. This will
identify deleted and new files for free. Updated and unmodified files show up
as identical and need further processing to discern between updates and
unmodified files.
4) In the stream that shows up as "identical" in the diff output, use "Stream
lookup" steps to fetch data for the source file and the target file.
5) Use a filter to compare last modified time of source and target files to see
if a file is updated or unmodified.
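For readers outside Kettle, the five steps above can be sketched in plain Python. This is a hypothetical analog, not the process-files.ktr transformation itself: `list_files` and `classify_files` are illustrative names, `os.walk` stands in for the "Get subfolder names" + "Get filenames" combination (it recurses on its own), and set operations on relative paths stand in for the "Merge diff" step. It also assumes a target file has the same relative path as its source file, which may not hold if generated docs get a different extension.

```python
import os
from pathlib import Path


def list_files(root):
    """Steps 1-2: map each file's path (relative to root) to its last-modified time."""
    root = Path(root)
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            result[full.relative_to(root).as_posix()] = full.stat().st_mtime
    return result


def classify_files(source_dir, target_dir):
    """Steps 3-5: split files into new, deleted, updated and unmodified."""
    source = list_files(source_dir)
    target = list_files(target_dir)
    # Step 3: compare on relative path, like a "Merge diff" on the key fields.
    new = sorted(source.keys() - target.keys())
    deleted = sorted(target.keys() - source.keys())
    identical = source.keys() & target.keys()
    # Steps 4-5: for files present on both sides, look up both timestamps
    # and compare them to tell updated files from unmodified ones.
    updated = sorted(p for p in identical if source[p] > target[p])
    unmodified = sorted(p for p in identical if source[p] <= target[p])
    return {"new": new, "deleted": deleted,
            "updated": updated, "unmodified": unmodified}
```

A file counts as updated only when the source is strictly newer than the target, mirroring the last-modified comparison in step 5; only "new" and "updated" files would then need to be (re)processed.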
That's it :)
Original comment by roland.bouman
on 20 Oct 2010 at 6:45
Ok - I modified process-files to only output directories that do not yet exist
in the output dir. This should shave off a number of seconds from the execution
time in case you have a large number of directories.
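The same idea, only creating the output directories that are missing, could look roughly like this in plain Python. Again this is a hypothetical sketch (`relative_subdirs` and `create_missing_dirs` are illustrative names), not the actual change made to process-files; it assumes the output tree should mirror the source tree's subdirectories.

```python
import os
from pathlib import Path


def relative_subdirs(root):
    """Return every subdirectory of root as a POSIX path relative to root."""
    root = Path(root)
    dirs = set()
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            dirs.add((Path(dirpath) / name).relative_to(root).as_posix())
    return dirs


def create_missing_dirs(source_dir, output_dir):
    """Create only the output subdirectories that do not exist yet."""
    missing = sorted(relative_subdirs(source_dir) - relative_subdirs(output_dir))
    for rel in missing:
        # parents=True creates intermediate directories; exist_ok avoids
        # an error if a directory appears between the check and the mkdir.
        (Path(output_dir) / rel).mkdir(parents=True, exist_ok=True)
    return missing
```

On a run where nothing has changed, `missing` is empty and no directory creation is attempted at all.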
Right now I don't have time to make the solution really clean and reusable, but
at some point in the future I should probably cut up process-files in a number
of jobs. When I do that I should probably also do an incremental update of the
template directory, just because we can :)
For now I'm moving on to implementing new features.
Original comment by roland.bouman
on 20 Oct 2010 at 7:39
Original issue reported on code.google.com by
jan.aertsen
on 19 Oct 2010 at 10:20