dkpro / dkpro-jwpl

DKPro JWPL (DKPro Java Wikipedia Library) is a free, Java-based application programming interface that facilitates access to all information in Wikipedia.
https://dkpro.github.io/dkpro-jwpl
Apache License 2.0
83 stars, 35 forks

[DataMachine] Automatically delete *.bin files after dump creation process #24

Closed: daxenberger closed this issue 1 year ago

daxenberger commented 9 years ago

Originally reported on Google Code with ID 26

The files page.bin, revision.bin and text.bin, which are generated during the dump creation process, should be automatically deleted when processing has finished. They are no longer needed.
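As a rough illustration of the request, the following sketch deletes the three files named above from an output directory. The class `BinFileCleaner` and the method `deleteBinFiles` are hypothetical names chosen for this example and are not part of JWPL; only the file names come from the issue text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JWPL: removes the intermediate .bin files
// named in this issue from a given output directory.
class BinFileCleaner {

    // File names as listed in the issue description.
    static final List<String> BIN_FILES = List.of("page.bin", "revision.bin", "text.bin");

    // Deletes each known .bin file if present and returns the paths actually removed.
    static List<Path> deleteBinFiles(Path outputDir) throws IOException {
        List<Path> deleted = new ArrayList<>();
        for (String name : BIN_FILES) {
            Path candidate = outputDir.resolve(name);
            // deleteIfExists returns false (and does not throw) when the file is absent.
            if (Files.deleteIfExists(candidate)) {
                deleted.add(candidate);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory with two of the three files present.
        Path dir = Files.createTempDirectory("jwpl-demo");
        Files.createFile(dir.resolve("page.bin"));
        Files.createFile(dir.resolve("text.bin"));
        System.out.println("Deleted files: " + deleteBinFiles(dir).size());
        Files.delete(dir);
    }
}
```

Missing files are skipped silently, so the helper can be called whether or not every file was produced.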

Reported by oliver.ferschke on 2011-06-08 12:32:03

daxenberger commented 9 years ago

Reported by oliver.ferschke on 2011-06-08 12:32:17

daxenberger commented 9 years ago

Reported by oliver.ferschke on 2011-10-07 10:55:04

daxenberger commented 9 years ago

Reported by oliver.ferschke on 2011-10-07 10:55:17

mawiesne commented 6 years ago

@daxenberger Has this issue ever been resolved? What is the current state for this feature request?

It could be quite a useful enhancement, saving users from the experience of a full disk.

daxenberger commented 6 years ago

@mawiesne see my comments in #173

daxenberger commented 6 years ago

@daxenberger I agree: .bin files are not of much interest to most (>90%) users. Please specify an optional parameter in the original issue (#24) as a comment. I could then rework the PR to reflect this idea, so users can decide whether to keep the files or not.

How about something like a boolean DO_NOT_DELETE_TEMP_FILES in JWPLDataMachine, set to false by default?

rzo1 commented 6 years ago

What about the command-line parameter "--keep-bin-files <true/false>" ?

daxenberger commented 6 years ago

What about the command-line parameter "--keep-bin-files <true/false>" ?

Fine. I suggest making the command-line parameter optional, using "keep-bin-files" to enable it.

mawiesne commented 6 years ago

@reckart Any additions? Otherwise, I'll proceed with these changes as suggested above.

reckart commented 6 years ago

I don't know how this code is used, but usually having static flags somewhere like a DO_NOT_DELETE_TEMP_FILES is not a good idea. I'd still recommend allowing local control over this behavior. But that's just my 10 cents.
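To illustrate the concern, here is a minimal sketch of the alternative to a static flag: the setting lives on the instance, so each caller controls it locally. `DumpJob` and its members are hypothetical names for this example only and do not reflect JWPL's actual classes.

```java
// Hypothetical sketch; none of these names are taken from JWPL.
class DumpJob {

    // Instance-level setting: every job carries its own decision.
    private final boolean keepTempFiles;

    DumpJob(boolean keepTempFiles) {
        this.keepTempFiles = keepTempFiles;
    }

    boolean shouldDeleteTempFiles() {
        return !keepTempFiles;
    }

    public static void main(String[] args) {
        // Two jobs in the same JVM can disagree, which a single
        // static flag shared by all callers could not express.
        DumpJob archival = new DumpJob(true);
        DumpJob throwaway = new DumpJob(false);
        System.out.println("archival deletes: " + archival.shouldDeleteTempFiles());
        System.out.println("throwaway deletes: " + throwaway.shouldDeleteTempFiles());
    }
}
```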

mawiesne commented 6 years ago

I will quickly summarize our options here:

  1. Hard removal strategy, as originally requested "on Google Code with ID 26" and the current state of PR #173: no CLI parameter, the files are always deleted, freeing a lot of disk space.
  2. Soft removal strategy: delete the .bin files unless -keep-bin-files is specified, proposed by @daxenberger / @rzo1 .
  3. Local control strategy with removal option, i.e. keep all .bin files by default unless -wipe-bin-files is specified, proposed by @reckart.
  4. No-change strategy: do not change the code, and close this issue / remove the PR altogether.

Gentlemen, please cast your vote (+1, 0, -1) for one of the aforementioned strategies. Use -1 if you want to veto, and please explain the "why".

I will vote once you have commented, and will implement the strategy that receives a majority of the votes.
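The practical difference between options 2 and 3 is only the default. A small sketch, assuming the flag spellings used in the list above and a hypothetical `RemovalPolicy` class that is not part of JWPL:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two CLI defaults under discussion.
class RemovalPolicy {

    // Option 2: delete by default; the keep flag opts out of deletion.
    static boolean deleteUnderOption2(List<String> args) {
        return !args.contains("-keep-bin-files");
    }

    // Option 3: keep by default; the wipe flag opts in to deletion.
    static boolean deleteUnderOption3(List<String> args) {
        return args.contains("-wipe-bin-files");
    }

    public static void main(String[] args) {
        List<String> given = Arrays.asList(args);
        System.out.println("option 2 deletes: " + deleteUnderOption2(given));
        System.out.println("option 3 deletes: " + deleteUnderOption3(given));
    }
}
```

With no flags at all, option 2 deletes the files and option 3 keeps them; that default is what the vote decides.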

mawiesne commented 6 years ago

@tgalery Maybe you want to participate in the vote on this implementation strategy as well? Asking you, just in case..

tgalery commented 6 years ago

Hi guys, I'd love to, but I'm on holiday at the moment. I should be able to catch up later and see how I can contribute. Hope that's OK!

reckart commented 6 years ago

I have given my advice, but for me it's really up to you how you prefer it. So I'm +0 for all options :)

daxenberger commented 6 years ago

Both options 2 and 3 are fine with me, with a slight preference for 2.

reckart commented 6 years ago

Local control strategy with removal option, i.e. keep all .bin files by default unless -wipe-bin-files is specified, proposed by @reckart.

Mind that my comment was about the API of the dumper at the Java level, not at the command-line level. I believe it should be possible (if desired) to call a "cleanup" method from the code that handles the command-line invocation (e.g. some main method). I didn't comment on how that could be exposed to the CLI user (i.e. whether to use a delete-by-default or a keep-by-default strategy). IMHO the CLI-level design and the Java API-level design can very well make different decisions on this matter. That's just for clarification. I'm still +0 for everything ;)

mawiesne commented 6 years ago

@reckart Ack, thanks for the clarification. Your initial comment wasn't clear enough to me, then: I had interpreted "local control" as leaving it to the caller/user of DataMachine (from the CLI perspective). @daxenberger In case we go for option 2 or 3, I will reflect the last comment by @reckart at the API level.

rzo1 commented 6 years ago

I vote for a combined approach of (2) and (3).

Implementing "cleanUp()" on API level and then using --keep-bin-files to prevent cleanup in CLI usage.
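A minimal sketch of how this combination might look, assuming the flag is spelled "--keep-bin-files" as first suggested in this thread. `DataMachineSketch` and its `process()` stand-in are hypothetical and do not represent the actual JWPL DataMachine code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: cleanup lives in the API, the CLI decides whether to call it.
class DataMachineSketch {

    private static final String[] BIN_FILES = {"page.bin", "revision.bin", "text.bin"};

    private final Path outputDir;

    DataMachineSketch(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Stand-in for the real dump creation, which this sketch does not model;
    // it only creates empty placeholder files.
    void process() throws IOException {
        for (String name : BIN_FILES) {
            Files.write(outputDir.resolve(name), new byte[0]);
        }
    }

    // API level: callers decide if and when to invoke cleanup.
    void cleanUp() throws IOException {
        for (String name : BIN_FILES) {
            Files.deleteIfExists(outputDir.resolve(name));
        }
    }

    // CLI level: delete by default, unless the flag is given (assumed spelling).
    static boolean keepBinFiles(String[] args) {
        return Arrays.asList(args).contains("--keep-bin-files");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jwpl-sketch");
        DataMachineSketch machine = new DataMachineSketch(dir);
        machine.process();
        if (!keepBinFiles(args)) {
            machine.cleanUp();
        }
        System.out.println("page.bin present: " + Files.exists(dir.resolve("page.bin")));
    }
}
```

Keeping the decision in `main` means code that uses the class directly is never surprised by files disappearing; it calls `cleanUp()` only if it wants to.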

daxenberger commented 6 years ago

Implementing "cleanUp()" on API level and then using --keep-bin-files to prevent cleanup in CLI usage.

Sounds good!

mawiesne commented 6 years ago

Decided it is, as nobody objected to the proposal by @rzo1. I will implement a soft-removal strategy (Option 2 from the CLI perspective) then, providing a cleanUp method at the API level.