WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2024, WikiTeam has preserved more than 600,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0

With --resume configuration should be read from config.txt and arguments shouldn't be required #17

Open emijrp opened 10 years ago

emijrp commented 10 years ago

From nemow...@gmail.com on July 09, 2011 09:48:45

"Error. You forget mandatory parameters: [...]" But all parameters are stored in the config file.

Original issue: http://code.google.com/p/wikiteam/issues/detail?id=17
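
A minimal sketch of the requested behaviour, not dumpgenerator.py's actual code: when `--resume` is given, the settings come from `config.txt` inside the dump directory and nothing else is demanded on the command line. The function name `load_settings`, the JSON file format, the fallback to the current directory, and treating `--path` as the only mandatory parameter for a new dump are all assumptions made for illustration; the real tool may store and validate its configuration differently.

```python
import argparse
import json
import os


def load_settings(argv):
    """Return a settings dict; with --resume, read it from config.txt."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--path")
    parser.add_argument("--xml", action="store_true")
    parser.add_argument("--resume", action="store_true")
    args = parser.parse_args(argv)

    if args.resume:
        # Assumption: fall back to the current directory when --path is absent.
        dump_dir = args.path or os.getcwd()
        config_file = os.path.join(dump_dir, "config.txt")
        if not os.path.isfile(config_file):
            raise SystemExit("No config file in %s, cannot resume." % dump_dir)
        # Assumption: config.txt is JSON here; the real format may differ.
        with open(config_file) as handle:
            return json.load(handle)

    # New dump: only now are command-line parameters mandatory.
    # Assumption: --path stands in for whatever the real tool requires.
    if not args.path:
        parser.error("--path is required when starting a new dump")
    return {"path": args.path, "xml": args.xml}
```

With this shape, `load_settings(["--resume"])` run from inside a dump directory would need no further arguments, while a fresh run still reports the missing parameter.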

emijrp commented 10 years ago

From nemow...@gmail.com on July 22, 2011 01:20:56

Also, the path shouldn't be required if there's a subdirectory (and only one) with the standard name in the current directory. This would help when archiving a bunch of wikis, making shell scripts easier.
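
The "exactly one subdirectory with the standard name" rule could be sketched roughly as follows. The helper `find_single_dump_dir` is hypothetical, and the `-wikidump` suffix used as the "standard name" is an assumption; adjust it to whatever naming pattern the tool really produces.

```python
import os


def find_single_dump_dir(parent=".", suffix="-wikidump"):
    """Return the only dump-like subdirectory of parent, or None.

    None is returned both when there is no candidate and when there are
    several, because in the ambiguous case the user should pass a path.
    """
    candidates = [
        name
        for name in os.listdir(parent)
        if name.endswith(suffix) and os.path.isdir(os.path.join(parent, name))
    ]
    if len(candidates) == 1:
        return os.path.join(parent, candidates[0])
    return None
```

A shell loop over many wiki directories could then run the tool in each one without spelling out the path every time.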

emijrp commented 10 years ago

From nemow...@gmail.com on February 29, 2012 15:38:17

Labels: -Type-Defect Type-Enhancement Usability

emijrp commented 10 years ago

From ad...@alphacorp.tk on November 11, 2013 22:38:54

Blocking: wikiteam:75

saper commented 9 years ago

When using it for the first time with `--resume` (on an empty directory), it complains:

$ python dumpgenerator.py --xml --path /usr/home/saper/dump/exp --resume --force
Loading config file...
There is no config file. we can't resume. Start a new dump.

So I start from scratch:

$ python dumpgenerator.py --xml --path /usr/home/saper/dump/exp --force
Loading config file...
Warning!: "/usr/home/saper/dump/exp" path exists
There is a dump in "/usr/home/saper/dump/exp", probably incomplete.
If you choose resume, to avoid conflicts, the parameters you have chosen in the current session will be ignored
and the parameters available in "/usr/home/saper/dump/exp/config.txt" will be loaded.
Do you want to resume ([yes, y], [no, n])?

Let's try y:

No config file found. I can't resume. Aborting.

Why give a choice if it's not a choice?

I am not sure the current directory-creation logic is intuitive. If a dump instance has its own directory, then --resume can safely be assumed if there was some progress. Maybe we should provide --anew instead to force running from scratch.

nemobis commented 9 years ago

This is probably a consequence of a messed-up directory due to the "-2" bug, which should now be fixed. Can you try again?

I'm not sure what you're trying to do, but why not remove the existing directory? Note, we have launcher.py if for some reason your wiki requires frequent restarting.

saper commented 9 years ago

No, it's unrelated: suspecting this, I didn't use a slash at the end :) The core problem behind this issue is that --path is in reality only a "prefix" or "name pattern" (we could imagine something like --prefix ${HOME}/dump/wiki-%Y%m%d as well).

The current code tries to create a directory anyway and incorrectly assumes the process is resumable if just a directory (not a config) exists.
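
One way to avoid offering a resume that cannot work is to classify the target path before prompting. This is only a sketch under stated assumptions: the function `dump_state` and its three state names are hypothetical, and the only fact taken from the tool's output is that the configuration lives in `config.txt` inside the dump directory.

```python
import os


def dump_state(path, config_name="config.txt"):
    """Classify a dump path as 'new', 'resumable' or 'directory-only'."""
    if not os.path.isdir(path):
        return "new"  # nothing there yet: start a fresh dump
    if os.path.isfile(os.path.join(path, config_name)):
        return "resumable"  # directory and config both exist
    return "directory-only"  # directory exists but there is nothing to resume
```

Only the `resumable` state would justify asking "Do you want to resume?"; a `directory-only` path could be treated like a new dump or reported as an error.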

Maybe we could have

It might be the right thing to do for #33 as well. And probably we would not need #200 at all.

But I also don't get the rationale for the -2 suffix really... (not yet)

nemobis commented 9 years ago

Our code to identify where to resume is in launcher.py: https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/launcher.py#L64

I'm getting confused by your proposals. Is your aim to resume failed dumps, or to restart new dumps from scratch? Do you need to keep all the past directories in place?

PiRSquared17 commented 9 years ago

Thanks for your suggestions @saper. Feel free to submit a PR with your proposed changes.