aibasel / lab

Lab is a framework for evaluating planners and other solvers.
https://lab.readthedocs.io
GNU General Public License v3.0

clarifications for downward.experiments.DownwardExperiment documentation #12

Closed jendrikseipp closed 8 years ago

jendrikseipp commented 10 years ago

Original report by Malte Helmert (Bitbucket: malte, GitHub: maltehelmert).


Reading the documentation for the constructor, I have two questions:

  1. Does the description surrounding the "compact" attribute still match current reality, now that we take inputs and checkouts from the cache? I guess it still has some relevance for copying in PDDL files for translator runs, but other than that I'm somewhat confused. Can you clarify under which situations compact=False/True should be set? I hope that we can get rid of the argument in the long run, but this might require some additional caching.

  2. The description of "limits" seems incomplete to me. It doesn't mention the units for the limits, and it doesn't mention that it's allowed to only specify a subset of attributes. (The example shows this, but examples should not replace basic documentation, only complement it.)

Probably a separate issue, but I also think the API to specify a timeout in add_config and add_portfolio is inconsistent with the way limits are specified in the constructor, making it harder to learn.

It would make much more sense to me to have a general limits dict here also. At the very least, this would allow setting different memory limits for the search component. More importantly, now that configs and trans/pre/search revisions are mixed up in the add_config API, it's not clear that "timeout" refers to the search timeout only, and there should be an API to set the other limits, too.

However, as mentioned elsewhere, I think the add_config API can be left as is and eventually be deprecated, and the "limits" argument should be part of the new "add_algorithm" method.
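To make the inconsistency concrete, here is a minimal sketch of the unified "limits" dict idea discussed above. This is not the actual lab API; the key names, default values, and the `merge_limits` helper are hypothetical, chosen only to illustrate how experiment-wide defaults and per-algorithm overrides could share one format.

```python
# Hypothetical sketch (not the released lab API): one "limits" dict shape
# used both for experiment-wide defaults and per-algorithm overrides.

DEFAULT_LIMITS = {
    "translate_time": 7200,    # seconds (units would need documenting)
    "translate_memory": 8192,  # MiB (assumed unit for this sketch)
    "search_time": 1800,
    "search_memory": 2048,
}

def merge_limits(defaults, overrides=None):
    """Combine experiment-wide limits with per-algorithm overrides.

    Only a subset of keys needs to be given; unknown keys are rejected
    so that typos fail loudly instead of being silently ignored."""
    limits = dict(defaults)
    for key, value in (overrides or {}).items():
        if key not in defaults:
            raise ValueError(f"unknown limit: {key}")
        limits[key] = value
    return limits

# An algorithm that only tightens the search time limit:
print(merge_limits(DEFAULT_LIMITS, {"search_time": 300}))
```

With this shape, "it's allowed to only specify a subset of attributes" follows directly from the merge semantics instead of having to be inferred from an example.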

jendrikseipp commented 10 years ago

Regarding 1:

compact=False means that the experiment should be portable. Whether it is really portable hasn't been tested in a while, I guess, but I don't see a reason why it shouldn't be. If compact=False, all preprocessed tasks are copied into the experiment. For preprocess experiments, the PDDL files are copied into the run dirs unconditionally. All code revisions are compiled in the cache, but copied into the experiment dir unconditionally.

Should we change this behaviour and read PDDL files from the benchmarks dir and call the code revisions in the cache? Do we want to keep the "compact" option at all?

Regarding limits:

I agree that the limits should be set in the add_algorithm method and be deprecated in the constructor. Instead of the limits dict, however, I would prefer to pass them as individual kwargs (i.e. exp.add_algorithm(..., search_rev="123", search_time_limit=1800, search_memory_limit=2*1024*1024)). Do you have a different preference?
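The individual-kwargs proposal could look like the following sketch. The function body and the parameter defaults are hypothetical, invented here to show the shape of the signature under discussion; the memory unit (KiB, so 2*1024*1024 is 2 GiB) is an assumption.

```python
# Hypothetical signature sketch for the proposed add_algorithm kwargs
# (names taken from the discussion, not from a released lab API).

def add_algorithm(name, repo, search_rev, config,
                  search_time_limit=1800,            # seconds
                  search_memory_limit=2 * 1024 * 1024):  # KiB, i.e. 2 GiB
    """Register an algorithm with per-component resource limits."""
    return {
        "name": name,
        "repo": repo,
        "search_rev": search_rev,
        "config": config,
        "limits": {
            "search_time": search_time_limit,
            "search_memory": search_memory_limit,
        },
    }

algo = add_algorithm("lmcut", "path/to/repo", "123",
                     ["--search", "astar(lmcut())"],
                     search_time_limit=300)
```

The upside is that limits show up in the signature and in editor completion; the downside, raised below, is how such parameters sort and group in documentation.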

jendrikseipp commented 10 years ago

Original comment by Malte Helmert (Bitbucket: malte, GitHub: maltehelmert).


Regarding 1: putting myself into the role of a newbie, the current description reads as if I would always want to use compact=False. The documentation lists a number of advantages for this option, while the only disadvantage is merely implied by the text, and I don't think most people would guess correctly just how huge the amount of data to be copied with compact=False is. So I see a potential trap e.g. for our students here, although I guess it's not a big deal if they work from an existing example. Hence, at minimum, I would add a strong warning against compact=False that mentions some quantitative data.

You write "For preprocess experiments, the PDDL files are copied into the run dirs unconditionally", but the documentation says "If compact is True, reference benchmarks and preprocessed files instead of copying them." (Emphasis mine.) I think the other things you mention (code is copied etc.) should also be documented somewhere, but perhaps mainly in the "user manual"-level documentation we were discussing. The description for "compact" mentions some aspects that something is referenced vs. copied, but I don't think there's a description of what is done. (So we describe a delta here, but not the base to which the delta is applied.)

If we're not even sure whether compact experiments currently work, and if the behaviour doesn't match the documentation, this doesn't inspire confidence. :-) In this case, we should test it and see if it still works. My recommendation would be to run the experiment from http://issues.fast-downward.org/msg2986 twice, once with compact=False and once with compact=True, see if the experiment still works, and measure the size of the resulting experiment directories after completing the experiment. We could then add this as indicative info about the amount of additional space usage people could expect for non-compact experiments.
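The size comparison suggested above could be done with a small helper like the following. The directory names are placeholders; the only assumption is that both experiment runs have finished and left their directories on disk.

```python
# Sketch for comparing experiment directory sizes after running the same
# experiment once with compact=True and once with compact=False.
# "exp-compact" and "exp-noncompact" are placeholder paths.
import os

def dir_size_mib(path):
    """Total size of all regular files under path, in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fpath = os.path.join(root, name)
            if os.path.isfile(fpath):
                total += os.path.getsize(fpath)
    return total / (1024 * 1024)

for exp_dir in ["exp-compact", "exp-noncompact"]:
    if os.path.isdir(exp_dir):
        print(f"{exp_dir}: {dir_size_mib(exp_dir):.1f} MiB")
```

The resulting numbers could go straight into the documentation warning proposed above.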

BTW, your mentioning of "preprocess experiments" made me wonder how these are actually run. I couldn't find a description in the documentation regarding preprocessing vs. search vs. complete experiments. For example, http://lab.readthedocs.org/en/latest/downward.experiments.html has a note saying "You only have to run preprocessing experiments and fetch the results for each pair of translator and preprocessor revision once, since the results are cached. When you build a search experiment, the results are automatically taken from the cache." but I couldn't find information on what "preprocessing experiments", "search experiments" and "fetching the results" are.

Short summary:

jendrikseipp commented 10 years ago

Original comment by Malte Helmert (Bitbucket: malte, GitHub: maltehelmert).


Regarding limits: I didn't actually recommend deprecating it in the constructor. :-) I thought it might make sense to have defaults and overrides for this.

The point dearest to me here is that if this functionality exists in the constructor and in add_algorithm, then they should support the same options, and the way that the arguments are passed should match. (Not an issue if we want to deprecate the constructor version of this, and the more I think about it, the more it makes sense to me, since this could also be considered some kind of command-line parameter that should be kept together with the other parameters. But I'm still not 100% convinced. Maybe also consult the others on this?)

Regarding dicts vs. individual parameters, I have no strong preference. If there were no previous code, individual parameters look somewhat nicer, but I think the advantage is small, and changing an API always comes at a cost.

If we go for individual parameters, though, I have a strong preference for the naming convention limit_search_time etc. instead of search_time_limit, since I strongly prefer common prefixes over common suffixes for grouped parameters. (This is nice, for example, if they appear sorted in some kind of automated output.) I think the resource limits belong together more strongly than e.g. search_revision and search_time_limit belong together. For example, it's easier to document all the revision arguments together and all the limit arguments together than having the limits spread all over the list.
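The sorting argument is easy to demonstrate. In this sketch (parameter names invented for illustration), a common "limit_" prefix keeps all resource limits adjacent in any alphabetically sorted listing, whereas the suffix style interleaves them with the revision arguments.

```python
# Illustration of the prefix-vs-suffix naming argument: with a common
# "limit_" prefix, sorted parameter names keep all limits together.

suffix_style = ["search_rev", "search_time_limit", "search_memory_limit",
                "translate_rev", "translate_time_limit"]
prefix_style = ["search_rev", "limit_search_time", "limit_search_memory",
                "translate_rev", "limit_translate_time"]

# Suffix style: limits interleave with revisions when sorted.
print(sorted(suffix_style))
# Prefix style: all limit_* names form one contiguous group.
print(sorted(prefix_style))
```

This is exactly the "appear sorted in some kind of automated output" case mentioned above, e.g. in generated API docs.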

jendrikseipp commented 10 years ago

Agreed. Would you also prefer rev_translate and rev_search over translate_rev and search_rev? I find the latter easier to read.

jendrikseipp commented 10 years ago

Original comment by Malte Helmert (Bitbucket: malte, GitHub: maltehelmert).


I asked myself the same question and didn't come up with a strong preference for the former. In that case, API stability wins. (Doubly so if we want to eventually get rid of having three different revisions anyway.)

jendrikseipp commented 8 years ago

Whole-planner experiments make this issue obsolete.

jendrikseipp commented 8 years ago

Original changes by Jendrik Seipp (Bitbucket: jendrikseipp, GitHub: jendrikseipp).


changed state from "new" to "resolved"