doc: Notebook and documentation improvements

ggazzola commented 5 years ago

This is the main TODO list, and refers to the steps in the 03_SimpleTutorial notebook. Details and discussion on the items in the list are in subsequent comments.

General

[x] Review minor documentation edits here.
[x] Omit all print commands that are currently used for "debugging" purposes (e.g., in Steps 1,2,3, etc.) and replace them with a validation/test within an existing function (e.g., have daptics.connect() check if daptics.gql.__dict__ contains the gql attribute and throw an error if it doesn't).
[x] Hide (e.g., within existing functions) all variables that the user should not change.
[x] Merge steps {5,6}, {9,10}, and {13,10} <--> E.g., hide the daptics.poll_for_current_task() call in a while loop within daptics.save_experimental_and_space_parameters_csv() / daptics.save_experimental_and_space_parameters_csv() / daptics.save_experiment_responses_csv() and break the loop when status == success.
[ ] Document how to execute a code block ("getting started")
[ ] Document how to re-start from scratch ("getting started")
[ ] Document how to implement the [generate design - upload responses] loop (for users who would benefit from full automation, e.g., users that perform simulated experiments)
[ ] Document that there is a way to go from design to design + responses = experiments for the next design without using CSV files, using in-memory Python structures only.
[ ] Document how to retrieve the design/responses of generation n.
[ ] Keep track of the last completed task/step, so that the function(s) called at step X can return an error message like "You can run this step only after running Step Y" if the user tries to run step X at the wrong time.
[ ] Document how to recover from a failed step
[postpone] Add input/file validation (e.g., return informative errors if file format is wrong).
[not natural] Change step index from "Step 1", "Step 2", ... to "Step A", "Step B", ... (to avoid confusion between step index and the index within brackets in In [ ] on the left side of each code cell).

Step 3

[x] If you create a session with the same name as a previously API-generated session you get an error (OK), but the error message is not understandable.
[x] Will the user ever want to set is_Demo=True? If so, explain when/why in the documentation. If not, hide the is_Demo=False statement (e.g., within daptics.create_session())
[x] This piece of documentation may be a bit unclear: "To avoid parsing bugs, add blank padding columns for rows that have fewer parameter values, so that the CSV file has the same number of columns on each row." Maybe add an example?
[x] Documentation: A bit unclear what is meant by "home page" of the "Jupyter server".
[x] Documentation: factorial_space.csv is inconsistent with fname = 'esd-mixture-5.csv'.
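
As a possible example for the padding item above, here is a minimal sketch that uses only Python's standard csv module. The function name pad_csv_rows and the sample rows are hypothetical; they are not part of the daptics API or of the tutorial's space files.

```python
import csv
import io


def pad_csv_rows(text):
    """Return CSV text in which every row has the same number of columns.

    Rows with fewer fields are padded on the right with empty fields.
    """
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Append as many blank fields as this row is missing.
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


# Hypothetical ragged file: param1 lists four values, param2 only two.
ragged = "param1,0,1,2,3\nparam2,0,1\n"
padded = pad_csv_rows(ragged)
print(padded)
```

After padding, the second row ends with two empty fields (param2,0,1,,), so both rows have five columns.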

Step 7

[x] I don't quite understand this step: why are we exporting to/reimporting from an extra validated_space.csv file? Can't we simply print the space file that was specified in Step 5 (or skip Step 7 altogether)?

Step 8

[x] Documentation: clarify that we present three different options/ways of running step 8, and that the user should choose one of them.
[x] BUG: The first of the three Step-8 options generates an error (NameError: name 'colHeaders' is not defined)
[x] Documentation (alternate step): The parameter names in the file are V11, V12, V13, not param1,param2,param3,param4. Maybe change names in the file to param1,param2,param3 and remove param4 in the documentation?

Step 9

[x] Documentation: "Submit the initial experiments file downloaded" --> "Submit the initial experiments file created" ?

Step 11

[x] gen and fname should be calculated automatically, and not be left for the user to define (see 'Hide ...' in General); remove the "We always need to specify the design generation number to retrieve a design" bit in the documentation.
[ ] Why not save the N-th design directly to genN_experiments.csv, rather than saving it to genN_design.csv and having the user manually duplicate and rename that file in the next step?

Step 12

[x] Documentation: clarify that we present two different ways of running step 12, and that the user should choose one of them.
[ ] Documentation: explain what the user should do if they do have extra experiments (and explain what extra experiments are)

Step 13

[ ] Often freezes with errors like:

status = running -- 37 seconds.
Error:  HTTPConnectionPool(host='inertia.protolife.com', port=8080): Read timed out. (read timeout=None)

nhpackard commented 5 years ago

Merge steps {5,6}, {9,10}, and {13,10} <--> E.g., hide the daptics.poll_for_current_task() call in a while loop within daptics.save_experimental_and_space_parameters_csv() / daptics.save_experimental_and_space_parameters_csv() / daptics.save_experiment_responses_csv() and break the loop when status == success.

No: The whole point of launching long-running processes and then having the ability to poll is to not tie up a session waiting for a call to finish. User could launch the process with daptics.save_experimental_and_space_parameters_csv(), shut down the window, then come back later and see if it has finished. If user wants to write a loop to poll, fine.

nhpackard commented 5 years ago

Add to the documentation:

[ ] how to execute a code block;

Ummm... is this basic Jupyter notebook knowledge? (execute a code block by putting code in a code cell and executing the cell with shift-return)

[ ] how to recover from a failed step;

There should be no failures in the tutorial.

[x] how to re-start from scratch;
[x] how to implement the [generate design - upload responses] loop (for users who would benefit from full automation, e.g., users that perform simulated experiments)
[ ] how to retrieve the design/responses of generation n.

ggazzola commented 5 years ago

No: The whole point of launching long-running processes and then having the ability to poll is to not tie up a session waiting for a call to finish. User could launch the process with daptics.save_experimental_and_space_parameters_csv(), shut down the window, then come back later and see if it has finished. If user wants to write a loop to poll, fine.

I see. Then I think it would be useful for functions like daptics.save_experimental_and_space_parameters_csv to trigger some kind of equivalent of the progress bubbles we have in the web interface. E.g., have these functions print something like "Computing..." as soon as they are called, and then something like "Done" as soon as their task is completed (I am guessing the latter is trickier to do if the user closes the notebook/disconnects from the Jupyter server and reopens/reconnects to it at a later time).

I also added this item to the issue description on top of this page:

[ ] Keep track of the last completed task/step, so that the function(s) called at step X can return an error message like "You can run this step only after running Step Y" if the user tries to run step X at the wrong time.
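
A minimal sketch of how this could work, assuming each step has at most one prerequisite step. The StepTracker and StepOrderError names and the step dependencies shown are hypothetical; nothing here is existing daptics code.

```python
class StepOrderError(RuntimeError):
    """Raised when a step is run before its prerequisite step."""


class StepTracker:
    def __init__(self, prerequisites):
        # Maps each step name to the step that must be completed first.
        self.prerequisites = dict(prerequisites)
        self.completed = set()

    def require(self, step):
        """Raise StepOrderError if the prerequisite of `step` is not done."""
        needed = self.prerequisites.get(step)
        if needed is not None and needed not in self.completed:
            raise StepOrderError(
                f"You can run {step} only after running {needed}."
            )

    def mark_done(self, step):
        self.completed.add(step)


# Hypothetical dependencies between tutorial steps.
tracker = StepTracker({"Step 5": "Step 3", "Step 9": "Step 5"})
tracker.mark_done("Step 3")
tracker.require("Step 5")  # passes silently: Step 3 is done

try:
    tracker.require("Step 9")  # Step 5 has not been marked done yet
except StepOrderError as err:
    message = str(err)
    print(message)
```

Each API function would call require() on entry and mark_done() once its task succeeds.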

ggazzola commented 5 years ago

Add to the documentation:

[ ] how to execute a code block;

Ummm... is this basic Jupyter notebook knowledge? (execute a code block by putting code in a code cell and executing the cell with shift-return)

Yes, I think it may be worth mentioning the shift-return command somewhere in the intro, so the inexperienced user doesn't have to search for that information somewhere else.

nhpackard commented 5 years ago

Why not save the N-th design directly to genN_experiments.csv, rather than saving it to genN_design.csv and having the user manually duplicate and rename that file in the next step?

I think this is basically a good idea, but I haven't done it yet.

nhpackard commented 5 years ago

No: The whole point of launching long-running processes and then having the ability to poll is to not tie up a session waiting for a call to finish. User could launch the process with daptics.save_experimental_and_space_parameters_csv(), shut down the window, then come back later and see if it has finished. If user wants to write a loop to poll, fine.

I see. Then I think it would be useful for functions like daptics.save_experimental_and_space_parameters_csv to trigger some kind of equivalent of the progress bubbles we have in the web interface. E.g., have these functions print something like "Computing..." as soon as they are called, and then something like "Done" as soon as their task is completed (I am guessing the latter is trickier to do if the user closes the notebook/disconnects from the Jupyter server and reopens/reconnects to it at a later time).

Still no. The point is to launch the process, then let the user do whatever they want. If they want to create bubbles they can wrap the polling function inside a loop with bubbles. But they might want to go and do something with their liquid handling robot then come back and check to see if the process is done. 'Done' is when you call the polling function and see status=success instead of status=running.

But: FYI: I did make a class method that has more or less the behavior you suggest: wait_for_current_task(). It sits in a loop, printing out elapsed number of seconds, until the loop finishes, then returns.

The tutorial shows how to use the lower level polling function, and then uses the wait_for_current_task() for the rest of the tutorial.
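
For anyone who prefers to write their own loop, here is a minimal sketch of the pattern. It is not the actual implementation of wait_for_current_task(). The poll argument stands in for a call such as daptics.poll_for_current_task(); the name wait_for_task, the assumption that polling yields a plain status string, and the interval, timeout and sleep parameters are all hypothetical.

```python
import time


def wait_for_task(poll, interval=1.0, timeout=None, sleep=time.sleep):
    """Call poll() until it stops returning "running"; return the final status.

    `poll` is any zero-argument callable that returns a status string.
    """
    elapsed = 0.0  # approximate: counts sleep intervals, not wall-clock time
    while True:
        status = poll()
        print(f"status = {status} -- {elapsed:.0f} seconds.")
        if status != "running":
            return status
        if timeout is not None and elapsed >= timeout:
            raise TimeoutError(f"task still running after {elapsed:.0f} seconds")
        sleep(interval)
        elapsed += interval


# Stand-in for a real polling call: "running" twice, then "success".
fake_statuses = iter(["running", "running", "success"])
result = wait_for_task(
    lambda: next(fake_statuses),
    interval=1.0,
    sleep=lambda seconds: None,  # skip real sleeping in this demonstration
)
```

Because the loop is optional, a user can still launch a task, close the notebook, and call the polling function once later instead.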

pzingg commented 4 years ago

Adding an item to the list above from the server repo: Explain in one or more of the notebooks that there is a way to go from design to design + responses = experiments for the next design without using CSV files, using in-memory Python structures only.
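
A rough sketch of the in-memory idea, assuming a design is held as a dict with a "colHeaders" list and a "data" list of rows. That layout, the "Response" column name, and the design_to_experiments helper are assumptions made for illustration; the real structures and the call that submits the experiments may differ.

```python
def design_to_experiments(design, responses, response_header="Response"):
    """Append one response value to each design row, returning a new table."""
    rows = design["data"]
    if len(rows) != len(responses):
        raise ValueError(
            f"expected {len(rows)} responses, got {len(responses)}"
        )
    return {
        "colHeaders": list(design["colHeaders"]) + [response_header],
        # Build new row lists so the original design is left unchanged.
        "data": [list(row) + [value] for row, value in zip(rows, responses)],
    }


# Hypothetical design with three parameters and two experiments.
design = {
    "colHeaders": ["param1", "param2", "param3"],
    "data": [["0", "1", "2"], ["1", "0", "3"]],
}
experiments = design_to_experiments(design, [0.42, 1.37])
```

The resulting table could then be passed to whatever call uploads experiments, with no intermediate CSV file.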

ProtoLife / daptics-api

doc: Notebook and documentation improvements #9