ProtProtocols / IsoProt

Protocol of the analysis of iTRAQ/TMT proteomics data including quantification, statistical analysis and maybe clustering
https://protprotocols.github.io
11 stars 4 forks source link

Separate search and GUI functionality #28

Closed jgriss closed 6 years ago

jgriss commented 6 years ago

Move the code to display the GUI and to perform the search in two separate python files.

Define and document a data structure to store all user settings from the GUI.

This data structure must then be accessible to all subsequent cells within the Notebook.

gvinterhalter commented 6 years ago

Hi @jgriss ! I'll try and finish this today

gvinterhalter commented 6 years ago

I created a ui_separate branch because the R code in the cell bellow is crashing and I don't know if it's the new code that's causing this or not. When everything is done and working we can get rid of it. The old code is kinda not working for me ether.

I have some thoughts and questions:

  1. Is the Rinput the only global variable required for R cells to work? 1.1 I wold like to not use any global variable and insted keep the data in SearchUI instance (ui), but for some reason using %%R -i ui.Rinput ... or %%R -i ui.getRinput() ... doesn't work...
  2. Are global variables required for something else?
  3. All the settings can be accessed trough SearchUI instance (ui). It will get more accessible as I continue to refactor the SearchUI and ExpDesignUI.

@jgriss, do you know why the first R cell wold crash at:

[1] "Filtered 0 PSMs @ 0.01 FDR"
  Error in withVisible({ : Error: No valid PSMs found)]]]
veitveit commented 6 years ago

I didn't get so far but the reason should be that the default labeling method for the example is wrong. It should be iTRAQ8 (either one of both) and not TMT10.

  1. The R script reads the Rinput variable and also the file with the experimental design. The R magic does not seem able to access subclasses of variables (e.g. Rinput or ui). A way around would be using a parameter file or just copying the required values from ui to Rinput
  2. Otherwise global variables should not be necessary

I also saw a few errors: a) The file search_tool.ipy should be in a folder (what about bin?). b) There is an error when running the "Enter design" button but you might have corrected it already.

gvinterhalter commented 6 years ago

You are right it was TMT10. Now that part works but it wold seem I'm not generating exp_design.tsv. I'll fix it later today. But you might want to change the R code in a way that if it fails it reverts the chdir it did?

a) I believe the Scripts folder shod be copied as a whole but I haven't changed that? b) I thinkso.

veitveit commented 6 years ago

Great! I am quite busy this week but should be able to continue working on the R scripts next week.

jgriss commented 6 years ago

Hi @gvinterhalter !

Some of the issues are caused by the extension: the %run magic treats .py and .ipy files differently. The first are executed in their own namespace while .ipy files are executed in the global namespace as if the code was directly present in the cell.

I'll try to test everything within the next few hours and will then commit updated versions to the new branch. Does that sound OK?

gvinterhalter commented 6 years ago

Hi @jgriss.

Sure :)

jgriss commented 6 years ago

Hi @gvinterhalter ,

My original idea was a little different for how to separate the code: From a python development point of view, I like your current methods a lot. Only, I fear that they will not work well with Jupyter notebooks.

For me, the main idea is to have standard "modules" that people can quickly integrate into new notebooks to create custom workflows. That was the main idea to move the search + gui functionality into separate files.

To enable people to use these modules as "natively" as possible in other cells etc. we need to have quite a lot of global variables - only these are accessible in the subsequent cells and only, if the python code is in .ipy files.

Therefore, I'd suggest the following structure:

If we now adapt the search functionality and the GUI functionality towards this concept, the following code would be possible:

---- python cell -----
%run search_gui.ipy

# now I have "gui_settings" or similar in my global python namespace

# ----- python cell -------
# Here I simply launch the search by converting the "gui_settings" into the "search_settings" object
search_settings = gui_settings
# adapt something
if search_settings["fragment_tolerance"] > 1:
  search_settings["fragment_tolerance"] = 0.1

# run the search
%run search.ipy

----- R cell ---- < since I know the name of the previous output, I can import the python objects
%%R -i gui_settings,search_result

# I do my R magic

Additionally, if I only want to process the data for my very special paper I can do:

------ Python cell -----
search_settings = my_very_secret_settings

# directly run the search without the GUI
%run search.ipy
....

I'll now try to adapt the two modules so we have some kind of basis for the discussion.

Additionally, if you agree with this structure, we should have default names for input and output objects. I'd suggest [module name]_in and [module name]_out

jgriss commented 6 years ago

Hi @veitveit and @gvinterhalter ,

I've now updated the two python scripts to become the "modules" I envisioned. Additionally, I've added another module to display buttons that execute the next cells.

I also adapted the Example.ipynb to use these new modules.

The last thing to do is to update the R code accordingly. I'll try to get his done today or tomorrow.

All changes are still in the ui_separation branch.

Cheers, Johannes

gvinterhalter commented 6 years ago

Hi @jgriss ,

I wold suggests that the structure you propose be not just a dict, but a class object inheriting an OrderdDict. Idea wold be to create a base class that can provide:

  1. Ordering of keys is preserved when printing or saving to json.
  2. __getattr__ and __setattr_ can be overloaded to make it more easier to accesses the content with dot notation
  3. __dir__ can be overloaded to return the keys of the OrderdDict which ipython uses for autocomplete.

I can't say I totally understand the need for global stuff as you describe it.

  1. The UI will be an object that can have get_parameters() -> dict function. Or the parameters can be stored in the UI object in some way. The main thing is that UI object produces parameters
  2. Here is the search. There is no magic global stuff that needs to be baked in. param_dict = ui.get_parameters() param_dict["..."] = secret sauce recipe some_result = search(**param_dict) # call the search and retrieve some_result

I wold like to do some work on the UI but I'm not sure how much free time I will have in the next period.

jgriss commented 6 years ago

Hi @gvinterhalter and @veitveit ,

I've now updated the R code Example.ipnyb until protein inference is done. For me, everything was working until then (with still many TODOs in the code...).

@gvinterhalter : Did you already have a look at the code changes I made? The search and search_gui file are quite ready for use.

About whether to use the dict object or not: As you will see in the notebook I had to transfer everything to a JSON string to get the variables into R. Currently, I believe that 99% of our analysis code will be in R - at least that was @veitveit and my plan. In our original plan, python was only used to get the search working but was never intended as the primary platform. Therefore, I cannot see the advantage of using a more complex datatype than a dict. But I am open for any discussion!!!

The "global stuff" again comes from the idea of merging the different languages in the Jupyter notebook environment. As you know, the Notebook starts one kernel per programming environment. Some languages can pass global variables directly between each other. That's why I wanted to keep the configuration files global. They can then be easily manipulated in any cell without the need to call some member functions.

From a programming design point of view it's not very elegant but given that we target a rather unexperienced user group with these notebooks, I feel that this is the simplest solution (people can f.e. simply add a cell to state

search_in["fragment_tolerance"] = 0.01

That was the main idea.

From my point of view, the current state is quite stable and I believe that we all basically agreed on this method to organise our python code (although the details are not settled).

Therefore, I'd suggest to merge this trunk with main again. Do you agree?

veitveit commented 6 years ago

Yes, please merge. We still need to spend quite some time on testing the protocol on different data sets and so better keep what we have now.

Separating the different modules was very helpful and we should incorporate them at least partly into the original template. And modifications of them can be easily adjusted, right?

gvinterhalter commented 6 years ago

Hi @jgriss,

No reason not to merge. I looked at the changes and it all looks good. Sorry for not commenting sooner, I'm in a pinch.

jgriss commented 6 years ago

Done. I've merged the ui_separation branch and deleted it.

jgriss commented 6 years ago

Since the first step to separate the GUI from the search functionality is complete I'm now closing this issue. To keep everything a bit more organized I'd suggest that we create new issues for other improvements (ie. optimised design of the python modules).