Closed jgriss closed 6 years ago
Hi @jgriss ! I'll try and finish this today
I created a ui_separate
branch because the R code in the cell bellow is crashing and I don't know if it's the new code that's causing this or not. When everything is done and working we can get rid of it. The old code is kinda not working for me ether.
I have some thoughts and questions:
Rinput
the only global variable required for R cells to work?
1.1 I wold like to not use any global variable and insted keep the data in SearchUI instance (ui), but for some reason using %%R -i ui.Rinput ...
or %%R -i ui.getRinput() ...
doesn't work...@jgriss, do you know why the first R cell wold crash at:
[1] "Filtered 0 PSMs @ 0.01 FDR"
Error in withVisible({ : Error: No valid PSMs found)]]]
I didn't get so far but the reason should be that the default labeling method for the example is wrong. It should be iTRAQ8 (either one of both) and not TMT10.
Rinput
or ui)
. A way around would be using a parameter file or just copying the required values from ui
to RinputI also saw a few errors: a) The file search_tool.ipy should be in a folder (what about bin?). b) There is an error when running the "Enter design" button but you might have corrected it already.
You are right it was TMT10. Now that part works but it wold seem I'm not generating exp_design.tsv. I'll fix it later today. But you might want to change the R code in a way that if it fails it reverts the chdir it did?
a) I believe the Scripts folder shod be copied as a whole but I haven't changed that? b) I thinkso.
Great! I am quite busy this week but should be able to continue working on the R scripts next week.
Hi @gvinterhalter !
Some of the issues are caused by the extension: the %run
magic treats .py
and .ipy
files differently. The first are executed in their own namespace while .ipy
files are executed in the global namespace as if the code was directly present in the cell.
I'll try to test everything within the next few hours and will then commit updated versions to the new branch. Does that sound OK?
Hi @jgriss.
Sure :)
Hi @gvinterhalter ,
My original idea was a little different for how to separate the code: From a python development point of view, I like your current methods a lot. Only, I fear that they will not work well with Jupyter notebooks.
For me, the main idea is to have standard "modules" that people can quickly integrate into new notebooks to create custom workflows. That was the main idea to move the search + gui functionality into separate files.
To enable people to use these modules as "natively" as possible in other cells etc. we need to have quite a lot of global variables - only these are accessible in the subsequent cells and only, if the python code is in .ipy
files.
Therefore, I'd suggest the following structure:
.ipy
module should have a defined "input" (optional) and "output" structuredict
in python and named lists in RIf we now adapt the search functionality and the GUI functionality towards this concept, the following code would be possible:
---- python cell -----
%run search_gui.ipy
# now I have "gui_settings" or similar in my global python namespace
# ----- python cell -------
# Here I simply launch the search by converting the "gui_settings" into the "search_settings" object
search_settings = gui_settings
# adapt something
if search_settings["fragment_tolerance"] > 1:
search_settings["fragment_tolerance"] = 0.1
# run the search
%run search.ipy
----- R cell ---- < since I know the name of the previous output, I can import the python objects
%%R -i gui_settings,search_result
# I do my R magic
Additionally, if I only want to process the data for my very special paper I can do:
------ Python cell -----
search_settings = my_very_secret_settings
# directly run the search without the GUI
%run search.ipy
....
I'll now try to adapt the two modules so we have some kind of basis for the discussion.
Additionally, if you agree with this structure, we should have default names for input and output objects. I'd suggest [module name]_in and [module name]_out
Hi @veitveit and @gvinterhalter ,
I've now updated the two python scripts to become the "modules" I envisioned. Additionally, I've added another module to display buttons that execute the next cells.
I also adapted the Example.ipynb to use these new modules.
The last thing to do is to update the R code accordingly. I'll try to get his done today or tomorrow.
All changes are still in the ui_separation branch.
Cheers, Johannes
Hi @jgriss ,
I wold suggests that the structure you propose be not just a dict, but a class object inheriting an OrderdDict
. Idea wold be to create a base class that can provide:
__getattr__
and __setattr_
can be overloaded to make it more easier to accesses the content with dot notation__dir__
can be overloaded to return the keys of the OrderdDict
which ipython uses for autocomplete.I can't say I totally understand the need for global stuff as you describe it.
I wold like to do some work on the UI but I'm not sure how much free time I will have in the next period.
Hi @gvinterhalter and @veitveit ,
I've now updated the R code Example.ipnyb
until protein inference is done. For me, everything was working until then (with still many TODOs in the code...).
@gvinterhalter : Did you already have a look at the code changes I made? The search and search_gui file are quite ready for use.
About whether to use the dict object or not: As you will see in the notebook I had to transfer everything to a JSON string to get the variables into R. Currently, I believe that 99% of our analysis code will be in R - at least that was @veitveit and my plan. In our original plan, python was only used to get the search working but was never intended as the primary platform. Therefore, I cannot see the advantage of using a more complex datatype than a dict. But I am open for any discussion!!!
The "global stuff" again comes from the idea of merging the different languages in the Jupyter notebook environment. As you know, the Notebook starts one kernel per programming environment. Some languages can pass global variables directly between each other. That's why I wanted to keep the configuration files global. They can then be easily manipulated in any cell without the need to call some member functions.
From a programming design point of view it's not very elegant but given that we target a rather unexperienced user group with these notebooks, I feel that this is the simplest solution (people can f.e. simply add a cell to state
search_in["fragment_tolerance"] = 0.01
That was the main idea.
From my point of view, the current state is quite stable and I believe that we all basically agreed on this method to organise our python code (although the details are not settled).
Therefore, I'd suggest to merge this trunk with main
again. Do you agree?
Yes, please merge. We still need to spend quite some time on testing the protocol on different data sets and so better keep what we have now.
Separating the different modules was very helpful and we should incorporate them at least partly into the original template. And modifications of them can be easily adjusted, right?
Hi @jgriss,
No reason not to merge. I looked at the changes and it all looks good. Sorry for not commenting sooner, I'm in a pinch.
Done. I've merged the ui_separation
branch and deleted it.
Since the first step to separate the GUI from the search functionality is complete I'm now closing this issue. To keep everything a bit more organized I'd suggest that we create new issues for other improvements (ie. optimised design of the python modules).
Move the code to display the GUI and to perform the search in two separate python files.
Define and document a data structure to store all user settings from the GUI.
This data structure must then be accessible to all subsequent cells within the Notebook.