Closed: alex4200 closed this issue 4 years ago.
@njbruce Please check https://github.com/antonelepfl/usecases/ for inserting new use cases in the Brain Simulation Platform
@njbruce ready to implement a notebook in the week of 15 April 2019. The Collaboratory Jupyter environment seems to have been set up properly.
Hello, @njbruce is there any news on use cases from 2 to 5? May we help in some way to finalize them or if they are to be finalized at a later time, close the ticket for the moment?
@lbologna: Neil has left the group and it is not clear who will work on these use cases and when. I would close the ticket until we settle this.
@DKokh I will leave the ticket open, but move it to another milestone, otherwise it will be completely forgotten.
@DKokh Do you have any update regarding this use case?
@alex4200 While working on the use case "Compare a specific region of the electrostatic potentials surrounding a set of protein isoforms with multipipsa", I could not install an R library required for some plots in this analysis: gplots. I tried the following:
!R --vanilla -e 'install.packages("gplots", repos="https://stat.ethz.ch/CRAN/", lib="~/.R/lib", dependencies = TRUE)'
as well as several variants of the above. But I always get:
ERROR: dependency ‘caTools’ is not available for package ‘gplots’
It looks like caTools is not available for the R version 3.3.3 installed on the system. Older versions of gplots did not work either. Is there a way to install the gplots library at the system level with apt-get (apt-get install r-cran-gplots)?
Overall, the following R libraries are required: gplots, cluster, graphics, stats, heatmap3, fpc.
@StefanGIT I think it is unlikely that the support team will update the base system for the Collaboratory, as they have put a newer Collaboration tool (referred to as "Collaboratory 2.0") into production, which also contains newer JupyterLab notebooks. My suggestion is to use this new infrastructure; maybe the libraries you need are already available on this platform. The main link to that platform is (you might need to request an account):
https://lab.humanbrainproject.eu/
If you need new/other packages on that infrastructure, I am pretty sure they will install them for you.
@alex4200 I have modified multipipsa (now version 4.0.5) to use a different plotting library for this type of analysis. The fixed notebook is now in the private MolecularUseCases Collab: https://collab.humanbrainproject.eu/#/collab/34594/nav/351519 How should I get this to the public one? Clone or copy? Please note when you test this that the visuals (nglview) only show up if they are executed while the cell is visible in the browser. If one simply hits 'RunAll' they will not show up by default.
This notebook is one of the 'coming soon' in the Online UseCase section: https://collab.humanbrainproject.eu/#/collab/1655/nav/362934
The same as above applies to this notebook: https://collab.humanbrainproject.eu/#/collab/34594/nav/386452
The following notebook: https://collab.humanbrainproject.eu/#/collab/34594/nav/387399 had a problem with the version of numpy used together with older versions of rpy2 (the rpy2 version was pinned to a specific number for Python 2 compatibility). In Python 3 this requirement no longer applies. Therefore I had to prepare an additional multipipsa version 4.0.6 and a later numpy version. To make the notebooks above use the same multipipsa, I have modified and checked them again.
Now all 3 PIPSA notebooks in the MolecularLevel section: https://collab.humanbrainproject.eu/#/collab/1655/nav/362934 should be available and work in Python 3 via the links given above.
@StefanGIT First of all, for me to see and test the notebooks you have developed, the Collab they are in must be made public.
I can then see the notebooks and copy them into one of my Collabs in order to test them.
When the testing is finished, your notebook(s) can be added to the DEV version of the Brain Simulation Platform. To do so, they just need to be added to a JSON file referencing the location of the notebook in your original Collab (no copy/paste required). When a user selects that notebook, a copy is created automatically; no change is made to your original.
If the notebook is ready to be put into production, I will push it to a GitHub repository; again, no change is required from your side.
All of this procedure is described in this Google doc under "Notebook Development". Please have a look and let me know whether you can access this document and whether the description is good enough.
Yes, I can access the document and I will try to follow the instructions there. Thanks a lot.
I have now put the 3 multipipsa use cases into the dev area (MolecularUseCases_Pub). I attach images to show which workflow corresponds to which image in the production area: https://collab.humanbrainproject.eu/#/collab/50197/nav/530508
I am lost: the entry is already there, but I am not sure what I should use as the URL to the JSON file. I thought the JSON file would be added to a git repository after some quality check. For example, the notebook: https://collab.humanbrainproject.eu/#/collab/50197/nav/530508 is described in the entry below. The "files" section is empty, but I have no idea what I should enter there.
{
    "title": "Compare the electrostatic potentials surrounding a set of protein isoforms with multipipsa",
    "description": "Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase and cluster the isoforms by electrostatic similarity.",
    "experience": ["all"],
    "maturity": ["beta"],
    "disabled": true,
    "picture": {
        "src": "https://raw.githubusercontent.com/antonelepfl/usecases/dev/src/assets/images/pipsa.png"
    },
    "next": "ta_form",
    "dataprotected": true,
    "files": []
},
It is explained here: https://github.com/antonelepfl/usecases/blob/dev/documentation/add_new_usecase.md#add-new-usecase-in-an-existing-domain
If not, please let us know.
{
"entryname": (string) name to be added to the navigation item when the use case is created/added,
"appid": (number) possible values [175 (ipython notebooks), 6 (external html)],
"contenttype": (string) possible values ["x-ipynb+json", "text/html"],
"extension": (string) extension with "." like ".ipynb",
"file": UUID of the file in collab storage (more information see below) OR raw file URL,
"file_prod": (optional*) Github file url using API,
"initial": (boolean) if true this nav item will be shown when redirecting to the collab,
"justcopy": (boolean) if true, it will avoid creating a nav item,
}
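For illustration, a filled-in "files" entry following this schema might look like the snippet below; the entry name and UUID are hypothetical placeholders, not values from this use case, and "file_prod" is omitted since it is only needed once the notebook is in GitHub:

```json
"files": [
    {
        "entryname": "PIPSA electrostatics comparison",
        "appid": 175,
        "contenttype": "x-ipynb+json",
        "extension": ".ipynb",
        "file": "00000000-0000-0000-0000-000000000000",
        "initial": true,
        "justcopy": false
    }
]
```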
and
To get the UUID in collab
Go to the collab storage where the file is located.
Click on the file and the URL bar will change from something like https://collab.humanbrainproject.eu/#/collab/<collab_number>/nav/<nav_number> to https://collab.humanbrainproject.eu/#/collab/<collab_number>/nav/<nav_number>?state=uuid%3D915417d1-359f-4eab-bcb1-a0881dea8d7d. Now take the last part after "state=uuid%3D" from the URL, e.g.: 915417d1-359f-4eab-bcb1-a0881dea8d7d
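As a sketch, the same extraction can be done programmatically. The URL below follows the pattern described above but uses placeholder collab/nav numbers; only the UUID is the example value from this comment:

```python
from urllib.parse import urlparse, parse_qs

def uuid_from_collab_url(url):
    """Extract the storage UUID from a Collab file URL.

    The query string lives inside the fragment (after '#'),
    so we split the fragment on '?' before parsing it.
    """
    fragment = urlparse(url).fragment
    if "?" not in fragment:
        raise ValueError("no query string in URL fragment")
    query = parse_qs(fragment.split("?", 1)[1])
    # parse_qs decodes 'uuid%3D...' into 'uuid=...'
    state = query["state"][0]
    return state.split("=", 1)[1]

url = ("https://collab.humanbrainproject.eu/#/collab/123/nav/456"
       "?state=uuid%3D915417d1-359f-4eab-bcb1-a0881dea8d7d")
print(uuid_from_collab_url(url))  # 915417d1-359f-4eab-bcb1-a0881dea8d7d
```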
Please point out what is missing, unclear, etc.
Finally I found how to get the UUID. As I understand it, I can only add this and not the file, since it is not yet in the repository. Maybe this could be more explicit.
@StefanGIT I have an import error in the second cell (use case "Compare the electrostatic potentials surrounding a set of protein isoforms with multipipsa"). The error reads
Importing python modules failed.
There is a problem with the python environment!
cannot import name '__html_manager_version__'
This does not tell me where the error happened. I strongly suggest removing this try/except clause:
BTW: the error is for 'import nglview'.
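For example, instead of one blanket try/except around all imports, each import could be tried individually so the failing module is named in the output. The module list here is illustrative (a stdlib module and a deliberately bogus name), not the notebook's actual imports:

```python
import importlib

def check_imports(module_names):
    """Try each import separately and collect (name, error) pairs."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures.append((name, str(exc)))
    return failures

# 'json' is stdlib and should succeed; the second name does not exist
failures = check_imports(["json", "no_such_module_xyz"])
for name, err in failures:
    print(f"import of {name} failed: {err}")
```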
@alex4200 I have removed the exception catch in all three notebooks. But it is not clear why nglview cannot be loaded; it works when I run it. Could it be a dependency problem with other packages/versions in different environments?
I am retrying again, maybe my setup was mixed up.
This seems to work, but later I get an error related to installing a package (I answered 'n'; it is not clear to me what is going on in that cell). Error:
Installing required R-libraries into a user directory
In case you get asked answer 'y', also in case you get asked
to create the directory.
To avoid this, create a file called '.Renviron' in your home
directory and add 'R_LIBS_USER=~/.R/lib' in this file
/opt/conda/lib/python3.6/site-packages/multipipsa/data/pipsa/bin/npotsim -pg /opt/conda/lib/python3.6/site-packages/multipipsa/data/pipsa/bin/2potsim_skin_spheresU -fp /home/jovyan/work/wholePIPSA -fn names -pa spheres -lg sims.log -pr 3 -sk 4
Would you like to use a personal library instead? (y/n) n
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in importfromR(self, packname)
54 try:
---> 55 rpack = rpackages.importr(packname)
56 except Exception:
/opt/conda/lib/python3.6/site-packages/rpy2/robjects/packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
452 _system_file(package = rname)):
--> 453 env = _get_namespace(rname)
454 version = _get_namespace_version(rname)[0]
RRuntimeError: Error in loadNamespace(name) : there is no package called ‘heatmap3’
During handling of the above exception, another exception occurred:
RRuntimeError Traceback (most recent call last)
<ipython-input-10-74b1f8b0c85f> in <module>()
7 pipsaCalc.runClusterPipsa(structures=structures,
8 points=[],
----> 9 cluster=cluster)
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in runClusterPipsa(self, structures, cluster, points)
635 if points is None:
636 points = self.__pipsaStructure.getCAAtoms()
--> 637 self.runPipsa(structures, points=points)
638 # residueNumber = 1
639 # for p in self.__pipsaStructure.getCAAtoms():
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in runPipsa(self, structures, points)
684 if self.__cluster is not None:
685 self.__cluster.clusterSingleRun(points=points, similarityType=SimilarityType.HODGKIN,
--> 686 pipsaLog="sims"+filesuffix+".log")
687 # cmdPipsa2R = ["perl", self.__pipsa_root + "/aux/pipsa2R.pl",
688 # "-s", "sims.log", "-t", "h", "-m", pictFile,
/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in clusterSingleRun(self, points, similarityType, pipsaLog, clusterType)
129 graphics = self.importfromR('graphics')
130 stats = self.importfromR('stats')
--> 131 heatmap3 = self.importfromR('heatmap3')
132 # fpc = rpackages.importr('fpc')
133 for i in range(0, self.__number_of_points):
/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in importfromR(self, packname)
57 utils = rpackages.importr("utils")
58 # , contriburl = R_REPO)
---> 59 utils.install_packages(packname, dependencies=True,repos="https://cloud.r-project.org")
60 rpack = rpackages.importr(packname)
61 return rpack
/opt/conda/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
176 v = kwargs.pop(k)
177 kwargs[r_k] = v
--> 178 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
179
180 pattern_link = re.compile(r'\\link\{(.+?)\}')
/opt/conda/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
104 for k, v in kwargs.items():
105 new_kwargs[k] = conversion.py2ri(v)
--> 106 res = super(Function, self).__call__(*new_args, **new_kwargs)
107 res = conversion.ri2ro(res)
108 return res
RRuntimeError: Error in (function (pkgs, lib, repos = getOption("repos"), contriburl = contrib.url(repos, :
unable to install packages
This installs an R package that was not installed before. At the top you get the information: "In case you get asked answer 'y', also in case you get asked to create the directory." Therefore one needs to answer 'y' at this point.
Ah, so this is not a question in the notebook, but a question from the install process itself?
In that case it is not user friendly and not intuitive!
Yes, the better option would be to install this package in R at the OS level... This involves the following packages: grDevices, heatmap3, stats, graphics, cluster. I guess that, except for heatmap3, they are part of the base R installation.
I will try to make a command line to install it directly in R, similar to the pip command lines.
I have added the commands:
! wget https://cran.r-project.org/src/contrib/heatmap3_1.1.6.tar.gz
! R CMD INSTALL heatmap3_1.1.6.tar.gz
at the top of the notebooks. This will install the required heatmap3 package. I hope these are all the packages that are not already in R by default. multipipsa still prints the message and checks whether the package is there (see below), but normally no further user intervention should be needed.
def importfromR(self, packname):
    try:
        rpack = rpackages.importr(packname)
    except Exception:
        utils = rpackages.importr("utils")
        utils.install_packages(packname, dependencies=True, repos="https://cloud.r-project.org")
        rpack = rpackages.importr(packname)
    return rpack
The notebook previously converted to Python 3 that was already online: https://collab.humanbrainproject.eu/#/collab/50197/nav/350840 (UUID: 7b372de8-cc8d-4311-8f4e-1dc8a3470c92) was slightly modified as well, to use the same package versions as the new notebooks. This avoids packages being removed and reinstalled all the time.
@StefanGIT I get an error in the notebook "Identify potential protein binding sites of a set of protein isoforms":
running:/opt/conda/lib/python3.6/site-packages/multipipsa/data/pipsa/bin/2potsim_skin_spheresNN -g1 AC1.grd -g2 AC5.grd -p1 AC1.pdb -p2 AC5.pdb -pa spheres -pr 3.000 -sk 4.000
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-4015d9cfdec6> in <module>()
----> 1 pipsaCalc.runBindingScorePipsa(ingrp, outgrp)
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in runBindingScorePipsa(self, ingroup, outgroup)
652 # allProteins = ingroup + outgroup
653 # self.runClusterPipsa(allProteins, cluster = cluster)
--> 654 self._runPipsaWithinGroup(ingroup)
655 self._runAcrossGroup(ingroup, outgroup)
656
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in _runPipsaWithinGroup(self, ingroup)
602 """
603 for x in it.combinations(ingroup, 2):
--> 604 r = self._runPipsaComparison(x[0], x[1])
605 self.__pipsaResult.pushIngroupResult(x[0], x[1], r)
606
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in _runPipsaComparison(self, pr1, pr2)
593 r[4, count] = float(oList[11]) # points in common
594 count = count+1
--> 595 return np.nan_to_num(r, copy=True)
596
597 def _runPipsaWithinGroup(self, ingroup):
TypeError: nan_to_num() got an unexpected keyword argument 'copy'
It might be related to numpy; the default version of numpy is 1.11.2. Can you please have a look?
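For reference, the copy keyword of numpy's nan_to_num controls whether the replacement happens on a new array or in place; on numpy versions that do not accept the keyword, the call raises the TypeError shown above. A minimal check on a recent numpy:

```python
import numpy as np

a = np.array([np.nan, np.inf, 1.0])

# copy=True (the call that failed above) returns a new array and
# leaves the input untouched; the original NaN stays in `a`
b = np.nan_to_num(a, copy=True)
print(b[0], np.isnan(a[0]))  # 0.0 True
```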
Thanks for the hint with numpy; yes, the copy keyword requires numpy version 1.16. I have added this as a requirement and use pip install to install the required version. Actually, when I ran the notebook I got: Requirement already satisfied: numpy in /opt/conda/lib/python3.6/site-packages (from multipipsa==4.0.10) (1.18.1). Is production run in a different home environment? I have added the requirement to all three notebooks: https://collab.humanbrainproject.eu/#/collab/50197/nav/530508 https://collab.humanbrainproject.eu/#/collab/50197/nav/530509 https://collab.humanbrainproject.eu/#/collab/50197/nav/530510
@StefanGIT I get an error message
* installing to library ‘/usr/local/lib/R/site-library’
Error: ERROR: no permission to install to directory ‘/usr/local/lib/R/site-library’
Sorry, but it really looks like the installations on the test and production systems are somewhat different. I have now installed the R library in the following way in all three notebooks:
! mkdir -p ~/.R/lib
! grep -qxF 'R_LIBS_USER=~/.R/lib' ~/.Renviron || echo 'R_LIBS_USER=~/.R/lib' >> ~/.Renviron
! wget -c https://cran.r-project.org/src/contrib/heatmap3_1.1.6.tar.gz
! R CMD INSTALL -l ~/.R/lib heatmap3_1.1.6.tar.gz
Hope this resolves the issue...
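The grep || echo idiom above appends the R_LIBS_USER line only if an identical line is not already present. The same idempotent append can be sketched in Python; the demo below writes to a temporary file rather than the real ~/.Renviron:

```python
import os
import tempfile

def append_line_once(path, line):
    """Append `line` to `path` only if no identical line exists yet."""
    existing = []
    if os.path.exists(path):
        with open(path) as fh:
            existing = [l.rstrip("\n") for l in fh]
    if line not in existing:
        with open(path, "a") as fh:
            fh.write(line + "\n")

# demo on a temporary file; in the notebook the target would be ~/.Renviron
demo = os.path.join(tempfile.mkdtemp(), ".Renviron")
append_line_once(demo, "R_LIBS_USER=~/.R/lib")
append_line_once(demo, "R_LIBS_USER=~/.R/lib")  # second call is a no-op
```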
@StefanGIT For the last command I get an error now
ERROR: dependency ‘fastcluster’ is not available for package ‘heatmap3’
* removing ‘/home/jovyan/.R/lib/heatmap3’
fastcluster will now also be installed by the R procedure at the top of all three notebooks. Since it looks like I run these tests in a different environment, where I do not get these errors, I apologize for the incremental approach.
@StefanGIT Yes, each user has their own environment. So when you have installed something in the past, it is installed, but just for that user (you).
I suggest writing to support@humanbrainproject.eu and asking for a "container reset". Then your user container will be reset to the standard setup. You need to mention your User ID (in the Collab: click on your profile image at the top right, go to your profile, and check your User ID). After that you will see all the issues there still might be.
Also there is an --upgrade missing for the numpy upgrade:
! pip install --upgrade "numpy>=1.16"
And for the notebooks as they are I get an error in the second cell:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-bfe6bccef2a2> in <module>()
7 import rpy2
8 import os, wget, datetime, magic, inspect
----> 9 from multipipsa.multipipsa import PipsaRun, ApbsRun
10 from multipipsa.clusterpipsa import ClusterPipsa
11 from multipipsa.pipsatypes import DistanceType
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in <module>()
21 import warnings
22 from pprint import pformat
---> 23 from multipipsa.clusterpipsa import ClusterPipsa
24 from multipipsa.pipsatypes import ScoreType
25 from multipipsa.pipsatypes import SimilarityType
/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in <module>()
11 import rpy2.robjects.packages as rpackages
12 #import rpy2.robjects.numpy2ri
---> 13 from rpy2.robjects import pandas2ri
14
15 from rpy2.robjects.packages import importr
/opt/conda/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in <module>()
10 INTSXP)
11
---> 12 from pandas.core.frame import DataFrame as PandasDataFrame
13 from pandas.core.series import Series as PandasSeries
14 from pandas.core.index import Index as PandasIndex
/opt/conda/lib/python3.6/site-packages/pandas/__init__.py in <module>()
54
55 # define the testing framework
---> 56 import pandas.util.testing
57 from pandas.util.nosetester import NoseTester
58 test = NoseTester().test
/opt/conda/lib/python3.6/site-packages/pandas/util/testing.py in <module>()
20
21 from numpy.random import randn, rand
---> 22 from numpy.testing.decorators import slow # noqa
23 import numpy as np
24
ModuleNotFoundError: No module named 'numpy.testing.decorators'
Maybe numpy 1.18.1 is too new? numpy 1.16.1 also does not work; I even get a different error.
Please ask for your container to be reset, then fix the notebook, test again, and then let me know.
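One way to fail fast on such version mismatches is to check installed versions at the top of the notebook before anything else runs. A minimal sketch; the bound used here is illustrative, not a verified requirement of these notebooks:

```python
def version_tuple(version):
    """Turn '1.16.4' into (1, 16, 4) for comparison (numeric parts only)."""
    parts = []
    for p in version.split("."):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

# illustrative bound: fail early instead of deep inside a traceback
installed = "1.18.1"  # in a notebook this would be numpy.__version__
assert version_tuple(installed) >= (1, 16), "numpy too old for these notebooks"
print(version_tuple(installed))  # (1, 18, 1)
```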
@alex4200 I have modified the multipipsa notebooks to use a specific numpy and pandas version. I have checked them on a clean container. Also I have added a note at the top, that running all cells at once might not show the graphics from nglview.
Could you check them again in production environment?
@StefanGIT Retest of the notebooks worked fine, I greenlit the merge request to the next stage (will be next week).
This should be merged. @alex4200 Please confirm that.
This UC will be maintained on SGA3
Already in Prod.
New Use Case
Use case 1: Calculating the electrostatic potential of a protein from its atomic structure
Use the multipipsa package to assign atomic charges and radii to a protein structure then solve the Poisson-Boltzmann equation to calculate its electrostatic potential in aqueous solution.
Mandatory features
Important features
To do
Use case 2: PIPSA analysis to compare the electrostatic potentials surrounding a set of protein isoforms
Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase and cluster the isoforms by electrostatic similarity.
Mandatory features
Nice to have features
To do
Use case 3: PIPSA analysis to compare a specific region of the electrostatic potentials surrounding a set of protein isoforms
Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase, at a specific site on their surfaces, and cluster the isoforms by electrostatic similarity in this region.
Features
To do
Use case 4: Identification of potential protein binding sites by comparing the electrostatic potentials of a set of protein isoforms
Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase at many sites on their surfaces. The electrostatic similarities at these sites are compared to known isoform-specific regulation patterns for the inhibitory protein, to predict the likely binding sites of regulatory proteins.
Mandatory features
Important features
To do
Use case 5: Set up a Brownian dynamics simulation for calculating protein-protein association rate constants
Starting from a structure of the complex formed between adenylyl cyclase 5 and Gαolf obtained from a molecular dynamics simulation, set up all the files required to perform a Brownian dynamics simulation with SDA.
Mandatory features
To do
Use case 6: Analyse the results of a Brownian dynamics simulation for calculating protein-protein association rate constants
Analyse the results of an SDA Brownian dynamics simulation to calculate the rate constant of Gαolf associating to adenylyl cyclase 5, and estimate the error in this prediction via bootstrapping.
Mandatory features
To do
What does your use case do? What activities can users perform while using it? What makes it different from similar use cases? List the main functions that you will build into your product here. Also specify the priority: 'mandatory', 'important', 'nice to have'.
Acceptance Criteria
Define here the acceptance tests to evaluate the use case’s compliance with the requirements as defined above. Also possible end users for testing can be included here.