cnr-ibf-pa / hbp-bsp-issues

Ticketing system for developers/testers and power users of the Brain Simulation Platform of the Human Brain Project

New Protein structure use cases #407

Closed alex4200 closed 4 years ago

alex4200 commented 5 years ago

New Use Case

Summary: Use cases to simulate, analyze and visualize protein structures
Use case group: Molecular Level
Expert: Neil Bruce
Scientific User:
Deadline: first use case M12
Target audience: New users of molecular modelling/simulation tools
Target interface: Notebooks (tutorials and real research)
HPC requirements: none
Dependencies: nglview, R, apbs, pdb2pqr, AmberTools, SDA
Nominal runtime: a couple of minutes

Use case 1: Calculating the electrostatic potential of a protein from its atomic structure

Use the multipipsa package to assign atomic charges and radii to a protein structure then solve the Poisson-Boltzmann equation to calculate its electrostatic potential in aqueous solution.
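For intuition about what this use case computes, the sketch below evaluates a Debye-Hückel screened Coulomb potential from atomic point charges. This is a swapped-in simplification, not the Poisson-Boltzmann solution that APBS computes: it assumes the whole system is uniform water (relative permittivity 78.5), ignores the low-dielectric protein interior and ion exclusion, and uses the common approximation Debye length ≈ 3.04/√I Å at about 298 K. The function names and parameters are illustrative only.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); potentials come out in kcal/(mol*e).
COULOMB_KCAL = 332.0636


def debye_kappa(ionic_strength_molar):
    """Inverse Debye length in 1/Angstrom for water at about 298 K.

    Assumption: Debye length ~= 3.04 / sqrt(I) Angstrom, I in mol/L.
    """
    if ionic_strength_molar <= 0:
        return 0.0
    return math.sqrt(ionic_strength_molar) / 3.04


def screened_coulomb_potential(point, atoms, eps_r=78.5, ionic_strength=0.15):
    """Debye-Hueckel potential at `point` from point charges in uniform solvent.

    Simplified stand-in for a Poisson-Boltzmann solve: no protein dielectric
    boundary, no ion-exclusion layer.
    atoms: iterable of (x, y, z, charge), coordinates in Angstrom, charge in e
    (for example as listed in a PQR file).
    """
    kappa = debye_kappa(ionic_strength)
    phi = 0.0
    for x, y, z, q in atoms:
        r = math.dist(point, (x, y, z))
        if r == 0.0:
            raise ValueError("evaluation point coincides with an atom")
        phi += COULOMB_KCAL * q * math.exp(-kappa * r) / (eps_r * r)
    return phi
```

Adding salt lowers the magnitude of the potential at a given distance, which is the screening effect the full Poisson-Boltzmann treatment also captures.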

Mandatory features

To do

Use case 2: PIPSA analysis to compare the electrostatic potentials surrounding a set of protein isoforms

Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase and cluster the isoforms by electrostatic similarity.
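PIPSA compares potentials point by point with the Hodgkin similarity index (the multipipsa traceback further down in this thread shows SimilarityType.HODGKIN being used). The numpy sketch below computes the index and the distance D = sqrt(2 - 2·SI) commonly used in PIPSA to build a distance matrix for clustering. It assumes every potential has already been sampled at the same set of points (PIPSA uses a skin around the proteins); the function names are hypothetical and not multipipsa's API.

```python
import numpy as np


def hodgkin_similarity(phi_a, phi_b):
    """Hodgkin similarity index of two potentials sampled at the same points.

    SI = 2 * sum(phi_a * phi_b) / (sum(phi_a**2) + sum(phi_b**2)), in [-1, 1].
    """
    phi_a = np.asarray(phi_a, dtype=float)
    phi_b = np.asarray(phi_b, dtype=float)
    denom = np.sum(phi_a ** 2) + np.sum(phi_b ** 2)
    if denom == 0.0:
        raise ValueError("both potentials are zero at all sampled points")
    return 2.0 * np.sum(phi_a * phi_b) / denom


def electrostatic_distance_matrix(potentials):
    """Symmetric matrix of D = sqrt(2 - 2*SI) for a list of sampled potentials."""
    n = len(potentials)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            si = hodgkin_similarity(potentials[i], potentials[j])
            # Clip tiny negative values caused by floating-point round-off.
            dist[i, j] = dist[j, i] = np.sqrt(max(0.0, 2.0 - 2.0 * si))
    return dist
```

Identical potentials give D = 0 and exactly opposite ones give D = 2; the matrix can then be handed to a hierarchical clustering routine (multipipsa does its clustering in R via rpy2).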

Mandatory features

To do

Use case 3: PIPSA analysis to compare a specific region of the electrostatic potentials surrounding a set of protein isoforms

Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase, at a specific site on their surfaces, and cluster the isoforms by electrostatic similarity in this region.
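Restricting the comparison to one region can be sketched by masking the sampled points before computing the Hodgkin index. The spherical region (a centre and a radius in Å) is an assumption made for illustration; how multipipsa actually selects the region is not shown in this thread, and the helper names are hypothetical.

```python
import numpy as np


def region_mask(points, center, radius):
    """Boolean mask selecting sample points within `radius` of `center` (Angstrom)."""
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points - np.asarray(center, dtype=float), axis=1)
    return d <= radius


def regional_hodgkin_similarity(points, phi_a, phi_b, center, radius):
    """Hodgkin similarity computed only over the points inside the spherical region."""
    mask = region_mask(points, center, radius)
    if not mask.any():
        raise ValueError("no sample points inside the selected region")
    a = np.asarray(phi_a, dtype=float)[mask]
    b = np.asarray(phi_b, dtype=float)[mask]
    denom = np.sum(a ** 2) + np.sum(b ** 2)
    if denom == 0.0:
        raise ValueError("both potentials are zero inside the region")
    return 2.0 * np.sum(a * b) / denom
```

Two isoforms can thus look identical at one site while differing strongly elsewhere, which is what makes a site-specific clustering informative.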

Mandatory features

To do

Use case 4: Identification of potential protein binding sites by comparing the electrostatic potentials of a set of protein isoforms

Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase at many sites on their surfaces. The electrostatic similarities at these sites are compared to known isoform-specific regulation patterns for the inhibitory protein , to predict the likely binding sites of regulatory proteins.
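The prediction compares similarities within a group of isoforms sharing a regulation pattern against similarities to the remaining isoforms (the traceback further down shows multipipsa's runBindingScorePipsa(ingroup, outgroup) doing within-group and across-group runs). The scoring rule below, mean within-group SI minus mean across-group SI at one site, is a hypothetical illustration and not multipipsa's formula; the isoform names in the example are placeholders.

```python
from itertools import combinations, product
from statistics import mean


def binding_site_score(si, ingroup, outgroup):
    """Hypothetical per-site score: mean within-ingroup similarity minus
    mean ingroup-vs-outgroup similarity.

    si: dict mapping frozenset({name_a, name_b}) -> Hodgkin SI at this site.
    ingroup needs at least two members. A high score means the ingroup
    isoforms look alike here but differ from the outgroup, as one might
    expect at the binding site of a regulator that only affects the ingroup.
    """
    within = [si[frozenset(pair)] for pair in combinations(ingroup, 2)]
    across = [si[frozenset(pair)] for pair in product(ingroup, outgroup)]
    return mean(within) - mean(across)


# Placeholder similarities at one site for three isoforms.
site_si = {
    frozenset({"AC1", "AC2"}): 0.9,
    frozenset({"AC1", "AC5"}): 0.1,
    frozenset({"AC2", "AC5"}): 0.3,
}
score = binding_site_score(site_si, ["AC1", "AC2"], ["AC5"])
```

Repeating this for every site and ranking the scores gives a list of candidate regions to compare against the known regulation data.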

Mandatory features

To do

Use case 5: Set up a Brownian dynamics simulation for calculating protein-protein association rate constants

Starting from a structure of the complex formed between adenylyl cyclase 5 and Gαolf obtained from a molecular dynamics simulation, set up all the files required to perform a Brownian dynamics simulation with SDA.
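A Brownian dynamics code propagates the proteins with random, diffusive steps. As a generic illustration only, the sketch below implements one translational Ermak-McCammon step without hydrodynamic interactions; SDA's own integrator also handles rotation and reads its own input files, none of which is reproduced here. The units (Å, ps, kcal/mol) and the function name are assumptions.

```python
import math
import random


def ermak_mccammon_step(pos, force, diff_coeff, dt, kT, rng=random):
    """One translational Brownian dynamics step (Ermak-McCammon, no HI).

    pos: 3-vector in Angstrom; force: kcal/mol/Angstrom; diff_coeff: Angstrom^2/ps;
    dt: ps; kT: kcal/mol.
    Displacement per axis = D * F * dt / kT (drift) plus a Gaussian random
    term with variance 2 * D * dt.
    """
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    return tuple(
        x + diff_coeff * f * dt / kT + rng.gauss(0.0, sigma)
        for x, f in zip(pos, force)
    )
```

The time step has to be small enough that the force barely changes over one displacement, which is why BD codes typically use variable time steps near the protein surfaces.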

Mandatory features

To do

Use case 6: Analyse the results of a Brownian dynamics simulation for calculating protein-protein association rate constants

Analyse the results of an SDA Brownian dynamics simulation to calculate the rate constant of Gαolf associating to adenylyl cyclase 5, and estimate the error in this prediction via bootstrapping.
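Association rate constants from such simulations are commonly obtained with the Northrup-Allison-McCammon formula: trajectories start at separation b, and β is the fraction that react before escaping beyond q > b. The sketch below implements that formula with Smoluchowski rates k_D(r) = 4πDr and estimates the uncertainty by bootstrap resampling of the per-trajectory outcomes. Whether SDA's analysis uses exactly this procedure is not stated in this issue; D is assumed to be the relative translational diffusion coefficient, the result is in whatever units D and the distances use, and the helper names are hypothetical.

```python
import math
import random


def smoluchowski_rate(diff_coeff, radius):
    """Diffusion-limited rate k_D(r) = 4 * pi * D * r."""
    return 4.0 * math.pi * diff_coeff * radius


def nam_rate(beta, diff_coeff, b, q):
    """Northrup-Allison-McCammon rate: k = k_D(b)*beta / (1 - (1-beta)*k_D(b)/k_D(q))."""
    kb = smoluchowski_rate(diff_coeff, b)
    kq = smoluchowski_rate(diff_coeff, q)
    return kb * beta / (1.0 - (1.0 - beta) * kb / kq)


def bootstrap_rate(outcomes, diff_coeff, b, q, n_boot=1000, seed=0):
    """Mean and standard deviation of the rate over bootstrap resamples.

    outcomes: list of 0/1 flags, one per trajectory (1 = reacted).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_boot):
        # Resample trajectories with replacement and recompute beta.
        sample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(nam_rate(sum(sample) / n, diff_coeff, b, q))
    mean_k = sum(rates) / n_boot
    var = sum((k - mean_k) ** 2 for k in rates) / (n_boot - 1)
    return mean_k, math.sqrt(var)
```

With β = 1 the formula reduces to k_D(b), and the bootstrap spread shrinks roughly as 1/√N with the number of trajectories N.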

Mandatory features

What does your use case do? What activities can users perform while using it? What makes it different from similar use cases? List the main functions that you will build into your product here, and specify the priority of each: ‘mandatory’, ‘important’ or ‘nice to have’.

Acceptance Criteria

Define here the acceptance tests used to evaluate the use case’s compliance with the requirements defined above. Possible end users for testing can also be listed here.

alex4200 commented 5 years ago

@njbruce Please check https://github.com/antonelepfl/usecases/ for inserting new use cases in the Brain Simulation Platform

alex4200 commented 5 years ago

@njbruce ready to implement a notebook in the week of 15 April 2019. The Collaboratory Jupyter environment seems to have been set up properly.

lbologna commented 5 years ago

Hello @njbruce, is there any news on use cases 2 to 5? Can we help finalize them in some way, or, if they are to be finalized later, should we close the ticket for the moment?

DKokh commented 5 years ago

@lbologna: Neil has left the group, and it is not clear when, or by whom, these use cases will be worked on. I would close the ticket until we settle this.

alex4200 commented 5 years ago

@DKokh I will leave the ticket open but move it to another milestone; otherwise it will be completely forgotten.

alex4200 commented 4 years ago

@DKokh Do you have any update regarding this usecase?

StefanGIT commented 4 years ago

@alex4200 While working on the use case "Compare a specific region of the electrostatic potentials surrounding a set of protein isoforms with multipipsa", I could not install gplots, an R library required for some plots in this analysis. I tried the following:

    !R --vanilla -e 'install.packages("gplots",repos="https://stat.ethz.ch/CRAN/", lib="~/.R/lib", dependencies = TRUE)'

as well as several variants of it, but I always get:

    ERROR: dependency ‘caTools’ is not available for package ‘gplots’

It looks like caTools is not available for R 3.3.3, the version installed on the system. Older versions of gplots did not work either. Is there a way to install the gplots library at the system level with apt-get (apt-get install r-cran-gplots)?

Overall, the following R libraries are required: gplots, cluster, graphics, stats, heatmap3, fpc.

alex4200 commented 4 years ago

@StefanGIT I think it is unlikely that the support team will update the base system of the Collaboratory, as they have put into production a newer collaboration tool (referred to as "Collaboratory 2.0"), which also includes newer JupyterLab notebooks. My suggestion is to use this new infrastructure; the libraries you need may already be available there. The main link to that platform is (you might need to request an account):

https://lab.humanbrainproject.eu/

If you need new or other packages on that infrastructure, I am pretty sure they will install them for you.

StefanGIT commented 4 years ago

@alex4200 I have modified multipipsa (now version 4.0.5) to use a different plotting library for this type of analysis. The fixed notebook is now in the private MolecularUseCases Collab: https://collab.humanbrainproject.eu/#/collab/34594/nav/351519 How should I get it into the public one: clone or copy? Please note when you test it that the nglview visuals only show up if the cell is visible in the browser while it executes. If one simply hits 'Run All', they will not show up by default.

This notebook is one of the 'coming soon' in the Online UseCase section: https://collab.humanbrainproject.eu/#/collab/1655/nav/362934

StefanGIT commented 4 years ago

The same applies to this notebook: https://collab.humanbrainproject.eu/#/collab/34594/nav/386452

StefanGIT commented 4 years ago

The following notebook: https://collab.humanbrainproject.eu/#/collab/34594/nav/387399 had a problem with the numpy version used together with older versions of rpy2 (the rpy2 version was pinned to a specific release for Python 2 compatibility). In Python 3 this requirement no longer applies. Therefore I had to prepare another multipipsa version, 4.0.6, that works with a later numpy version. So that the notebooks above use the same multipipsa version, I modified them again and re-tested them.

All 3 PIPSA notebooks in the Molecular Level section (https://collab.humanbrainproject.eu/#/collab/1655/nav/362934) should now be available via the links given above and work in Python 3.

alex4200 commented 4 years ago

@StefanGIT First of all, for me to see and test the notebooks you have developed, the Collab they are in must be made public.

I can then see the notebooks and copy them into one of my Collabs to test them.

When testing is finished, your notebook(s) can be added to the DEV version of the Brain Simulation Platform. To do so, they only need to be added to a JSON file that references the location of the notebook in your original Collab (no copy/paste required). When a user selects that notebook, a copy is created automatically; the original is not changed.

Once the notebook is ready for production, I will push it to a GitHub repository; again, no change is required on your side.

The whole procedure should be described in this Google Doc under "Notebook Development". Please have a look and let me know whether you can access the document and whether the description is good enough.

StefanGIT commented 4 years ago

Yes, I can access the document and I will try to follow the instructions there. Thanks a lot.

StefanGIT commented 4 years ago

I have now put the 3 multipipsa use cases into the dev area (MolecularUseCases_Pub). I attach images to show which workflow corresponds to which image in the production area: https://collab.humanbrainproject.eu/#/collab/50197/nav/530508

StefanGIT commented 4 years ago

https://collab.humanbrainproject.eu/#/collab/50197/nav/530509


StefanGIT commented 4 years ago

https://collab.humanbrainproject.eu/#/collab/50197/nav/530510


alex4200 commented 4 years ago

@StefanGIT Please add corresponding entries to the usecase.json file HERE, see also description HERE.

StefanGIT commented 4 years ago

I am lost: the entry is already there, but I am not sure what I should use as the URL to the JSON file. I thought the JSON file would be added to a Git repository after some quality check. For example, the notebook https://collab.humanbrainproject.eu/#/collab/50197/nav/530508 is described in the entry below; its files section is empty, but I have no idea what I should enter there.

  {
    "title": "Compare the electrostatic potentials surrounding a set of protein isoforms with multipipsa",
    "description": "Use the multipipsa package to compare the electrostatic potentials of nine isoforms of the enzyme adenylyl cyclase and cluster the isoforms by electrostatic similarity.",
    "experience": ["all"],
    "maturity": ["beta"],
    "disabled": true,
    "picture": {
      "src": "https://raw.githubusercontent.com/antonelepfl/usecases/dev/src/assets/images/pipsa.png"
    },
    "next": "ta_form",
    "dataprotected": true,
    "files": []
  },
alex4200 commented 4 years ago

It is explained here: https://github.com/antonelepfl/usecases/blob/dev/documentation/add_new_usecase.md#add-new-usecase-in-an-existing-domain

If not, please let us know.

{
   "entryname": name to be added in the navigation item when the usecase is created/added ,
   "appid": (number) possible values [175 (ipython notebooks), 6 (external html)],
   "contenttype": (string) possible values ["x-ipynb+json", "text/html"],
   "extension": (string) extension with "." like ".ipynb",
   "file": UUID of the file in collab storage (more information see below) OR raw file URL,
   "file_prod": (optional*) Github file url using API,
   "initial": (boolean) if true this nav item will be shown when redirect to collab,
   "justcopy": (boolean) if true, it will avoid creating a nav item,
}

and

To get the UUID in the Collab:
1. Go to the Collab storage where the file is located.
2. Click on the file. The URL changes from something like
   https://collab.humanbrainproject.eu/#/collab/<collab_number>/nav/<nav_number>
   to
   https://collab.humanbrainproject.eu/#/collab/<collab_number>/nav/<nav_number>?state=uuid%3D915417d1-359f-4eab-bcb1-a0881dea8d7d
3. Take the part of the URL after "state=uuid%3D", in this example 915417d1-359f-4eab-bcb1-a0881dea8d7d.
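The manual UUID lookup can also be sketched in Python. The helper name is hypothetical; it relies on the query string sitting inside the '#' fragment, as in the example URLs, and on urllib.parse.parse_qs decoding '%3D' to '='.

```python
from urllib.parse import urlparse, parse_qs


def collab_file_uuid(url):
    """Extract the file UUID from a Collab storage URL.

    The query string lives inside the '#' fragment, e.g.
    .../#/collab/<collab_number>/nav/<nav_number>?state=uuid%3D<UUID>
    """
    fragment = urlparse(url).fragment
    _, _, query = fragment.partition("?")
    # parse_qs percent-decodes values, so 'uuid%3D...' becomes 'uuid=...'.
    state = parse_qs(query).get("state", [""])[0]
    key, sep, value = state.partition("=")
    if key != "uuid" or not sep or not value:
        raise ValueError("no 'state=uuid%3D...' part in URL: " + url)
    return value
```

Applied to the example URL above, it returns 915417d1-359f-4eab-bcb1-a0881dea8d7d.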

Please point out anything that is missing or unclear.

StefanGIT commented 4 years ago

I finally found how to get the UUID. As I understand it, I can only add the UUID and not the file URL, since the notebook is not yet in the repository. Maybe this could be made more explicit.
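Building a files entry that references the notebook only by its UUID, with file_prod omitted until the notebook is in the GitHub repository, could look like this. The mapping of extensions to appid and contenttype follows the field descriptions quoted earlier in this thread; treating '.html' as the extension for external HTML pages, the helper names, and the entry name in the example are assumptions.

```python
import json

# appid/contenttype values from the field descriptions in this thread;
# '.html' as the extension for external HTML is an assumption.
APP_IDS = {".ipynb": 175, ".html": 6}
CONTENT_TYPES = {".ipynb": "x-ipynb+json", ".html": "text/html"}


def make_file_entry(entryname, uuid, extension=".ipynb", initial=True,
                    justcopy=False, file_prod=None):
    """Build one item of the "files" list of a use-case entry.

    `uuid` is the Collab storage UUID of the file; `file_prod` (the GitHub
    URL) is only added once the notebook is in the repository.
    """
    entry = {
        "entryname": entryname,
        "appid": APP_IDS[extension],
        "contenttype": CONTENT_TYPES[extension],
        "extension": extension,
        "file": uuid,
        "initial": initial,
        "justcopy": justcopy,
    }
    if file_prod is not None:
        entry["file_prod"] = file_prod
    return entry


def files_section_json(*entries):
    """Serialise the entries as the JSON value of the "files" key."""
    return json.dumps(list(entries), indent=2)
```

For example, make_file_entry("Compare electrostatic potentials", "915417d1-359f-4eab-bcb1-a0881dea8d7d") yields a single notebook item that is shown when the user is redirected to the Collab.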

alex4200 commented 4 years ago

@StefanGIT I have an import error in the second cell (use case "Compare the electrostatic potentials surrounding a set of protein isoforms with multipipsa"). The error reads

Importing python modules failed.
    There is a problem with the python environment!
cannot import name '__html_manager_version__' 

This does not tell me where the error happened. I strongly suggest removing this try/except clause, because:

  1. It is clearer to the user which import failed.
  2. The notebook execution will stop. Otherwise, if the user chose to run all cells, an error will occur later (because some package was not imported), and debugging will be much harder.

BTW: the error comes from import nglview.
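If a notebook keeps a single guarded import cell, it can still satisfy both points by re-raising with the failing module's name instead of a generic message. The helper below is a hypothetical sketch built on importlib.import_module; plain import statements achieve the same effect.

```python
import importlib


def import_required(names):
    """Import each module by name and stop at the first failure, naming it."""
    modules = {}
    for name in names:
        try:
            modules[name] = importlib.import_module(name)
        except ImportError as exc:
            # Re-raise so execution stops and the culprit is obvious.
            raise ImportError(f"could not import required module {name!r}: {exc}") from exc
    return modules
```

With names such as ["nglview", "rpy2", "multipipsa.multipipsa"], an error like "cannot import name '__html_manager_version__'" would then be reported together with the module that triggered it.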

StefanGIT commented 4 years ago

@alex4200 I have removed the exception handling from all three notebooks. But it is not clear why nglview cannot be loaded; it works when I run it. Could it be a dependency problem with other packages or versions in different environments?

alex4200 commented 4 years ago

I am trying again; maybe my setup was mixed up.

alex4200 commented 4 years ago

This seems to work, but later I get an error after being asked a question about installing a package (I answered 'n'; it is not clear to me what is going on in that cell). Error:

Installing required R-libraries into a user directory
In case you get asked answer 'y', also in case you get asked 
to create the directory.
To avoid this, create a file called '.Renviron' in your home
directory and add 'R_LIBS_USER=~/.R/lib' in this file
/opt/conda/lib/python3.6/site-packages/multipipsa/data/pipsa/bin/npotsim -pg /opt/conda/lib/python3.6/site-packages/multipipsa/data/pipsa/bin/2potsim_skin_spheresU -fp /home/jovyan/work/wholePIPSA -fn names -pa spheres -lg sims.log -pr 3 -sk 4
Would you like to use a personal library instead?  (y/n) n

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in importfromR(self, packname)
     54         try:
---> 55             rpack = rpackages.importr(packname)
     56         except Exception:

/opt/conda/lib/python3.6/site-packages/rpy2/robjects/packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
    452                               _system_file(package = rname)):
--> 453         env = _get_namespace(rname)
    454         version = _get_namespace_version(rname)[0]

RRuntimeError: Error in loadNamespace(name) : there is no package called ‘heatmap3’

During handling of the above exception, another exception occurred:

RRuntimeError                             Traceback (most recent call last)
<ipython-input-10-74b1f8b0c85f> in <module>()
      7 pipsaCalc.runClusterPipsa(structures=structures,
      8                           points=[],
----> 9                           cluster=cluster)

/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in runClusterPipsa(self, structures, cluster, points)
    635         if points is None:
    636             points = self.__pipsaStructure.getCAAtoms()
--> 637         self.runPipsa(structures, points=points)
    638 #        residueNumber = 1
    639 #        for p in self.__pipsaStructure.getCAAtoms():

/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in runPipsa(self, structures, points)
    684         if self.__cluster is not None:
    685             self.__cluster.clusterSingleRun(points=points, similarityType=SimilarityType.HODGKIN,
--> 686                                             pipsaLog="sims"+filesuffix+".log")
    687 #        cmdPipsa2R = ["perl", self.__pipsa_root + "/aux/pipsa2R.pl",
    688 #                      "-s", "sims.log", "-t", "h", "-m", pictFile,

/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in clusterSingleRun(self, points, similarityType, pipsaLog, clusterType)
    129         graphics = self.importfromR('graphics')
    130         stats = self.importfromR('stats')
--> 131         heatmap3 = self.importfromR('heatmap3')
    132 #        fpc = rpackages.importr('fpc')
    133         for i in range(0, self.__number_of_points):

/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in importfromR(self, packname)
     57             utils = rpackages.importr("utils")
     58             # , contriburl = R_REPO)
---> 59             utils.install_packages(packname, dependencies=True,repos="https://cloud.r-project.org")
     60             rpack = rpackages.importr(packname)
     61         return rpack

/opt/conda/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

/opt/conda/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in (function (pkgs, lib, repos = getOption("repos"), contriburl = contrib.url(repos,  : 
  unable to install packages
StefanGIT commented 4 years ago

This cell installs an R package that was not installed before. At the top it prints: "In case you get asked answer 'y', also in case you get asked to create the directory." So one needs to answer 'y' at this point.

alex4200 commented 4 years ago

Ah, so this is not a question from the notebook but from the install process itself?

In that case it is not user friendly and not intuitive!

StefanGIT commented 4 years ago

Yes, the better option would be to have this package already installed in R at the OS level. The packages involved are grDevices, heatmap3, stats, graphics and cluster; I guess that, except for heatmap3, they are all part of the base R installation.

StefanGIT commented 4 years ago

I will try to write a command line that installs it directly in R, similar to the pip command lines.

StefanGIT commented 4 years ago

I have added the commands

    ! wget https://cran.r-project.org/src/contrib/heatmap3_1.1.6.tar.gz
    ! R CMD INSTALL heatmap3_1.1.6.tar.gz

at the top of the notebooks. They install the required heatmap3 package. I hope these are all the packages that are not in R by default. multipipsa still prints the message and checks whether the package is present (see below), but normally no further user intervention should be needed.

    def importfromR(self, packname):
        try:
            rpack = rpackages.importr(packname)
        except Exception:
            utils = rpackages.importr("utils")
            # , contriburl = R_REPO)
            utils.install_packages(packname, dependencies=True, repos="https://cloud.r-project.org")
            rpack = rpackages.importr(packname)
        return rpack

StefanGIT commented 4 years ago

The notebook previously converted to Python 3 that was already online, https://collab.humanbrainproject.eu/#/collab/50197/nav/350840 (UUID: 7b372de8-cc8d-4311-8f4e-1dc8a3470c92), was also slightly modified to use the same package versions as the new notebooks. This avoids packages being removed and reinstalled every time.

alex4200 commented 4 years ago

@StefanGIT I get an error in the notebook "Identify potential protein binding sites of a set of protein isoforms":

running:/opt/conda/lib/python3.6/site-packages/multipipsa/data/pipsa/bin/2potsim_skin_spheresNN -g1 AC1.grd -g2 AC5.grd -p1 AC1.pdb -p2 AC5.pdb -pa spheres -pr 3.000 -sk 4.000
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-4015d9cfdec6> in <module>()
----> 1 pipsaCalc.runBindingScorePipsa(ingrp, outgrp)
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in runBindingScorePipsa(self, ingroup, outgroup)
    652 #            allProteins = ingroup + outgroup
    653 #            self.runClusterPipsa(allProteins, cluster = cluster)
--> 654         self._runPipsaWithinGroup(ingroup)
    655         self._runAcrossGroup(ingroup, outgroup)
    656 
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in _runPipsaWithinGroup(self, ingroup)
    602         """
    603         for x in it.combinations(ingroup, 2):
--> 604             r = self._runPipsaComparison(x[0], x[1])
    605             self.__pipsaResult.pushIngroupResult(x[0], x[1], r)
    606 
/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in _runPipsaComparison(self, pr1, pr2)
    593                 r[4, count] = float(oList[11])  # points in common
    594             count = count+1
--> 595         return np.nan_to_num(r, copy=True)
    596 
    597     def _runPipsaWithinGroup(self, ingroup):
TypeError: nan_to_num() got an unexpected keyword argument 'copy'

It might be related to numpy; the default version of numpy is 1.11.2. Can you please have a look?

StefanGIT commented 4 years ago

Thanks for the hint about numpy. Yes, the copy argument of nan_to_num needs a newer numpy than the default 1.11.2, so I have added numpy 1.16 as a requirement and use pip install to install the required version. Actually, when I ran the notebook I got

    Requirement already satisfied: numpy in /opt/conda/lib/python3.6/site-packages (from multipipsa==4.0.10) (1.18.1)

Is the production notebook run in a different home environment? I have added the requirement to all three notebooks:

https://collab.humanbrainproject.eu/#/collab/50197/nav/530508
https://collab.humanbrainproject.eu/#/collab/50197/nav/530509
https://collab.humanbrainproject.eu/#/collab/50197/nav/530510
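Because the test and production containers ended up with different numpy versions (1.18.1 versus the default 1.11.2), a notebook could check versions explicitly at the top. The pure-Python sketch below uses hypothetical helper names and only understands simple "X.Y.Z" strings (pre-release suffixes are dropped); the optional upper bound covers the case where a version is too new. For real packaging work, the version parser of the packaging library is more robust.

```python
import re


def version_tuple(version):
    """Turn '1.18.1' into (1, 18, 1), '1.16.0rc1' and '1.16' into (1, 16, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
    # Pad so that '1.16' and '1.16.0' compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_version(name, installed, minimum, below=None):
    """Raise if `installed` is older than `minimum` or not older than `below`."""
    if version_tuple(installed) < version_tuple(minimum):
        raise RuntimeError(f"{name} {installed} is too old; need >= {minimum}")
    if below is not None and version_tuple(installed) >= version_tuple(below):
        raise RuntimeError(f"{name} {installed} is too new; need < {below}")
```

In a notebook this would be called as check_version("numpy", numpy.__version__, "1.16") right after import numpy, so a mismatched container fails at the first cell rather than deep inside multipipsa.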

alex4200 commented 4 years ago

@StefanGIT I get an error message

* installing to library ‘/usr/local/lib/R/site-library’
Error: ERROR: no permission to install to directory ‘/usr/local/lib/R/site-library’
StefanGIT commented 4 years ago

Sorry, it really looks like the installations on the test and production systems are somewhat different. I now install the R library in all three notebooks as follows:

    ! mkdir -p ~/.R/lib
    ! grep -qxF 'R_LIBS_USER=~/.R/lib' ~/.Renviron || echo 'R_LIBS_USER=~/.R/lib' >> ~/.Renviron
    ! wget -c https://cran.r-project.org/src/contrib/heatmap3_1.1.6.tar.gz
    ! R CMD INSTALL -l ~/.R/lib heatmap3_1.1.6.tar.gz

Hope this resolves the issue...
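The .Renviron step is meant to be idempotent: add the line only if that exact line is not already present, so the check and the appended text must be the same string. A Python equivalent, sketched with a hypothetical helper name, makes that comparison explicit.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Creates the file if needed. Returns True if the file was changed.
    """
    path = Path(path).expanduser()
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Make sure the new line does not get glued onto an unterminated last line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(prefix + line + "\n")
    return True
```

Usage in a notebook cell would be Path("~/.R/lib").expanduser().mkdir(parents=True, exist_ok=True) followed by ensure_line("~/.Renviron", "R_LIBS_USER=~/.R/lib"); running the cell twice leaves a single entry.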

alex4200 commented 4 years ago

@StefanGIT For the last command I get an error now

ERROR: dependency ‘fastcluster’ is not available for package ‘heatmap3’
* removing ‘/home/jovyan/.R/lib/heatmap3’
StefanGIT commented 4 years ago

fastcluster will now also be installed by the R commands at the top of all three notebooks. Since I apparently ran these tests in a different environment, where I did not get these errors, I apologize for the incremental approach.
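R CMD INSTALL on a downloaded tarball does not fetch dependencies, which is why fastcluster had to be added by hand. One alternative, sketched below under the assumption that Rscript is on the PATH and CRAN is reachable from the container, is to let install.packages() resolve dependencies into the user library; passing lib explicitly should also mean R does not need to ask about a personal library. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def r_install_argv(packages, lib="~/.R/lib", repos="https://cloud.r-project.org"):
    """Build an Rscript call that installs CRAN packages into a user library."""
    pkgs = ", ".join(f'"{p}"' for p in packages)
    expr = (
        f'install.packages(c({pkgs}), lib = path.expand("{lib}"), '
        f'repos = "{repos}")'
    )
    return ["Rscript", "-e", expr]


def run_r_install(packages, lib="~/.R/lib", repos="https://cloud.r-project.org"):
    """Run the install; requires Rscript and network access to `repos`."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    # install.packages() expects the target library directory to exist.
    Path(lib).expanduser().mkdir(parents=True, exist_ok=True)
    subprocess.check_call(r_install_argv(packages, lib=lib, repos=repos))
```

A call such as run_r_install(["heatmap3"]) would then also pull in fastcluster; R still needs R_LIBS_USER pointing at the same directory to load the packages afterwards.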

alex4200 commented 4 years ago

@StefanGIT Yes, each user has their own environment. So anything you installed in the past is still installed, but only for your user.

I suggest writing to support@humanbrainproject.eu and asking for a "container reset". Your user container will then be reset to the standard setup. You need to mention your User ID (in the Collab, click on your profile image at the top right, then go to Profile and check your User ID). After that you will see any issues that may still remain.

Also there is an --upgrade missing for the numpy upgrade:

! pip install --upgrade "numpy>=1.16"
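Quoting the requirement matters because the shell reads an unquoted > as output redirection: without quotes the command would upgrade numpy to the latest release and write pip's output to a file named "=1.16". Calling pip from Python with an argument list sidesteps the shell entirely; the helper below is a hypothetical sketch.

```python
import sys


def pip_install_argv(*requirements, upgrade=True):
    """Build a pip command that runs in the notebook's own interpreter.

    Each requirement stays a single argv element, so specifiers such as
    'numpy>=1.16' are never interpreted by a shell.
    """
    argv = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        argv.append("--upgrade")
    argv.extend(requirements)
    return argv


# To actually install: import subprocess; subprocess.check_call(pip_install_argv("numpy>=1.16"))
```

Using sys.executable also ensures the package lands in the same Python environment the kernel is running.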

And for the notebooks as they are I get an error in the second cell:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-bfe6bccef2a2> in <module>()
      7 import rpy2
      8 import os, wget, datetime, magic, inspect
----> 9 from multipipsa.multipipsa import PipsaRun, ApbsRun
     10 from multipipsa.clusterpipsa import ClusterPipsa
     11 from multipipsa.pipsatypes import DistanceType

/opt/conda/lib/python3.6/site-packages/multipipsa/multipipsa.py in <module>()
     21 import warnings
     22 from pprint import pformat
---> 23 from multipipsa.clusterpipsa import ClusterPipsa
     24 from multipipsa.pipsatypes import ScoreType
     25 from multipipsa.pipsatypes import SimilarityType

/opt/conda/lib/python3.6/site-packages/multipipsa/clusterpipsa.py in <module>()
     11 import rpy2.robjects.packages as rpackages
     12 #import rpy2.robjects.numpy2ri
---> 13 from rpy2.robjects import pandas2ri
     14 
     15 from rpy2.robjects.packages import importr

/opt/conda/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in <module>()
     10                              INTSXP)
     11 
---> 12 from pandas.core.frame import DataFrame as PandasDataFrame
     13 from pandas.core.series import Series as PandasSeries
     14 from pandas.core.index import Index as PandasIndex

/opt/conda/lib/python3.6/site-packages/pandas/__init__.py in <module>()
     54 
     55 # define the testing framework
---> 56 import pandas.util.testing
     57 from pandas.util.nosetester import NoseTester
     58 test = NoseTester().test

/opt/conda/lib/python3.6/site-packages/pandas/util/testing.py in <module>()
     20 
     21 from numpy.random import randn, rand
---> 22 from numpy.testing.decorators import slow     # noqa
     23 import numpy as np
     24 

ModuleNotFoundError: No module named 'numpy.testing.decorators'

Maybe numpy 1.18.1 is too new? numpy 1.16.1 does not work either; I even get a different error.

Please ask for your container to be reset, then fix the notebook, test again, and then let me know.

StefanGIT commented 4 years ago

@alex4200 I have modified the multipipsa notebooks to use a specific numpy and pandas version. I have checked them on a clean container. Also I have added a note at the top, that running all cells at once might not show the graphics from nglview.

Could you check them again in the production environment?

alex4200 commented 4 years ago

@StefanGIT Retest of the notebooks worked fine, I greenlit the merge request to the next stage (will be next week).

antonelepfl commented 4 years ago

This should be merged. @alex4200 Please confirm that.

antonelepfl commented 4 years ago

This UC will be maintained on SGA3

antonelepfl commented 4 years ago

Already in Prod.