jmrohwer commented 1 year ago

I've fixed a number of bugs (one of them quite severe) and added a couple of features. I'd like to cut a new release that makes these generally available, specifically to students in my lab. The following can serve as a release announcement (edits welcome).

PySCeS Version 1.1.1

We are pleased to announce the release of the Python Simulator for Cellular Systems: PySCeS (https://pysces.github.io/) version 1.1.1. This is the first bugfix release in the 1.1 series.

New features

When used in a notebook environment, the ipympl backend for matplotlib is now enabled if installed. This allows use in JupyterLab (as opposed to classic notebook). If ipympl is not installed, fallback is to the standard nbAgg, which is part of matplotlib.
Simulation results (mod.sim object) are now returned as a pandas DataFrame if pandas is installed, otherwise a numpy recarray is returned as before.

Bug fixes

Fixed a bug in simulations with RateRules using Assimulo, where a wrong solver variable was being assigned internally.
Fixed SBML export when assignment rules were evaluating reaction rates.
Enabled assignment rules (forcing functions) to track parameter changes when using CVODE. This is needed in case events change parameters during the course of the simulation.

README: https://github.com/PySCeS/pysces/blob/master/README.md

DOCUMENTATION: https://pyscesdocs.readthedocs.io/en/latest/

jmrohwer commented 1 year ago

@bgoli have you had a chance to look at this?

bgoli commented 1 year ago

Looks interesting, I have time from today.

On Sat, Jul 8, 2023 at 3:14 PM Johann Rohwer @.***> wrote:

@bgoli https://github.com/bgoli have you had a chance to look at this?

— Reply to this email directly, view it on GitHub https://github.com/PySCeS/pysces/pull/91#issuecomment-1627263785, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGHUEOKT5B5RY7XGYNTDVLXPFMJ7ANCNFSM6AAAAAAZ2CBYGY . You are receiving this because you were mentioned.Message ID: @.***>

bgoli commented 1 year ago

Other than the Pandas thing everything else looks good to go

jmrohwer commented 1 year ago

Copying email discussion:

I see 3 options currently:

add a dependency on pandas and make mod.sim a DataFrame (DF)
make mod.sim a DF if pandas is installed, if it is not installed raise an error (this would not be backwards compatible)
leave mod.sim a numpy recarray (as is currently) and create a new object (also with lazy loading), e.g. mod.simdf, that returns a DF. This has the advantage of being backwards compatible but returns two objects with essentially the same functionality. In addition, when doing an interactive session, mod.simdf is more cumbersome to type than mod.sim yet mod.simdf is the one which I would use all the time.

Alternatively, if we can't agree immediately on a way forward re. this issue, I could remove the relevant commit from the PR and do a new release with the rest. I would ideally, however, like to get everything wrapped up to avoid doing a subsequent release very soon. And this DF thing is just such a handy feature! Once you've worked with it, the DF functionality is just so much more powerful, incl. tab completion for model attributes when working interactively. We (i.e. I and my students) really need this and would not want to go back.

jmrohwer commented 1 year ago

Copying @bgoli email reply:

I see that a pandas dataframe is useful for interactive work but it does increase the dependencies that are needed by Pandas to install PySCeS and is just overhead for any other workflow.

Changing the type of an attribute based only on if a module is installed or not, is not really a good option. However, Pandas support is interesting for interactive work, but, then so is maybe Zarr for HPC workflows etc. So what about something like this?

We create an alternate datatype (or something) config key with the default of None.

mod.__config__[‘alternate_datatype’] = None

and then a basic Pandas enable method (to start with) method:

def enableDataPandas(self, var=True):
  if var:
    try:
      import pandas
      self.__config__[‘alternate_datatype’] = ‘pandas’
    except ImportError:
      print(‘Instructions to install panda’)
      return false
    return true
  else:
    self.__config__[‘alternate_datatype’] = None
    return false

Now for mod.sim - and any other variable that might be useful to the user as a dataframe - you can use Pandas simply by adding something like:

if self.__config__[‘alternate_datatype’] == ‘pandas’:
  print(“variable type changed to pandas and whatever else must happen here”)
  return dataframe
else:
  print(“variable type changed back to numpy and whatever else must happen here”)
  return numpy.rec #or whatever

The advantage of going this route is that we can have multiple custom datatype outputs with the default being numpy. Of course that means you would have to type mod.enableDataPandas() once in a script, is that too much extra typing?

Would something like this work, it is more flexible as we can potentially include other formats (e.g. Zarr) for other workflows etc?

jmrohwer commented 1 year ago

So as I'm seeing it both our proposals can return two different types for mod.sim, but the difference is:

yours is as a result of a config key
mine would be as a result of testing for the presence of a module

I agree that the additional overhead of pandas and its dependencies is not warranted to make it a general dependency of PySCeS. So far we are on the same page.

As to the use case of mod.sim (and e.g. mod.scan):

these are instantiated by lazy loading so only created if needed so create no unnecessary overhead
their use will generally be interactive (but not strictly necessary so)
any operations on attributes such as mod.sim.S1 or mod.sim.Time (e.g. using them in matplotlib plotting commands) will work equally well for a pandas DF or a numpy recarray so that should not break anything
if one has pandas installed I can think of no reason whatsoever to not use a DF vs. a recarray

But the argument about multiple datatype outputs such as Zarr is a valid one though, you have a point and I see the benefit of a configuration option. However:

I would prefer making the configurable an attribute of the pysces module that is imported (i.e. global setting) that can be set via .pyscfg.ini, and not a per-model setting. I can see myself using pandas all the time and even if it is typing mod.enableDataPandas() only once per script, it would be a PITA not being able to set this as default behaviour. Also what about doing an analysis with 4 or 5 different models in the same session, it would then have to be set separately for every model instance.
I would move the print statements to the enabling function, otherwise we have this endless spitting out of output every time mod.sim is called.

Would that be acceptable?

bgoli commented 1 year ago

We're on the same page, to answer your questions:

I would prefer making the configurable an attribute of the pysces module t

Yes absolutely, I also had this in mind when thinking about using a config option :-) I would like to keep the enablePandas method as it allows scripts to be slightly more reusable in that you can include the enable method explicitly to avoid any possible strange errors if people don't have Pandas installed. Practically, the global config option could just make it so this method is automatically called on model instantiation. I'm sure we already do similar stuff with some model config options so it should be easy to add (I'm happy to try if you like).

I would move the print statements to the enabling function ...

My apologies that is me taking shortcuts, those print statements were supposed to represent the code being called in those blocks and should absolutely not exist anywhere in real life.

If we agree this is okay then to me it sounds like we are good to go 👍

bgoli commented 1 year ago

In addition we could eventually also add a module level enablePandas() function that sets the global config option for a session and this is then set automatically in any instantiated model.

jmrohwer commented 1 year ago

@bgoli from my side this is now good to go. Can you confirm? I will of course update the CI stuff again for the release, did not want to build all the permutations now for testing.

bgoli commented 1 year ago

Thanks for the test notebook, however, it fails on the Panda import ... any ideas?

Could this be a windows issue in which case I can look into it

jmrohwer commented 1 year ago

It is not a windows issue per se as I've tested on Windows as well. Is this a development (editable) install and have you pulled the latest version? Perhaps related to this comment? I was getting exactly the same before putting in the extra import.

bgoli commented 1 year ago

Have you tested it without the initial auto import from your config? I've experienced this before and usually the only way I can get it to work in a terminal with a dynamic (runtime) import is to set a reference to the module with a private class attribute.

I can commit my changes but essentially all I've done is add to

'''python

def __init__(self, args):
    # existing code
    if  __CUSTOM_DATATYPE__ == 'pandas':
        if _checkPandas():
            self.enableDataPandas(True)
        else:
            self.enableDataPandas(False)

''' and modified enableDataPandas

'''python

def enableDataPandas(self, var=True):
    if var:
        try:
            import pandas as pandas
            self.__pandas = pandas
            self.__settings__['custom_datatype'] = 'pandas'
            print('Pandas output enabled.')
        except ModuleNotFoundError:

'''

this creates a nice private attribute that can be used for development '''python

self.__pandas ''' which as far as I can tell always works no matter what the OS, etc.

What do you thing, should I push it and you can test linux, etc?

jmrohwer commented 1 year ago

Indeed I had not tested it if my configuration file was set to custom_datatype = None. I also get the same error on Linux when doing this. Can't say I understand why. Feel free to commit your changes (then there's a commit from you again as well :grin:) then I'll test on my side.

jmrohwer commented 1 year ago

Are you happy otherwise if I merge the PR if this works?

bgoli commented 1 year ago

The reasons are obscure and have to do with scoping and I think that a module is only ever imported once ... or something. Let me commit this and then if it works on your side go for the release.

bgoli commented 1 year ago

Happy for you to merge at your convenience 👍

jmrohwer commented 1 year ago

Great, just running final tests on my side :+1:

PySCeS / pysces

Proposed release 1.1.1 #91

PySCeS Version 1.1.1

New features

Bug fixes