aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
411 stars 184 forks source link

Auto-load default AiiDA profile #6485

Open GeigerJ2 opened 1 week ago

GeigerJ2 commented 1 week ago

Idea originally brought up by @mbercx. I thought this should be quick to do, so I just went ahead with it.

I can see how this could possibly lead to issues, as the profile is loaded silently in the background, without the user specifying it, so people might end up accidentally working within the wrong profile. Nonetheless, we can evaluate if it's an acceptable risk considering the gained benefit that one doesn't have to manually run load_profile every time. So, happy to discuss :)

PS: Wondering if get_default_profile should be under aiida.cmdline.utils.defaults or moved out of cmdline in a more general location?

Edit: Also running load_profile("<profile>") leads to:

InvalidOperation: cannot switch to profile "<profile>" because profile "<default_profile>" storage is already loaded and allow_switch is False

though, this can be easily resolved by setting allow_switch=True.

mbercx commented 1 week ago

Thanks @GeigerJ2, just putting the Discourse topic here for reference:

https://aiida.discourse.group/t/should-we-load-the-default-profile-automatically/407

Two points off the top of my head:

  1. I wouldn't load the profile silently, i.e. I would still print that the profile has been loaded.
  2. I'm wondering why allow_switch isn't set to True by default?
GeigerJ2 commented 1 week ago

Thanks for linking the Discourse thread, @mbercx. Fair enough. I didn't find a LOGGER in the file, so just went without it :D It's now added in my recent commit.

Not sure about the default for allow_switch, and, in general, the backend magic that is done when switching profile...

sphuber commented 1 week ago

Thanks @GeigerJ2 . I think it is worth discussing having an option to no longer force users to have to run load_profile() manually in certain use-cases, but not sure the current solution of having Manager.get_profile_storage() do this is the right place. I have to think about it a bit more, but I would rather be tempted to have it as part of the %aiida notebook magic. I am not quite sure yet if it is ok to have the default profile loaded always.

Wondering if get_default_profile should be under aiida.cmdline.utils.defaults or moved out of cmdline in a more general location?

You don't need to determine the default manually though. If you look at the docstring of Manager.load_profile which you call, it will automatically use the default.

I'm wondering why allow_switch isn't set to True by default?

We set it to False when we introduced this feature. For the longest time you couldn't switch profiles becaause it was impossible to change database backends. This was mostly a Django problem. When we made it possible at some point, we weren't a 100% sure that there wouldn't be subtle bugs hiding, so we made it an explicit opt-in.

mbercx commented 5 days ago

Thanks for your comments @sphuber!

I am not quite sure yet if it is ok to have the default profile loaded always.

What is the source of your apprehension here? If users run verdi shell, the default profile is also loaded automatically, no? Why then is this problematic in a Jupyter notebook?

Re the profile switching: I've done this quite often without issues, although admittedly that doesn't mean there can't still be bugs. One issue with switching profiles is of course that you no longer have access to data that was assigned to variables based on the previous profile, e.g.:

In [1]: structure = load_node(1234)

In [2]: from aiida import load_profile

In [3]: load_profile('dev', allow_switch=True)
Out[3]: Profile<uuid='f9fdf6861b4d47f989e87ad3cbc4e715' name='dev'>

In [4]: structure

The last line will raise a ClosedStorage error:

ClosedStorage: Storage for 'prod' [closed] @ postgresql://aiida:***@localhost:5432/aiida-lumi_prod_2 / DiskObjectStoreRepository: 0500319a13bb4c7ca70ecc15e0375205 | /hith/aiida-lumi/repositories/prod/container
sphuber commented 5 days ago

What is the source of your apprehension here? If users run verdi shell, the default profile is also loaded automatically, no? Why then is this problematic in a Jupyter notebook?

I don't have a problem with the Jupyter notebook. I would argue that instead we should just explicitly auto-load it in that very context. What I don't like about the current solution is that it is always loaded, even when using the API in a context that is not really used interactively by a user. For example some other client code that wants to use AiiDA's API. You don't always want the default profile to be loaded. This is a user specific thing, so we should enable that in user-specific contexts, e.g. verdi shell and in Jupyter notebooks.

Re the profile switching: I've done this quite often without issues, although admittedly that doesn't mean there can't still be bugs.

Another reason why I would argue for a more targeted solution. In the specific use case of Jupyter notebook, when it is started, a profile won't have been loaded yet, so you wouldn't even need to toggle this switch. Just toggling the switch by default all the way down in Manager.get_profile_storage just for this use case is way too heavy handed.

GeigerJ2 commented 5 days ago

Hi guys, thanks a lot for getting back here! Yes, I agree it's something that is worth discussing/considering. Initially, I was thinking that it'd be nice to have it always on, but your argumentation, @sphuber, has convinced me that we better put it into the %aiida notebook magic rather than always loading it, which indeed seems too heavy. For now, I just put it into get_manager() as that is where the Traceback led me to. I'll look into moving the implementation to the %aiida notebook magic 🙏🏽

You don't need to determine the default manually though. If you look at the docstring of Manager.load_profile which you call, it will automatically use the default.

Ah, interesting, thanks for the pointer!

PS: Generally, I was in favor of always loading it, as I usually don't use jupyter notebooks for running stuff, but normal .py files and the character sequence #%% which turns code in between it into executable notebook cells in VS Code; to avoid the .ipynb json gibberish for version control. I see how that's a niche case though, and most people use actual jupyter notebooks for that kinda stuff, and always loading might be too heavy :D

sphuber commented 5 days ago

Generally, I was in favor of always loading it, as I usually don't use jupyter notebooks for running stuff, but normal .py files

Note that for this use case, there is the runaiida executable that automatically loads the default profile. So you can define that as the default interpreter in your .py scripts and you will have the same behavior. For example:

#!/usr/bin/env runaiida
from aiida import get_profile
print(get_profile())

added to a script, and then made executable, will yield:

$ ./scripts/test_bin.py
Profile<uuid='ed155f9682c84b399da06b11c3c38a71' name='main'>
GeigerJ2 commented 4 days ago

What I don't like about the current solution is that it is always loaded, even when using the API in a context that is not really used interactively by a user.

Though, not sure that I understand the issue here. The code is only called if the user would otherwise also have to manually run load_profile(), so they would need to add that in their scripts either way? If there's a case where they wouldn't actually require it, the code wouldn't be reached, though, I might be missing something. For me, the main issue is that it seems a bit pointless to force users to run load_profile() without any argument (running it with an argument to load a specific profile, of course, is a different story).

I'm not sure how to add the functionality so that it's specific for jupyter notebooks. I wouldn't make it part of the AiiDA notebook magic, because if people have to run:

%load_ext aiida
%aiida

then it defeats the purpose anyway. Will have a look how it's done for runaiida and verdi shell 👍🏽

Note that for this use case, there is the runaiida executable that automatically loads the default profile. So you can define that as the default interpreter in your .py scripts and you will have the same behavior. For example:

That's very useful to know, thanks! Been actually wondering before what's the point of that, when I saw it ^^

sphuber commented 4 days ago

Though, not sure that I understand the issue here. The code is only called if the user would otherwise also have to manually run load_profile(), so they would need to add that in their scripts either way? If there's a case where they wouldn't actually require it, the code wouldn't be reached, though, I might be missing something.

The problem is that Manager.get_profile_storage gets called automatically when the storage backend is implicitly requested, not just when you call load_profile. For example:

$ ipython
Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from aiida.orm import Int

In [2]: Int()
---------------------------------------------------------------------------
ConfigurationError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 Int()

File ~/code/aiida/env/dev/aiida-core/src/aiida/orm/nodes/data/base.py:42, in BaseType.__init__(self, value, **kwargs)
     39 except AttributeError:
     40     raise RuntimeError('Derived class must define the `_type` class member')
---> 42 super().__init__(**kwargs)
     44 self.value = value or self._type()

File ~/code/aiida/env/dev/aiida-core/src/aiida/orm/nodes/data/data.py:60, in Data.__init__(self, source, *args, **kwargs)
     58 def __init__(self, *args, source=None, **kwargs):
     59     """Construct a new instance, setting the ``source`` attribute if provided as a keyword argument."""
---> 60     super().__init__(*args, **kwargs)
     61     if source is not None:
     62         self.source = source

File ~/code/aiida/env/dev/aiida-core/src/aiida/orm/nodes/node.py:266, in Node.__init__(self, backend, user, computer, **kwargs)
    259 def __init__(
    260     self,
    261     backend: Optional['StorageBackend'] = None,
   (...)
    264     **kwargs: Any,
    265 ) -> None:
--> 266     backend = backend or get_manager().get_profile_storage()
    268     if computer and not computer.is_stored:
    269         raise ValueError('the computer is not stored')

File ~/code/aiida/env/dev/aiida-core/src/aiida/manage/manager.py:257, in Manager.get_profile_storage(self)
    255 profile = self.get_profile()
    256 if profile is None:
--> 257     raise ConfigurationError(
    258         'Could not determine the current profile. Consider loading a profile using `aiida.load_profile()`.'
    259     )
    261 # request access to the profile (for example, if it is being used by a maintenance operation)
    262 ProfileAccessManager(profile).request_access()

ConfigurationError: Could not determine the current profile. Consider loading a profile using `aiida.load_profile()`.

My point is that it is not always obvious exactly how the storage backend can be requested and if we change the behavior that the default profile gets automatically loaded, that may break things.

So although ultimately it might be fine to automatically load it always, I am first proposing that we try to limit the scope and only do it their where it is really needed, i.e., we reduce the surface area we impact.

Will have a look how it's done for runaiida and verdi shell

runaiida essentially just calls verdi run and that (like verdi shell) has a --profile option which automatically loads the default profile before running the command. This is exactly the solution I would be looking for, because now you have an entry point where in most cases the user does want to load the default profile automatically.

I'm not sure how to add the functionality so that it's specific for jupyter notebooks.

I have to admit that I am also not quite sure how this can be done fully automatically. I would indeed include it in the %aiida magic. But I am not so sure that it is really all that bad. It is useful in general as it doesn't just load the default profile, but will also import some typically used modules and utilities, like load_node etc. So maybe that is fine still as a solution?

GeigerJ2 commented 4 days ago

The problem is that Manager.get_profile_storage gets called automatically when the storage backend is implicitly requested, not just when you call load_profile. For example:

The traceback you provide was also exactly what I had in mind, where it would be called.

My point is that it is not always obvious exactly how the storage backend can be requested and if we change the behavior that the default profile gets automatically loaded, that may break things.

Yeah, there might certainly be cases that I'm overlooking.

So although ultimately it might be fine to automatically load it always, I am first proposing that we try to limit the scope and only do it their where it is really needed, i.e., we reduce the surface area we impact.

Sure, that makes sense! Fine for me to shelve the implementation of this PR for now for security. Thanks for the explanation on runaiida!

With "it defeats the purpose", I meant that if the user has to run:

%load_ext aiida
%aiida

rather than

from aiida import load_profile
load_profile()

they still need to add two lines of code, so it's kinda the same situation. Though, if %aiida provides additional advantages like automatically loading load_node and other utilities, we could recommend that instead. I remember I used that today, and iirc, when using %aiida, load_profile() isn't required anyway, so we might not even have to change anything there. Will check again tomorrow.

GeigerJ2 commented 4 days ago

Yeah, so

%load_ext aiida
%aiida

already automatically loads the profile. If we decide to add this feature there, then we can consider it done already :D