Open GeigerJ2 opened 1 week ago
Thanks @GeigerJ2, just putting the Discourse topic here for reference:
https://aiida.discourse.group/t/should-we-load-the-default-profile-automatically/407
Two points off the top of my head:
allow_switch
isn't set to True
by default?Thanks for linking the Discourse thread, @mbercx. Fair enough. I didn't find a LOGGER
in the file, so just went without it :D It's now added in my recent commit.
Not sure about the default for allow_switch
, and, in general, the backend magic that is done when switching profile...
Thanks @GeigerJ2 . I think it is worth discussing having an option to no longer force users to have to run load_profile()
manually in certain use-cases, but not sure the current solution of having Manager.get_profile_storage()
do this is the right place. I have to think about it a bit more, but I would rather be tempted to have it as part of the %aiida
notebook magic. I am not quite sure yet if it is ok to have the default profile loaded always.
Wondering if
get_default_profile
should be underaiida.cmdline.utils.defaults
or moved out ofcmdline
in a more general location?
You don't need to determine the default manually though. If you look at the docstring of Manager.load_profile
which you call, it will automatically use the default.
I'm wondering why allow_switch isn't set to True by default?
We set it to False
when we introduced this feature. For the longest time you couldn't switch profiles becaause it was impossible to change database backends. This was mostly a Django problem. When we made it possible at some point, we weren't a 100% sure that there wouldn't be subtle bugs hiding, so we made it an explicit opt-in.
Thanks for your comments @sphuber!
I am not quite sure yet if it is ok to have the default profile loaded always.
What is the source of your apprehension here? If users run verdi shell
, the default profile is also loaded automatically, no? Why then is this problematic in a Jupyter notebook?
Re the profile switching: I've done this quite often without issues, although admittedly that doesn't mean there can't still be bugs. One issue with switching profiles is of course that you no longer have access to data that was assigned to variables based on the previous profile, e.g.:
In [1]: structure = load_node(1234)
In [2]: from aiida import load_profile
In [3]: load_profile('dev', allow_switch=True)
Out[3]: Profile<uuid='f9fdf6861b4d47f989e87ad3cbc4e715' name='dev'>
In [4]: structure
The last line will raise a ClosedStorage
error:
ClosedStorage: Storage for 'prod' [closed] @ postgresql://aiida:***@localhost:5432/aiida-lumi_prod_2 / DiskObjectStoreRepository: 0500319a13bb4c7ca70ecc15e0375205 | /hith/aiida-lumi/repositories/prod/container
What is the source of your apprehension here? If users run verdi shell, the default profile is also loaded automatically, no? Why then is this problematic in a Jupyter notebook?
I don't have a problem with the Jupyter notebook. I would argue that instead we should just explicitly auto-load it in that very context. What I don't like about the current solution is that it is always loaded, even when using the API in a context that is not really used interactively by a user. For example some other client code that wants to use AiiDA's API. You don't always want the default profile to be loaded. This is a user specific thing, so we should enable that in user-specific contexts, e.g. verdi shell
and in Jupyter notebooks.
Re the profile switching: I've done this quite often without issues, although admittedly that doesn't mean there can't still be bugs.
Another reason why I would argue for a more targeted solution. In the specific use case of Jupyter notebook, when it is started, a profile won't have been loaded yet, so you wouldn't even need to toggle this switch. Just toggling the switch by default all the way down in Manager.get_profile_storage
just for this use case is way too heavy handed.
Hi guys, thanks a lot for getting back here! Yes, I agree it's something that is worth discussing/considering. Initially, I was thinking that it'd be nice to have it always on, but your argumentation, @sphuber, has convinced me that we better put it into the %aiida
notebook magic rather than always loading it, which indeed seems too heavy. For now, I just put it into get_manager()
as that is where the Traceback led me to. I'll look into moving the implementation to the %aiida
notebook magic 🙏🏽
You don't need to determine the default manually though. If you look at the docstring of Manager.load_profile which you call, it will automatically use the default.
Ah, interesting, thanks for the pointer!
PS: Generally, I was in favor of always loading it, as I usually don't use jupyter notebooks for running stuff, but normal .py
files and the character sequence #%%
which turns code in between it into executable notebook cells in VS Code; to avoid the .ipynb
json gibberish for version control. I see how that's a niche case though, and most people use actual jupyter notebooks for that kinda stuff, and always loading might be too heavy :D
Generally, I was in favor of always loading it, as I usually don't use jupyter notebooks for running stuff, but normal
.py
files
Note that for this use case, there is the runaiida
executable that automatically loads the default profile. So you can define that as the default interpreter in your .py
scripts and you will have the same behavior. For example:
#!/usr/bin/env runaiida
from aiida import get_profile
print(get_profile())
added to a script, and then made executable, will yield:
$ ./scripts/test_bin.py
Profile<uuid='ed155f9682c84b399da06b11c3c38a71' name='main'>
What I don't like about the current solution is that it is always loaded, even when using the API in a context that is not really used interactively by a user.
Though, not sure that I understand the issue here. The code is only called if the user would otherwise also have to manually run load_profile()
, so they would need to add that in their scripts either way? If there's a case where they wouldn't actually require it, the code wouldn't be reached, though, I might be missing something. For me, the main issue is that it seems a bit pointless to force users to run load_profile()
without any argument (running it with an argument to load a specific profile, of course, is a different story).
I'm not sure how to add the functionality so that it's specific for jupyter notebooks. I wouldn't make it part of the AiiDA notebook magic, because if people have to run:
%load_ext aiida
%aiida
then it defeats the purpose anyway. Will have a look how it's done for runaiida
and verdi shell
👍🏽
Note that for this use case, there is the runaiida executable that automatically loads the default profile. So you can define that as the default interpreter in your .py scripts and you will have the same behavior. For example:
That's very useful to know, thanks! Been actually wondering before what's the point of that, when I saw it ^^
Though, not sure that I understand the issue here. The code is only called if the user would otherwise also have to manually run load_profile(), so they would need to add that in their scripts either way? If there's a case where they wouldn't actually require it, the code wouldn't be reached, though, I might be missing something.
The problem is that Manager.get_profile_storage
gets called automatically when the storage backend is implicitly requested, not just when you call load_profile
. For example:
$ ipython
Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from aiida.orm import Int
In [2]: Int()
---------------------------------------------------------------------------
ConfigurationError Traceback (most recent call last)
Cell In[4], line 1
----> 1 Int()
File ~/code/aiida/env/dev/aiida-core/src/aiida/orm/nodes/data/base.py:42, in BaseType.__init__(self, value, **kwargs)
39 except AttributeError:
40 raise RuntimeError('Derived class must define the `_type` class member')
---> 42 super().__init__(**kwargs)
44 self.value = value or self._type()
File ~/code/aiida/env/dev/aiida-core/src/aiida/orm/nodes/data/data.py:60, in Data.__init__(self, source, *args, **kwargs)
58 def __init__(self, *args, source=None, **kwargs):
59 """Construct a new instance, setting the ``source`` attribute if provided as a keyword argument."""
---> 60 super().__init__(*args, **kwargs)
61 if source is not None:
62 self.source = source
File ~/code/aiida/env/dev/aiida-core/src/aiida/orm/nodes/node.py:266, in Node.__init__(self, backend, user, computer, **kwargs)
259 def __init__(
260 self,
261 backend: Optional['StorageBackend'] = None,
(...)
264 **kwargs: Any,
265 ) -> None:
--> 266 backend = backend or get_manager().get_profile_storage()
268 if computer and not computer.is_stored:
269 raise ValueError('the computer is not stored')
File ~/code/aiida/env/dev/aiida-core/src/aiida/manage/manager.py:257, in Manager.get_profile_storage(self)
255 profile = self.get_profile()
256 if profile is None:
--> 257 raise ConfigurationError(
258 'Could not determine the current profile. Consider loading a profile using `aiida.load_profile()`.'
259 )
261 # request access to the profile (for example, if it is being used by a maintenance operation)
262 ProfileAccessManager(profile).request_access()
ConfigurationError: Could not determine the current profile. Consider loading a profile using `aiida.load_profile()`.
My point is that it is not always obvious exactly how the storage backend can be requested and if we change the behavior that the default profile gets automatically loaded, that may break things.
So although ultimately it might be fine to automatically load it always, I am first proposing that we try to limit the scope and only do it their where it is really needed, i.e., we reduce the surface area we impact.
Will have a look how it's done for runaiida and verdi shell
runaiida
essentially just calls verdi run
and that (like verdi shell
) has a --profile
option which automatically loads the default profile before running the command. This is exactly the solution I would be looking for, because now you have an entry point where in most cases the user does want to load the default profile automatically.
I'm not sure how to add the functionality so that it's specific for jupyter notebooks.
I have to admit that I am also not quite sure how this can be done fully automatically. I would indeed include it in the %aiida
magic. But I am not so sure that it is really all that bad. It is useful in general as it doesn't just load the default profile, but will also import some typically used modules and utilities, like load_node
etc. So maybe that is fine still as a solution?
The problem is that Manager.get_profile_storage gets called automatically when the storage backend is implicitly requested, not just when you call load_profile. For example:
The traceback you provide was also exactly what I had in mind, where it would be called.
My point is that it is not always obvious exactly how the storage backend can be requested and if we change the behavior that the default profile gets automatically loaded, that may break things.
Yeah, there might certainly be cases that I'm overlooking.
So although ultimately it might be fine to automatically load it always, I am first proposing that we try to limit the scope and only do it their where it is really needed, i.e., we reduce the surface area we impact.
Sure, that makes sense! Fine for me to shelve the implementation of this PR for now for security. Thanks for the explanation on runaiida
!
With "it defeats the purpose", I meant that if the user has to run:
%load_ext aiida
%aiida
rather than
from aiida import load_profile
load_profile()
they still need to add two lines of code, so it's kinda the same situation. Though, if %aiida
provides additional advantages like automatically loading load_node
and other utilities, we could recommend that instead. I remember I used that today, and iirc, when using %aiida
, load_profile()
isn't required anyway, so we might not even have to change anything there. Will check again tomorrow.
Yeah, so
%load_ext aiida
%aiida
already automatically loads the profile. If we decide to add this feature there, then we can consider it done already :D
Idea originally brought up by @mbercx. I thought this should be quick to do, so I just went ahead with it.
I can see how this could possibly lead to issues, as the profile is loaded silently in the background, without the user specifying it, so people might end up accidentally working within the wrong profile. Nonetheless, we can evaluate if it's an acceptable risk considering the gained benefit that one doesn't have to manually run
load_profile
every time. So, happy to discuss :)PS: Wondering if
get_default_profile
should be underaiida.cmdline.utils.defaults
or moved out ofcmdline
in a more general location?Edit: Also running
load_profile("<profile>")
leads to:though, this can be easily resolved by setting
allow_switch=True
.