aiidalab / aiidalab-widgets-base

Reusable widgets for AiiDAlab applications
MIT License
6 stars 17 forks source link

Fix threading issue when accessing user instance in _get_codes of ComputationalResourceWidgets #543

Closed unkcpz closed 5 months ago

unkcpz commented 7 months ago

fixes https://github.com/aiidalab/aiidalab-qe/issues/582

The _get_codes method is used in the multi-threading case by QEapp, and causes the issue. Since we assume one AiiDAlab app instance is bound with the default user, there is no need to restrict it again in the code queries. This fix will not change any actual behavior.

Interestingly, the fix also reveals an incorrect test which should fail but somehow not.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (1acd3ec) 87.12% compared to head (e446b58) 87.12%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #543 +/- ## ======================================= Coverage 87.12% 87.12% ======================================= Files 27 27 Lines 4643 4646 +3 ======================================= + Hits 4045 4048 +3 Misses 598 598 ``` | [Flag](https://app.codecov.io/gh/aiidalab/aiidalab-widgets-base/pull/543/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiidalab) | Coverage Δ | | |---|---|---| | [python-3.10](https://app.codecov.io/gh/aiidalab/aiidalab-widgets-base/pull/543/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiidalab) | `87.12% <100.00%> (+<0.01%)` | :arrow_up: | | [python-3.8](https://app.codecov.io/gh/aiidalab/aiidalab-widgets-base/pull/543/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiidalab) | `87.16% <100.00%> (+<0.01%)` | :arrow_up: | | [python-3.9](https://app.codecov.io/gh/aiidalab/aiidalab-widgets-base/pull/543/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiidalab) | `87.16% <100.00%> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiidalab#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

unkcpz commented 7 months ago

This is the part of traceback that lead me to the change, if you remember the change we made from migrate aiida-core v1->v2, that the aiida nodes in the traits are replaced by uuid, the issue https://github.com/aiidalab/aiidalab-qe/issues/582 has the same source of problem.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/apps/aiidalab-qe/src/aiidalab_qe/common/setup_codes.py", line 235, in _refresh_installed
    self.set_trait("installed", True)
  File "/opt/conda/lib/python3.9/site-packages/traitlets/traitlets.py", line 1742, in set_trait
    getattr(cls, name).set(self, value)
  File "/opt/conda/lib/python3.9/site-packages/traitlets/traitlets.py", line 721, in set
    obj._notify_trait(self.name, old_value, new_value)
  File "/opt/conda/lib/python3.9/site-packages/traitlets/traitlets.py", line 1505, in _notify_trait
    self.notify_change(
  File "/opt/conda/lib/python3.9/site-packages/ipywidgets/widgets/widget.py", line 694, in notify_change
    super(Widget, self).notify_change(change)
  File "/opt/conda/lib/python3.9/site-packages/traitlets/traitlets.py", line 1517, in notify_change
    return self._notify_observers(change)
  File "/opt/conda/lib/python3.9/site-packages/traitlets/traitlets.py", line 1564, in _notify_observers
    c(event)
  File "/home/jovyan/apps/aiidalab-qe/src/aiidalab_qe/app/submission/__init__.py", line 202, in _auto_select_code
    code_widget.refresh()

The self.set_trait("installed", True) is called from QESetupWidget which installs the QE code is run in another threading to not block the app. Inside the threading, it triggers the trait change because it will run a refresh_codes of computational resource widget after the codes are installed to update the list of dropdown menus.

So if AiiDA still not threading safe, we need to implement a rule for all the widgets development in AiiDAlab that the node should have the local scope not cross the functions to avoid the problem. If the changes I made here are correct, we now know two proper design principles to follow:

  1. Use UUID as trait when developing the new widget.
  2. The user node which seems have a wider scope need to query and use inside the function. (pinning @sphuber for comment, I think this might be a change can happened in aiida-core to provide an API to get the user by query in a function scope session instead like we originally have here which uses user = orm.User.collection.get_default())
danielhollas commented 6 months ago

@sphuber would you mind taking a look if the changes here make sense? I am not super familiar in how aiida-core handles User instances.

sphuber commented 6 months ago

@sphuber would you mind taking a look if the changes here make sense? I am not super familiar in how aiida-core handles User instances.

I had a look but I don't understand what the changes to fetching the default user should do. First of, User.objects is deprecated, and User.collection should be used instead. So the PR changes to use the deprecated version. The current use is correct. To get the default user, User.collection.get_default() is correct. The only tricky part here is that this default is cached on the StorageBackend. So if the default user is changed, this needs to be reset through StorageBackend.reset_default_user. This is done automatically through Manager.set_default_user_email but not if you go directly through Profile.set_default_user_email.

If you want to manually query for a user given an email, I would recommend User.collection.get(email='email'). It is concise and closest in syntax to User.colelction.get_default()

unkcpz commented 6 months ago

@sphuber thanks! Yes, User.collection.get(email='email') looks better, I'll give it a try.

unkcpz commented 6 months ago

Tested, works good! I update the PR. @danielhollas

danielhollas commented 6 months ago

@unkcpz just to double-check. This PR fixes the threading issue that you saw in QeApp and reported at https://github.com/aiidalab/aiidalab-qe/issues/582

(tl;dr, you were seeing sqlalchemy.exc.InvalidRequestError: Instance '<DbUser at 0x7f7f1d3d96d0>' is not persistent within this Session)

The current use is correct. To get the default user, User.collection.get_default() is correct. The only tricky part here is that this default is cached on the StorageBackend.

Oh, the caching then explains our threading issues! That's a really unfortunate side-effect, but now I understand why the fix in this PR works. Even though we were calling get_default inside the function and thought it thus should be thread-safe, it was in fact returning the same object.

One can verify this in verdi shell:

user = User.collection.get_default()
user2 = User.collection.get_default()
id(user) == id(user2)
> True

whereas load_node behaves differently

node = load_node(117419)
node2 = load_node(117419)
id(u) == id(u2)
> False

@unkcpz can you check if we use User.collection.get_default anywhere else in AWB?

I also wonder if User.collection.get_default() could get smarter and use per-thread cache? @sphuber Are there any other instances of caching like this that we should be aware of?

sphuber commented 5 months ago

I also wonder if User.collection.get_default() could get smarter and use per-thread cache? @sphuber Are there any other instances of caching like this that we should be aware of?

AiiDA has never been thread-safe, so I am not sure if we should start making off the cuff fixes for just this. There are plenty of other singletons in the code, such as the loaded Profile, Config and Manager to name just a few.

danielhollas commented 5 months ago

AiiDA has never been thread-safe, so I am not sure if we should start making off the cuff fixes for just this. There are plenty of other singletons in the code, such as the loaded Profile, Config and Manager to name just a few.

That's fair, although I'd argue that there is still a slight difference here. User.collection.get_default() return the user node orm instance (not sure about the terminology here). We now understand well that AiiDA nodes themselved are not threadsafe. However, the fact that some methods may return the same node when called twice is a gotcha we didn't know about, and I don't see a good way guarding against this. So I am more interested in cases like this, and less worried about Profile, Config, Manager thing (but I might be underestimating the difficulties there). I wonder if at least this behaviour should be documented in the docstring somehow?

In general, I think the issue we have in AWB is that we have a coupling between the code that handles the frontend events (i.e. user clicking a ipywidgets button), which generally come from different threads, and backend code that communicates with the DB. This coupling makes it hard to see where the footguns lie.

danielhollas commented 5 months ago

@unkcpz can you check if we use User.collection.get_default anywhere else in AWB?

@unkcpz there are two other instances of get_default() that need to be fixed in get_computers and configure_computers.

@sphuber thanks! Yes, User.collection.get(email='email') looks better, I'll give it a try.

I am not sure this is the correct solution here, feels more like a hack. We really want the default user. What if there are two users with the same email (is that possible?).

sphuber commented 5 months ago

I wonder if at least this behaviour should be documented in the docstring somehow?

Sure, that's fair. It is at least documented in the StorageBackend.default_user docstring which it just forwards to:

class StorageBackend(abc.ABC):

    @property
    def default_user(self) -> Optional['User']:
        """Return the default user for the profile, if it has been created.

        This is cached, since it is a frequently used operation, for creating other entities.
        """

I am not sure this is the correct solution here, feels more like a hack. We really want the default user. What if there are two users with the same email (is that possible?).

That is what StorageBackend.default_user does though. The email is currently defined as unique on the database model. The User class is a bit of an exception in the ORM, as it is the only class to not have a UUID but it uses the email instead. See this issue https://github.com/aiidateam/aiida-core/issues/6174 for a discussion on pro's and cons. Looks like we are not going to change this for now

sphuber commented 5 months ago

In general, I think the issue we have in AWB is that we have a coupling between the code that handles the frontend events (i.e. user clicking a ipywidgets button), which generally come from different threads, and backend code that communicates with the DB. This coupling makes it hard to see where the footguns lie.

I guess that is indeed because you are directly using the Python API. If there were a full-fledged web API (e.g. REST API) that exposed all necessary functionality, then it might be a better solution to use that instead. There is the REST API implemented in aiida-core, but it is too limited. The aiida-restapi has made headway in improving it, but the main feature lacking is still to replace/represent the ORM API. Incidentally, I just opened an AEP that would solve this problem (at least to a large extent). I am planning on implementing a web API that exposed most of the important parts of the Python API.

danielhollas commented 5 months ago

That is what StorageBackend.default_user does though. The email is currently defined as unique on the database model. The User class is a bit of an exception in the ORM, as it is the only class to not have a UUID but it uses the email instead. See this issue https://github.com/aiidateam/aiida-core/issues/6174 for a discussion on pro's and cons. Looks like we are not going to change this for now

Oh, perfect, that's good to know. In that case I am happy with this PR as along as the other uses are fixed as well.

but the main feature lacking is still to replace/represent the ORM API. Incidentally, I just https://github.com/aiidateam/AEP/pull/40 that would solve this problem (at least to a large extent). I am planning on implementing a web API that exposed most of the important parts of the Python API.

That is very exciting, thanks!

unkcpz commented 5 months ago

Thanks for the discussion!

In general, I think the issue we have in AWB is that we have a coupling between the code that handles the frontend events (i.e. user clicking a ipywidgets button)

Yes, in the long term this is the way we should go. We did some test before with the workflow viewer and it is definitely doable. But it requires also some test on how AiiDAlab handle the restapi service etc. and effort of changing things from using aiida node directly to restapi call little by little for AWB and other apps.

Incidentally, I just aiidateam/AEP#40 that would solve this problem

Cool! Looking forward to it. In AiiDAlab, we can start test on moving to restapi in parallel.

unkcpz commented 5 months ago

@danielhollas the other two cases are also changed, thanks a lot for the careful review!