BCDA-APS / gemviz

Data visualization for tiled
https://bcda-aps.github.io/gemviz/

BUG: Could not page to end of big catalog #51

Closed prjemian closed 1 year ago

prjemian commented 1 year ago

Loaded a catalog with 88k runs. The first 20 runs displayed in the results table. After pressing the last pager button, the app crashed with this exception trace:

(bluesky_2023_2) jemian@otz ~/.../gemviz23/demo $ catalogSelected: args = ('developer',)  kwargs = {}
Displaying catalog: developer
catalogSelected: args = ('20idb_usaxs',)  kwargs = {}
Displaying catalog: 20idb_usaxs
action='next' kwargs={}
doPager action ='next', value =None
self.pageOffset() =20 self.pageSize() =20
action='last' kwargs={}
doPager action ='last', value =None
Traceback (most recent call last):
  File "/home/beams1/JEMIAN/Documents/projects/BCDA-APS/gemviz23/gemviz23/demo/resultwindow.py", line 186, in doPagerButtons
    model.doPager(action)
  File "/home/beams1/JEMIAN/Documents/projects/BCDA-APS/gemviz23/gemviz23/demo/resultwindow.py", line 83, in doPager
    self.setUidList(self._get_uidList())
  File "/home/beams1/JEMIAN/Documents/projects/BCDA-APS/gemviz23/gemviz23/demo/resultwindow.py", line 97, in _get_uidList
    return list(gen)
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_2/lib/python3.10/site-packages/tiled/client/node.py", line 352, in _keys_slice
    content = self.context.get_json(
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_2/lib/python3.10/site-packages/tiled/client/context.py", line 595, in get_json
    self.get_content(
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_2/lib/python3.10/site-packages/tiled/client/context.py", line 557, in get_content
    handle_error(response)
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_2/lib/python3.10/site-packages/tiled/client/utils.py", line 29, in handle_error
    response.raise_for_status()
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_2/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://otz.xray.aps.anl.gov:8000/api/v1/node/search/20idb_usaxs?page%5Boffset%5D=88196&fields=&sort=time'
For more information check: https://httpstatuses.com/500
prjemian commented 1 year ago

The tiled server console output showed this error:

pymongo.errors.OperationFailure: Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit., full error: {'ok': 0.0, 'errmsg': 'Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.', 'code': 96, 'codeName': 'OperationFailed'}

The trace points to this code: https://github.com/BCDA-APS/tiled-viz2023/blob/99dd2c8ff18a441288ad4a37771d79a7a966d957/gemviz23/demo/resultwindow.py#L96-L97

prjemian commented 1 year ago

This is another aspect of #46. Closing since it is the same bug.

prjemian commented 1 year ago

Fails on other catalogs with fewer runs, as well.

Not sure if this error came from the tiled server or the MongoDB server.

On a different workstation, with more RAM, this process does not fail with a catalog of almost 9k runs.

prjemian commented 1 year ago

Perhaps could use cat.new_variation() instead?

prjemian commented 1 year ago

Tried slicing instead, since the new_variation() method seems to do something different. Same server error.

prjemian commented 1 year ago
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_2/lib/python3.10/site-packages/tiled/client/node.py", line 352, in _keys_slice
    content = self.context.get_json(

The tiled code referenced here has been revised:

warnings.warn(
    """The module 'tiled.client.node' has been moved to 'tiled.client.container' and the object 'Node' has been renamed 'Container'.""",
    DeprecationWarning,
)

Might be worthwhile to update our local tiled versions (both server and client) before proceeding with this issue.

prjemian commented 1 year ago

FYI, failing client has these versions:

(bluesky_2023_2) jemian@otz ~/.../gemviz23/demo $ conda list tiled
# packages in environment at /home/beams/JEMIAN/.conda/envs/bluesky_2023_2:
#
# Name                    Version                   Build  Channel
tiled                     0.1.0a91             hd8ed1ab_0    conda-forge
tiled-base                0.1.0a91           pyhd8ed1ab_0    conda-forge
tiled-client              0.1.0a91             hd8ed1ab_0    conda-forge
tiled-formats             0.1.0a91             hd8ed1ab_0    conda-forge
tiled-server              0.1.0a91             hd8ed1ab_0    conda-forge

failing server:

(tiled) jemian@otz ~/.../gemviz23/demo $ conda list tiled
# packages in environment at /home/beams/JEMIAN/.conda/envs/tiled:
#
# Name                    Version                   Build  Channel
tiled                     0.1.0a85                 pypi_0    pypi

Ok client has these versions:

(bluesky_2023_2) prjemian@arf:~/.../gemviz23/demo$ conda list tiled
# packages in environment at /home/prjemian/.conda/envs/bluesky_2023_2:
#
# Name                    Version                   Build  Channel
tiled                     0.1.0a94             hd8ed1ab_0    conda-forge
tiled-base                0.1.0a94           pyhd8ed1ab_0    conda-forge
tiled-client              0.1.0a94             hd8ed1ab_0    conda-forge
tiled-formats             0.1.0a94             hd8ed1ab_0    conda-forge
tiled-server              0.1.0a94             hd8ed1ab_0    conda-forge

Ok server has these versions:

(base) prjemian@zap:~$ conda list -n tiled tiled
# packages in environment at /home/prjemian/.conda/envs/tiled:
#
# Name                    Version                   Build  Channel
tiled                     0.1.0a85                 pypi_0    pypi
prjemian commented 1 year ago

While this may be a resource problem on the server (and may be affected by #53), what can we do as a client to avoid triggering this problem?
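
One client-side option, offered as an untested sketch rather than a confirmed fix: when the requested page is nearer the end of the catalog than the start, request the reverse sort order with a small offset instead of the forward order with a large one. The helper below is hypothetical (`plan_page` is not part of gemviz or tiled). It only computes which direction and offset to request; whether a smaller offset actually avoids the server error has not been verified here.

```python
def plan_page(catalog_length, offset, page_size):
    """Return (direction, request_offset, count) for one page of results.

    direction is +1 (ascending sort) or -1 (descending sort).  When the
    page lies nearer the end of the catalog, the descending order is
    chosen so that the offset sent to the server stays small.
    """
    if catalog_length <= 0 or page_size <= 0:
        return (1, 0, 0)
    # Clamp the offset into the catalog and trim the final, partial page.
    offset = max(0, min(offset, catalog_length - 1))
    count = min(page_size, catalog_length - offset)
    end = offset + count  # exclusive index, in ascending order
    if offset <= catalog_length - end:
        return (1, offset, count)
    # Rows offset..end-1 (ascending) are rows (length - end)..(length - offset - 1)
    # in descending order; the caller must reverse the page it gets back.
    return (-1, catalog_length - end, count)
```

With tiled, the descending request might look like `cat.sort(("time", -1)).keys()[request_offset:request_offset + count]`, followed by reversing the result for display. The `(key, direction)` form of `sort()` is an assumption about the installed tiled client and should be checked against its documentation.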

prjemian commented 1 year ago

We rely on random access to the UIDs in the table as a list:

(bluesky_2023_2) prjemian@arf:~/.../BCDA-APS/gemviz$ git grep uidList | grep -v _uidList
gemviz/resultwindow.py:        value = len(self.uidList())
gemviz/resultwindow.py:            uid = self.uidList()[index.row()]
gemviz/resultwindow.py:        return (self.pageOffset() + len(self.uidList())) >= self.catalog_length()
gemviz/resultwindow.py:    def uidList(self):
gemviz/resultwindow.py:            end = start + len(self.uidList())
gemviz/resultwindow.py:        uid = self.uidList()[index.row()]
gemviz/resultwindow.py:        uid = self.uidList()[index.row()]

It would be a challenge to reference the UIDs through an iterator.

prjemian commented 1 year ago

Still fails with changes in #53:

  File "/home/beams1/JEMIAN/Documents/projects/BCDA-APS/gemviz/gemviz/bluesky_runs_catalog_table_view.py", line 58, in doPagerButtons
    model.doPager(action)
  File "/home/beams1/JEMIAN/Documents/projects/BCDA-APS/gemviz/gemviz/bluesky_runs_catalog_table_model.py", line 110, in doPager
    self.setUidList(self._get_uidList())
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/beams1/JEMIAN/Documents/projects/BCDA-APS/gemviz/gemviz/bluesky_runs_catalog_table_model.py", line 129, in _get_uidList
    return list(gen)  # FIXME: fails here with big catalogs, see issue #51
           ^^^^^^^^^
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_3/lib/python3.11/site-packages/tiled/client/container.py", line 349, in _keys_slice
    content = handle_error(
              ^^^^^^^^^^^^^
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_3/lib/python3.11/site-packages/tiled/client/utils.py", line 18, in handle_error
    response.raise_for_status()
  File "/home/beams/JEMIAN/.conda/envs/bluesky_2023_3/lib/python3.11/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://localhost:8020/api/v1/search/9idc_usaxs_retired_2022-08-10?page%5Boffset%5D=17443&fields=&filter%5Bcomparison%5D%5Bcondition%5D%5Boperator%5D=ge&filter%5Bcomparison%5D%5Bcondition%5D%5Boperator%5D=le&filter%5Bcomparison%5D%5Bcondition%5D%5Bkey%5D=time&filter%5Bcomparison%5D%5Bcondition%5D%5Bkey%5D=time&filter%5Bcomparison%5D%5Bcondition%5D%5Bvalue%5D=1642140000.0&filter%5Bcomparison%5D%5Bcondition%5D%5Bvalue%5D=1667887199.999&sort=time'
For more information check: https://httpstatuses.com/500
prjemian commented 1 year ago

Instead of moving to the far end of the catalog, setting the pageSize is an alternative. Depending on the server, this might become slow. A progress bar window with a [cancel] button should be shown if the list(gen) operation takes too long. Also, the app should post a notice if/when the operation fails (as originally noted).
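
A minimal sketch of the cancellable part, assuming the UIDs arrive one at a time from an iterator (as with `list(gen)`). The function name `collect_with_cancel` and its parameters are hypothetical, not existing gemviz code. A [cancel] button would call `cancel_event.set()`, and `on_progress` could update a progress bar.

```python
import threading


def collect_with_cancel(items, cancel_event, on_progress=None, expected=None):
    """Materialize an iterable into a list, stopping early if cancelled.

    Returns (results, completed).  completed is False when cancel_event
    was set before the iterable was exhausted.
    """
    results = []
    for item in items:
        # The flag is checked between items, so an item fetched just
        # before cancellation is discarded.
        if cancel_event.is_set():
            return results, False
        results.append(item)
        if on_progress is not None:
            on_progress(len(results), expected)
    return results, True
```

The check runs only between items, so this cannot interrupt a single request that is already blocked waiting on the server. In a GUI it would also need to run outside the main thread so the [cancel] button stays responsive.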

prjemian commented 1 year ago

The tqdm package provides progress bars.

prjemian commented 1 year ago

Might be an algorithm design problem in gemviz. We might be trying to do this the hard way. Instead, there may be some catalog metadata that provides the catalog length and first+last runs info.

prjemian commented 1 year ago

Time to revisit the pager implementation. Can we pass this handling to the tiled server?

prjemian commented 1 year ago

tiled server shows this error:

pymongo.errors.OperationFailure: Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.

prjemian commented 1 year ago

Client can't see this error, only "500 Internal server error". Client can advise user to adjust filters to reduce catalog size.
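
A sketch of how the client might turn the failure into advice for the user. `describe_fetch_error` is a hypothetical helper. It relies only on the exception carrying a `response` with a `status_code`, as `httpx.HTTPStatusError` does in the traces above, so it needs no httpx import of its own.

```python
def describe_fetch_error(exc):
    """Turn an exception from a page request into a message for the user."""
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    if status is not None and 500 <= status < 600:
        # The client sees only the status code, not the server-side cause.
        return (
            f"The server reported an error ({status}) while fetching this page. "
            "Try narrowing the filters to reduce the number of runs."
        )
    return f"Could not fetch this page: {exc}"
```

The caller would wrap the `list(gen)` step in `try`/`except Exception as exc` and show the returned text in the status bar or a dialog instead of letting the exception end the app.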

Why is PyMongo seeing a maximum RAM of 33_554_432 bytes? Can this be increased? The tiled server has a larger OBJECT CACHE.
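
The number is informative in itself: 33_554_432 bytes is exactly 32 MiB. The error text ("Sort operation used more than the maximum ... Add an index, or specify a smaller limit") reads like a limit on sorting without an index, applied on the MongoDB side rather than by tiled's object cache. That is an inference from the message, not something confirmed in this thread. The arithmetic below also gives a rough per-document budget for the two failing offsets, under the assumption that the sort must hold about `offset` documents in memory at once.

```python
LIMIT_BYTES = 33_554_432  # from the pymongo error message

print(LIMIT_BYTES == 32 * 1024**2)  # the limit is exactly 32 MiB

# Offsets taken from the two failing request URLs in this issue.
for offset in (88196, 17443):
    budget = LIMIT_BYTES / offset
    print(f"offset {offset}: about {budget:.0f} bytes per document")
```

If that assumption holds, documents averaging more than a few hundred bytes to a couple of kilobytes would be enough to exceed the limit at these offsets.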