alan-turing-institute / empiarreader

Reader for EMPIAR datasets
BSD 3-Clause "New" or "Revised" License

Notebook example #48

Closed: mooniean closed this pull request 1 year ago

mooniean commented 1 year ago

Added example notebook showcasing both EmpiarCatalog and EmpiarSource with explanations for users to try out.

Closes #15.

review-notebook-app[bot] commented 1 year ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

JatGreer commented 1 year ago

Having some trouble running the notebook (tested in VS Code with Jupyter on a home internet connection). There are a couple of things which I think could be added; let me know what you think:

  1. It might be best to define what a 'dataset' is at the top of the notebook, so it's clear to the reader that this notebook is about image data.
  2. It may be better to choose a different example EMPIAR dataset: this dataset's XML describes the images as TIFF files, but we are only grabbing MRC files. A novice user will probably be confused when they then cannot open the TIFF files via empiarreader.
  3. It may be useful to show how to use the dataset selected with EmpiarCatalog before introducing EmpiarSource as a way to grab dataset(s) not labelled in the XML file.
  4. I get the following error (after about 5 minutes of waiting) when ds.read_partition(10) is called. Is it just because my internet connection isn't fast enough? Is it trying to load the entire dataset or just one image?
    
    ---------------------------------------------------------------------------
    RemoteDisconnected                        Traceback (most recent call last)
    File ~/miniconda3/envs/empiarreader/lib/python3.10/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702 # Make the request on the httplib connection object.
    --> 703 httplib_response = self._make_request(
    704     conn,
    705     method,
    706     url,
    707     timeout=timeout_obj,
    708     body=body,
    709     headers=headers,
    710     chunked=chunked,
    711 )
    713 # If we're going to release the connection in ``finally:``, then
    714 # the response doesn't need to know about the connection. Otherwise
    715 # it will also try to release it and we'll have a double-release
    716 # mess.

    File ~/miniconda3/envs/empiarreader/lib/python3.10/site-packages/urllib3/connectionpool.py:449, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    445 except BaseException as e:
    446 # Remove the TypeError from the exception chain in
    447 # Python 3 (including for exceptions like SystemExit).
    448 # Otherwise it looks like a bug in the code.
    --> 449 six.raise_from(e, None)
    450 except (SocketTimeout, BaseSSLError, SocketError) as e:
    ...
    549 except MaxRetryError as e:
    550 if isinstance(e.reason, ConnectTimeoutError):
    551 # TODO: Remove this in 3.0.0: see #2811

    ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


  5. Do we also want to show how to grab the metadata .star file(s) and use them to do something with an image (for example, plot picked particles)?
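On the TIFF/MRC mismatch in point 2: a minimal sketch of why this trips people up, assuming files are picked by matching a regular expression against their names. `select_files` and the file names below are hypothetical illustrations, not empiarreader's API. A pattern that only matches `.mrc` silently drops the `.tif` images the XML advertises, so the user gets an empty or partial result rather than an error.

```python
import re
from typing import Iterable, List


def select_files(names: Iterable[str], pattern: str) -> List[str]:
    """Return the names that fully match `pattern` (case-insensitive).

    Hypothetical helper for illustration only; not part of empiarreader.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.fullmatch(name)]


# Hypothetical directory listing mixing the two formats discussed above.
listing = ["image_0001.tif", "image_0002.tif", "average.mrc", "README.txt"]

mrc_only = select_files(listing, r".*\.mrc")
mrc_or_tiff = select_files(listing, r".*\.(mrc|tiff?)")

print(mrc_only)     # ['average.mrc']
print(mrc_or_tiff)  # ['image_0001.tif', 'image_0002.tif', 'average.mrc']
```

Widening the pattern, or choosing an example entry whose XML and file types agree, avoids the surprise.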
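On the `RemoteDisconnected` error: if it is a transient drop on a slow connection rather than the notebook fetching the whole dataset, retrying with exponential backoff is one common mitigation. This is a minimal, stdlib-only sketch; `with_retries` and its parameters are hypothetical and not part of empiarreader. It retries on `OSError` by default because the `ConnectionError` in the traceback appears to be the `requests` one, and `requests` exceptions derive from `IOError` (an alias of `OSError`).

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` calls have all failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


# Self-contained demo: a call that drops the connection twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("simulated drop")  # an OSError subclass
    return "ok"


delays: List[float] = []
print(with_retries(flaky, attempts=3, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

In the notebook this would wrap the failing call, e.g. `with_retries(lambda: ds.read_partition(10), attempts=5)`. Retries only help with intermittent failures; they won't shorten the wait if the partition itself is large.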
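On grabbing `.star` metadata: a minimal sketch of reading picked-particle coordinates, assuming a simple RELION-style STAR file with a single `loop_` block and `_rlnCoordinateX`/`_rlnCoordinateY` columns. `read_star_loop` is a hypothetical stdlib-only helper, not part of empiarreader, and the values are toy data. It ignores quoting and multiple data blocks, which a full STAR parser would handle.

```python
from typing import Dict, List


def read_star_loop(text: str) -> List[Dict[str, str]]:
    """Parse the first `loop_` block of a STAR file into a list of row dicts.

    Minimal sketch: assumes whitespace-separated values and no quoting.
    """
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment line
        if rows and (not line or line.startswith(("data_", "loop_"))):
            break  # the first loop has finished
        if not line:
            continue
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_"):
            # Column labels look like "_rlnCoordinateX #1"; keep the label.
            columns.append(line.split()[0])
        elif in_loop and columns:
            rows.append(dict(zip(columns, line.split())))
        # Anything else (data_ headers, key/value pairs) is ignored.
    return rows


# Toy coordinate file in RELION style (values are made up).
coords_star = """
data_

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnAutopickFigureOfMerit #3
 1024.0   512.0   0.93
  300.5  1800.0   0.71
"""

particles = read_star_loop(coords_star)
xy = [(float(p["_rlnCoordinateX"]), float(p["_rlnCoordinateY"])) for p in particles]
print(xy)  # [(1024.0, 512.0), (300.5, 1800.0)]
```

The resulting (x, y) pixel coordinates could then be overlaid on the matching micrograph, e.g. with matplotlib's `plt.scatter` on top of `plt.imshow`.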
JatGreer commented 1 year ago

Looks good! Runs great for me and passes all tests. Going ahead and merging it