NeurodataWithoutBorders / nwb-guide

NWB GUIDE is a desktop app that provides a no-code user interface for converting neurophysiology data to NWB.
https://nwb-guide.readthedocs.io/
MIT License

[Feature]: File conversion progress bars #774

Closed CodyCBakerPhD closed 1 month ago

CodyCBakerPhD commented 2 months ago

What would you like to see added to the NWB GUIDE?

The next step after https://github.com/NeurodataWithoutBorders/nwb-guide/pull/676 is to write files in parallel and show sub-bars for the buffers of each file

Here is some basic code that should accomplish this (I'll have to open a small PR to HDMF and HDMF-Zarr to facilitate proper progress bar class specification)

    futures = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        for session_to_nwb_kwargs in session_to_nwb_kwargs_per_session:

            # Errors in subprocesses will not propagate to top level stderr
            # So dump them to a file instead
            # Might want dedicated subfolder for this?
            exception_file_path = data_dir_path / f"ERROR_<nwbfile_name>.txt"

            # This is where we inject sub-bar progress info
            # This is passed via the iterator options of relevant interfaces through conversion options of the converter
            # For example
            # conversion_options = dict()
            # for interface_key, interface in data_formats_page.items():
            #     if isinstance(interface, BaseRecordingInterface):
            #         conversion_options.update({interface_key: dict(iterator_opts=dict(display_progress=True, progress_bar_class=TQDMPublisher, progress_bar_options=dict(...)))})

            futures.append(
                executor.submit(
                    safe_session_to_nwb,
                    session_to_nwb_kwargs=session_to_nwb_kwargs,
                    exception_file_path=exception_file_path,
                )
            )
        for _ in TQDMPublisher(as_completed(futures), total=len(futures), ...):
            pass

where safe_session_to_nwb looks like this

import traceback
from pathlib import Path
from pprint import pformat
from typing import Union


def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: Union[Path, str]):
    """Convert a session to NWB while handling any errors by recording error messages to the exception_file_path.

    Parameters
    ----------
    session_to_nwb_kwargs : dict
        The arguments for session_to_nwb.
    exception_file_path : Path or str
        The path to the file where the exception messages will be saved.
    """
    exception_file_path = Path(exception_file_path)
    try:
        session_to_nwb(**session_to_nwb_kwargs)
    except Exception:
        # Persist the kwargs and the full traceback, since stderr from the subprocess is not visible at top level
        with open(exception_file_path, mode="w") as f:
            f.write(f"session_to_nwb_kwargs: \n {pformat(session_to_nwb_kwargs)}\n\n")
            f.write(traceback.format_exc())

where session_to_nwb is whatever function we currently use in the GUIDE to convert a single session into a single file

Inspired by @pauladkisson's contribution on https://github.com/catalystneuro/cookiecutter-my-lab-to-nwb-template/pull/23, which is based on strategies we've used many times for past conversions
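For reference, here is a minimal self-contained sketch of the pattern above using only the standard library. It is not the GUIDE's implementation: `session_to_nwb` is a hypothetical stand-in that fails on request, the `nwbfile_name` key and the `on_progress` callback are assumptions made for illustration, and the callback takes the place of the `TQDMPublisher` wrapper around `as_completed`. The executor is passed in as a parameter; the demo uses a `ThreadPoolExecutor` to stay simple, while the approach in this issue would pass a `ProcessPoolExecutor` instead.

```python
import tempfile
import traceback
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed
from pathlib import Path
from pprint import pformat
from typing import Callable, Iterable, Optional


def session_to_nwb(*, nwbfile_name: str, fail: bool = False) -> None:
    """Hypothetical stand-in for the real single-session conversion."""
    if fail:
        raise ValueError(f"Could not convert {nwbfile_name}")


def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: Path) -> None:
    """Run one conversion; on failure, dump the kwargs and traceback to disk."""
    try:
        session_to_nwb(**session_to_nwb_kwargs)
    except Exception:
        with open(exception_file_path, mode="w") as f:
            f.write(f"session_to_nwb_kwargs: \n {pformat(session_to_nwb_kwargs)}\n\n")
            f.write(traceback.format_exc())


def convert_all(
    *,
    executor: Executor,
    kwargs_per_session: Iterable[dict],
    error_dir: Path,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> int:
    """Submit every session, then report each completion as it happens.

    Returns the number of finished jobs (failed conversions still count as
    finished; their details are in the per-file error dumps).
    """
    futures = []
    for kwargs in kwargs_per_session:
        # The 'nwbfile_name' key is an assumption of this sketch
        exception_file_path = error_dir / f"ERROR_{kwargs['nwbfile_name']}.txt"
        futures.append(
            executor.submit(
                safe_session_to_nwb,
                session_to_nwb_kwargs=kwargs,
                exception_file_path=exception_file_path,
            )
        )

    completed = 0
    for future in as_completed(futures):
        future.result()  # safe_session_to_nwb already swallowed conversion errors
        completed += 1
        if on_progress is not None:
            on_progress(completed, len(futures))
    return completed


if __name__ == "__main__":
    sessions = [dict(nwbfile_name="a"), dict(nwbfile_name="b", fail=True)]
    with tempfile.TemporaryDirectory() as tmp:
        with ThreadPoolExecutor(max_workers=2) as pool:
            convert_all(
                executor=pool,
                kwargs_per_session=sessions,
                error_dir=Path(tmp),
                on_progress=lambda done, total: print(f"{done}/{total} files finished"),
            )
        print(sorted(path.name for path in Path(tmp).iterdir()))
```

Keeping the worker function at module level matters for the process-based variant, since it has to be picklable to be sent to a subprocess.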

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

Yes

Did you confirm this feature was not already reported?

Yes

CodyCBakerPhD commented 2 months ago

This branch should work for HDF5 at least: https://github.com/hdmf-dev/hdmf/pull/1110

I don't think NeuroConv changes should be needed since they just dynamically pass everything down the chain
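To illustrate what "passing everything down the chain" means here, a rough sketch with stand-in functions only. None of these names are the real NeuroConv or HDMF API: `run_conversion`, `run_interface`, `iterate_buffers`, and `RecordingProgressBar` are all hypothetical, and the only thing borrowed from this thread is the shape of the options (`iterator_opts` containing `display_progress`, `progress_bar_class`, and `progress_bar_options`).

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


class RecordingProgressBar:
    """Hypothetical tqdm-like class; it just records how far iteration got."""

    def __init__(self, iterable: Iterable, total: Optional[int] = None, desc: str = "") -> None:
        self.iterable = iterable
        self.total = total
        self.desc = desc
        self.updates: List[int] = []

    def __iter__(self) -> Iterator:
        for count, item in enumerate(self.iterable, start=1):
            self.updates.append(count)
            yield item


def iterate_buffers(
    buffers: List[Any],
    *,
    display_progress: bool = False,
    progress_bar_class: Optional[type] = None,
    progress_bar_options: Optional[dict] = None,
) -> Tuple[List[Any], Optional[Any]]:
    """Innermost layer: the only place that actually knows about progress bars."""
    if not display_progress or progress_bar_class is None:
        return list(buffers), None
    bar = progress_bar_class(buffers, total=len(buffers), **(progress_bar_options or {}))
    return list(bar), bar


def run_interface(buffers: List[Any], *, iterator_opts: Optional[dict] = None):
    """Middle layer: forwards iterator options without inspecting them."""
    return iterate_buffers(buffers, **(iterator_opts or {}))


def run_conversion(data: Dict[str, List[Any]], *, conversion_options: Dict[str, dict]):
    """Top layer: forwards each interface's options without inspecting them."""
    return {
        key: run_interface(buffers, **conversion_options.get(key, {}))
        for key, buffers in data.items()
    }


conversion_options = {
    "recording": dict(
        iterator_opts=dict(
            display_progress=True,
            progress_bar_class=RecordingProgressBar,
            progress_bar_options=dict(desc="recording"),
        )
    )
}
results = run_conversion({"recording": [b"a", b"b", b"c"]}, conversion_options=conversion_options)
```

Because the two outer layers only unpack and forward keyword arguments, a new option such as the progress bar class needs support only in the innermost layer.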

garrettmflynn commented 2 months ago

Just flagging that adding Paul's safe conversion strategy to https://github.com/NeurodataWithoutBorders/nwb-guide/pull/778 would change a good amount of our error handling for the GUIDE in general, and might be better in a separate Issue / PR

CodyCBakerPhD commented 2 months ago

We can probably just use our own existing endpoint for per-file conversions; the main goal with any kind of 'safe' way of doing that is just that the traceback error stack gets dumped to a persistent file on disk since stdout/stderr pipes are not easily accessible during multiprocessing

garrettmflynn commented 2 months ago

Just implemented a general log system in https://github.com/NeurodataWithoutBorders/nwb-guide/pull/778 with a specific endpoint for registering errors from parallel processes! Should give us something useful to work with generally for remote debugging :)

Here's an example file: 2024-05-15_16-03-22.log

As shown by the "Exception on /neuroconv/convert" line, this will automatically log errors from any failing endpoint in addition to the process-specific errors
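As a rough idea of the shape of such a log (not the code in the PR), here is a standard-library sketch that writes to a timestamped file named like the example above and exposes one function for recording errors reported by parallel processes. `create_log_file` and `register_error` are hypothetical names, and the real system receives these reports through a server endpoint rather than a direct function call.

```python
import logging
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def create_log_file(log_dir: Path, now: Optional[datetime] = None) -> Tuple[logging.Logger, Path]:
    """Create a logger that writes to <log_dir>/<YYYY-MM-DD_HH-MM-SS>.log."""
    now = now or datetime.now()
    log_path = log_dir / f"{now:%Y-%m-%d_%H-%M-%S}.log"

    logger = logging.getLogger(f"sketch.{log_path.stem}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep this sketch from echoing to the root logger

    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path


def register_error(logger: logging.Logger, *, source: str, traceback_text: str) -> None:
    """Record an error that a parallel process reported back as text."""
    logger.error("Error reported by %s:\n%s", source, traceback_text)
```

A worker would format its own traceback (for example with `traceback.format_exc()`) and send the text back, since the exception object itself may not survive the trip between processes.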

CodyCBakerPhD commented 2 months ago

> Just implemented a general log system in https://github.com/NeurodataWithoutBorders/nwb-guide/pull/778 with a specific endpoint for registering errors from parallel processes! Should give us something useful to work with generally for remote debugging :)

Very cool!

CodyCBakerPhD commented 1 month ago

Added in #778