esi-neuroscience / acme

Asynchronous Computing Made ESI
https://esi-acme.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

HDF can't handle sparse matrices #25

Closed - KatharineShapcott closed this issue 3 years ago

KatharineShapcott commented 3 years ago

Hi Stefan, since we were talking yesterday about jobs crashing after a long runtime, I thought I should report this. I didn't even realise that some classifiers output sparse predictions and others don't! Unfortunately, the error message about this was very unhelpful, so it took me some time to figure out:

distributed.worker - WARNING -  Compute Failed
Function:  execute_task
args:      ((<function reify at 0x7f119c47e790>, (<function map_chunk at 0x7f119c47eb80>, <function ACMEdaemon.func_wrapper at 0x7f1197c02160>, [[KNeighborsClassifier(p=1, weights='distance')], [1], ['reuters'], ['/mnt/hpx/home/shapcottk/ACME_20210315-113132-926367'], ['comparison_classifier_39.h5'], [39], [<function comparison_classifier at 0x7f119cf1faf0>]], ['train_size', 'dataset', 'outDir', 'outFile', 'taskID', 'userFunc'], {})))
kwargs:    {}
Exception: TypeError("Object dtype dtype('O') has no native HDF5 equivalent")

slurmstepd: error: *** JOB 3446409 ON esi-svhpc23 CANCELLED AT 2021-03-15T13:27:14 **

In my case 'issparse' isn't even enough to catch it, because sometimes the results are sparse within a list! Of course this doesn't happen without writing to disk, so my code ran fine whenever I tested it without acme. Maybe there could be a more informative error message when this happens? Or we could return that part of the data so the user can see what's causing the crash?
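Here's a minimal illustration of the nesting problem (a hypothetical snippet, not my actual pipeline):

from scipy.sparse import csr_matrix, issparse

result = [csr_matrix((3, 4))]  # sparse matrix hidden inside a list
print(issparse(result))                  # False - the list itself is not sparse
print(any(issparse(r) for r in result))  # True - only checking the elements finds it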

Thanks! Katharine

PS I'm happy to update the readme with more details about what can and can't be used as an output.

pantaray commented 3 years ago

Hey Katharine! That is indeed frustrating and extremely difficult to debug. I agree, this should definitely be remedied somehow. The first step is catching the failed HDF5 write and logging the error, so the user immediately sees the problem. So I guess we might want to wrap saving the results to HDF5 in a try/except: if that fails, log the error and attempt to pickle the results instead; if that fails as well, return them in memory. Would that make sense?
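Roughly something like this (just a sketch to fix ideas - save_result, out_file and log are placeholder names, not ACME's actual internals):

import pickle
import h5py

def save_result(result, out_file, log):
    try:
        # First choice: store the result in HDF5
        with h5py.File(out_file, "w") as h5f:
            h5f.create_dataset("result_0", data=result)
    except TypeError as exc:
        # e.g., "Object dtype dtype('O') has no native HDF5 equivalent"
        log.error("HDF5 write failed: %s - attempting pickle instead", exc)
        try:
            with open(out_file + ".pickle", "wb") as pkl:
                pickle.dump(result, pkl)
        except Exception as exc2:
            log.error("Pickling failed too: %s - returning result in memory", exc2)
            return result
    return None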

PS I'm happy to update the readme with more details about what can and can't be used as an output.

That would be very much appreciated. Thank you!

KatharineShapcott commented 3 years ago

Try/except with a better error message sounds like a great idea - maybe you could also return which of the results (if it's a list or tuple) the crash occurred on? I'm not sure how the user would handle occasional pickling, because they'd have to load some data differently than the rest. For usability, I think the user should be able to set a flag to write either all HDF5 or all pickle. Then, if I know my outputs don't work with HDF5, I can switch my code to pickle. But an emergency pickle would really help with debugging a weird problem like this.

pantaray commented 3 years ago

Try/except with a better error message sounds like a great idea - maybe you could also return which of the results (if it's a list or tuple) the crash occurred on?

Yes, absolutely, good point!

I'm not sure how the user would handle occasional pickling, because they'd have to load some data differently than the rest. For usability, I think the user should be able to set a flag to write either all HDF5 or all pickle. Then, if I know my outputs don't work with HDF5, I can switch my code to pickle. But an emergency pickle would really help with debugging a weird problem like this.

Hm, yes, that's right. Having (potentially) hundreds of HDF5 files with the occasional pickle dump mixed in does not sound too pleasant from a data-collecting perspective... I think the flag is a great idea! But then I'm wondering if an emergency in-memory return might be better/easier to understand than an unsolicited pickle dump. What do you think?

KatharineShapcott commented 3 years ago

Hm, yes, that's right. Having (potentially) hundreds of HDF5 files with the occasional pickle dump mixed in does not sound too pleasant from a data-collecting perspective... I think the flag is a great idea! But then I'm wondering if an emergency in-memory return might be better/easier to understand than an unsolicited pickle dump. What do you think?

Tried that - if the data is too big, the job never returns properly, which is also hard to debug. "Too big" is also hard to define; it seems to depend on how busy the cluster is! I think a pickle is safer in this case. Also, pickle might give a nicer error message if there's still a problem there?

pantaray commented 3 years ago

Okay, good point - pickle it is, then! If the output is large, returning it always runs the risk of killing the parent session anyway. I guess we have a plan!

"Too big" is also hard to define, seems to be dependent on how busy the cluster is!

I'm still banging my head against the desk over these "too big" dask bags (cf. #23). I thought I had a solution yesterday, but it only worked for one array in the input spec of the user function. Which would be fine, but I don't know why it works for one array and not for more. From what I'm seeing, dask starts expanding the constructed bags before calling the workers, flooding the caller's memory, which in turn causes SLURM to unceremoniously kill the parent before the workers even get any data.

pantaray commented 3 years ago

Hi Katharine!

I just pushed the (hopefully) final commit to the pickle_save branch (d524867). It includes the discussed emergency-pickling mechanic plus a new write_pickle keyword. In addition, I included a custom exception handler that should catch CTRL + C keyboard interrupts and perform a graceful shutdown of any running dask client + workers, to avoid detaching SLURM jobs from the managing computing client (cf. #23). Please feel free to test-drive the changes whenever you have time; then I'll merge into main.
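Usage-wise, the new keyword works roughly like this (my_func and inputs are placeholders):

from acme import ParallelMap

# Bypass HDF5 entirely: every worker writes a pickle file instead
with ParallelMap(my_func, inputs, write_pickle=True) as pmap:
    results = pmap.compute()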

Thank you again for all your help!

KatharineShapcott commented 3 years ago

Just tested the keyboard-interrupt thing - it doesn't seem to work, from Jupyter at least. The jobs are still running 2 mins later.

#%% try multilabel 200 times...
<esi_cluster_setup> SLURM workers ready: 0/35   [elapsed time 00:00 | timeout at 01:00]
<esi_cluster_setup> Requested job-count 50 exceeds `n_jobs_startup`: waiting for 35 jobs to come online, then proceed
<esi_cluster_setup> SLURM workers ready: 50/None    [elapsed time 00:07 | timeout at 01:00]
<ParallelMap> INFO: Attaching to global parallel computing client <Client: 'tcp://              .4:33113' processes=50 threads=50, memory=400.00 GB>
<ParallelMap> INFO: Preparing 200 parallel calls of `comparison_classifier` using 50 workers
<ParallelMap> INFO: Log information available at /mnt/hpx/slurm/shapcottk/shapcottk_20210421-110115
<esi_cluster_setup> Cluster dashboard accessible at http://               .4:8787/status
  0% |          | 0/200 [00:00<?]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
~/python/filter_net_paper/scripts/filter_net_figure7.py in <module>
     11 with ParallelMap(comparison_classifier, clf, dataset=dataset, train_size=train_sizes, 
     12                     n_inputs=n_trys, write_worker_results=write) as pmap:
---> 13     results = pmap.compute()

/mnt/pns/home/shapcottk/python/filter_net_paper/scripts/acme/backend.py in compute(self, debug)
    363         cnt = 0
    364         while any(f.status == "pending" for f in futures):
--> 365             time.sleep(self.sleepTime)
    366             new = max(0, sum([f.status == "finished" for f in futures]) - cnt)
    367             cnt += new

KeyboardInterrupt: 

KatharineShapcott commented 3 years ago

Hmm, I tried to use a classifier that I know returns sparse data and got some really odd errors. It looks like sparse data can't be pickled either. Shall I try to make a minimal working example, or is this enough for you?

slurm-3982317.txt

...
distributed.scheduler - INFO - Remove worker <Worker 'tcp://               :43993', name: 24, memory: 0, processing: 145>
distributed.scheduler - INFO - Lost all workers
distributed.scheduler - INFO - Register worker <Worker 'tcp://                :46241', name: 14, memory: 0, processing: 7>
distributed.scheduler - INFO - Starting worker compute stream, tcp://               :46241
distributed.scheduler - INFO - Register worker <Worker 'tcp://              :33629', name: 31, memory: 0, processing: 1>
distributed.scheduler - INFO - Starting worker compute stream, tcp://            :33629
...
distributed.scheduler - INFO - Unexpected worker completed task, likely due to work stealing.  Expected: <Worker 'tcp://           :37137', name: 23, memory: 0, processing: 1>, Got: <Worker 'tcp://             :39263', name: 8, memory: 0, processing: 0>, Key: ('from_sequence-b24b1b22c80e07407f4b29e957d0a408', 0)
...
distributed.scheduler - INFO - Register worker <Worker 'tcp://                :36881', name: 0, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://             .26:36881
distributed.scheduler - INFO - Unexpected worker completed task, likely due to work stealing.  Expected: <Worker 'tcp://              .22:37137', name: 23, memory: 0, processing: 1>, Got: <Worker 'tcp://            .27:46241', name: 14, memory: 0, processing: 0>, Key: ('from_sequence-b24b1b22c80e07407f4b29e957d0a408', 0)
...
Please consult the following SLURM log files for details:
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982346.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982321.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982324.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982316.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982330.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982358.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982314.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982320.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982328.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982333.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982338.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982356.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982312.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982318.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982326.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982331.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982354.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982315.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982325.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982329.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982344.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982313.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982319.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982327.out
/mnt/hpx/slurm/shapcottk/shapcottk_20210421-111042/slurm-3982317.out

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/mnt/pns/home/shapcottk/python/filter_net_paper/scripts/acme/shared.py in ctrlc_catcher(*excargs, **exckwargs)
    338 
    339     import IPython
--> 340     IPython.core.interactiveshell.InteractiveShell.showtraceback(*excargs)
    341 
    342     # Relay exception handling back to system tools

~/.conda/envs/acme/lib/python3.8/site-packages/IPython/core/interactiveshell.py in showtraceback(self, exc_tuple, filename, tb_offset, exception_only, running_compiled_code)
   2021         try:
   2022             try:
-> 2023                 etype, value, tb = self._get_exc_info(exc_tuple)
   2024             except ValueError:
   2025                 print('No traceback available to show.', file=sys.stderr)

~/.conda/envs/acme/lib/python3.8/site-packages/IPython/core/interactiveshell.py in _get_exc_info(self, exc_tuple)
   1969             etype, value, tb = sys.exc_info()
   1970         else:
-> 1971             etype, value, tb = exc_tuple
   1972 
   1973         if etype is None:

TypeError: cannot unpack non-iterable type object
The original exception:

pantaray commented 3 years ago

Hi! Hm, that's the (apparently dysfunctional) exception handler crashing. However, the line IPython.core.interactiveshell.InteractiveShell.showtraceback(*excargs) should not be there - that was a WIP commit. Could you do a git pull in the pickle_save branch and try again?

pantaray commented 3 years ago

Hi!

Just tested the CTRL + C catcher with the latest version of the pickle_save branch - it does what it's supposed to do in my notebook:

[Screenshot from 2021-04-21 12-33-48: notebook output showing the interrupt being caught]

Here is the code:

# Add acme to Python search path
import os
import sys
acme_path = os.path.abspath(".." + os.sep + "..")
if acme_path not in sys.path:
    sys.path.insert(0, acme_path)

from acme import ParallelMap
import time

def long_running(dummy):
    # Dummy workload: sleep long enough to interrupt it with CTRL + C
    time.sleep(30)
    return

with ParallelMap(long_running, [None]*10, setup_interactive=False, write_worker_results=False) as pmap:
    pmap.compute()

KatharineShapcott commented 3 years ago

Hm, that's the (apparently dysfunctional) exception handler crashing. However, the line IPython.core.interactiveshell.InteractiveShell.showtraceback(*excargs) should not be there - that was a WIP commit. Could you do a git pull in the pickle_save branch and try again?

All seems fine now; it doesn't crash anymore and everything returns.

KatharineShapcott commented 3 years ago

Thanks for the test code - that also works for me. The problem seems to be that I'm using my own client. This reproduces the issue:

from acme import ParallelMap, esi_cluster_setup
import time

def long_running(dummy):
    time.sleep(30)
    return

n_jobs = 10
# Starting a custom client first - ParallelMap then attaches to this
# global client instead of spawning its own
client = esi_cluster_setup(partition="8GBXS", n_jobs=n_jobs)
with ParallelMap(long_running, [None]*n_jobs, setup_interactive=False, write_worker_results=False) as pmap:
    pmap.compute()

pantaray commented 3 years ago

Ha, interesting. Thank you for the example! Same for me both locally and on the cluster. I'll look into this.

pantaray commented 3 years ago

Hi Katharine!

I think I've figured it out (finally). Following the official IPython/Jupyter guidelines for creating custom exception handlers (via get_ipython().set_custom_exc((Exception,), custom_exc)) did not work; hacking it apparently did ;) I've just pushed the changes to the pickle_save branch (a git pull should bring you up to speed). I've tried the code in Python, IPython and Jupyter - whenever you have time, please feel free to test it again. Thank you!
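For reference, the "official" route looks roughly like this (illustrative handler only - this is the variant that did not work reliably here):

def custom_exc(shell, etype, evalue, tb, tb_offset=None):
    # On CTRL + C, this would be the place to gracefully shut down
    # any running dask client + workers before showing the traceback
    if etype is KeyboardInterrupt:
        print("Interrupt caught - cleaning up client/workers")
    shell.showtraceback((etype, evalue, tb), tb_offset=tb_offset)

get_ipython().set_custom_exc((Exception,), custom_exc)  # IPython/Jupyter only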

KatharineShapcott commented 3 years ago

Hi Stefan, Jupyter is so much fun ;) Whatever you did it seems to work now! Thanks so much for fixing that. Best, Katharine

KatharineShapcott commented 3 years ago

Btw, when you CTRL + C and then try to rerun your code without restarting the kernel, the acme printout is broken. This doesn't happen when the run completes successfully. Not very important, but maybe an easy fix?

<ParallelMap> INFO: <esi_cluster_setup> Requested job-count 50 exceeds `n_jobs_startup`: waiting for 10 jobs to come online, then proceed

<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:00 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:01 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:01 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:02 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:02 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:03 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 0/10   [elapsed time 00:03 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 5/10   [elapsed time 00:04 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 5/10   [elapsed time 00:04 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 18/None    [elapsed time 00:05 | timeout at 01:00]A
<esi_cluster_setup> SLURM workers ready: 18/None    [elapsed time 00:05 | timeout at 01:00]
<ParallelMap> INFO: <esi_cluster_setup> Cluster dashboard accessible at http://           :8787/status
<ParallelMap> INFO: Attaching to global parallel computing client <Client: 'tcp://            :38687' processes=18 threads=18, memory=144.00 GB>
<ParallelMap> INFO: Preparing 50 parallel calls of `comparison_classifier` using 50 workers
<ParallelMap> INFO: Log information available at /mnt/hpx/slurm/shapcottk/shapcottk_20210428-094258

  0% |          | 0/50 [00:00<?]A
  0% |          | 0/50 [00:00<?]A
  0% |          | 0/50 [00:00<?]A
... (the same 0% line repeats about ten times per second for ~19 seconds, each ending in a stray "A") ...
  0% |          | 0/50 [00:19<?]A
  2% |▏         | 1/50 [00:19<00:04]A
  4% |▍         | 2/50 [00:25<01:34]A
  6% |▌         | 3/50 [00:26<01:10]A
 10% |█         | 5/50 [00:26<00:50]A
 12% |█▏        | 6/50 [00:26<00:35]A
 14% |█▍        | 7/50 [00:27<00:40]A
 16% |█▌        | 8/50 [00:27<00:28]A
 18% |█▊        | 9/50 [00:28<00:27]A
 20% |██        | 10/50 [00:28<00:19]A
 22% |██▏       | 11/50 [00:29<00:18]A
 24% |██▍       | 12/50 [00:29<00:13]A
 26% |██▌       | 13/50 [00:29<00:18]A
 32% |███▏      | 16/50 [00:30<00:12]A
 38% |███▊      | 19/50 [00:30<00:08]A
 42% |████▏     | 21/50 [00:30<00:05]A
 46% |████▌     | 23/50 [00:30<00:04]A
 50% |█████     | 25/50 [00:31<00:05]A
 54% |█████▍    | 27/50 [00:31<00:04]A
 56% |█████▌    | 28/50 [00:31<00:03]A
 58% |█████▊    | 29/50 [00:31<00:03]A
 60% |██████    | 30/50 [00:31<00:03]A
 62% |██████▏   | 31/50 [00:32<00:02]A
 64% |██████▍   | 32/50 [00:32<00:02]A
 66% |██████▌   | 33/50 [00:32<00:02]A
 72% |███████▏  | 36/50 [00:32<00:01]A
 76% |███████▌  | 38/50 [00:32<00:01]A
 78% |███████▊  | 39/50 [00:32<00:01]A
 80% |████████  | 40/50 [00:33<00:02]A
 82% |████████▏ | 41/50 [00:33<00:01]A
 84% |████████▍ | 42/50 [00:33<00:01]A
 86% |████████▌ | 43/50 [00:33<00:01]A
 90% |█████████ | 45/50 [00:33<00:00]A
 94% |█████████▍| 47/50 [00:34<00:00]A
 96% |█████████▌| 48/50 [00:34<00:00]A
 98% |█████████▊| 49/50 [00:34<00:00]A
100% |██████████| 50/50 [00:35<00:00]
<ParallelMap> INFO: SUCCESS! Finished parallel computation. Results have been saved to /mnt/hpx/home/shapcottk/ACME_20210428-094303-486253

pantaray commented 3 years ago

Hey Katharine! This looks "interesting"... I just pushed a new commit to the pickle_save branch to try to force tqdm to stay on the line it initially printed to. Works on my machine - please feel free to try it whenever you have time :)
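(The stray "A"s are presumably the tail ends of ANSI cursor-up escape sequences - ESC[A - that the frontend stops interpreting after an interrupt.) The general idea of the fix, illustratively - not necessarily the exact change in the commit:

from tqdm.auto import tqdm

# Pin the bar to a fixed line instead of moving the cursor up on every refresh
for _ in tqdm(range(50), position=0, leave=True):
    pass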

KatharineShapcott commented 3 years ago

Nice! Seems to work now, thanks!

pantaray commented 3 years ago

Cool - thanks for the quick test-drive! Will merge into main then :)