BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Threading issues with ssCorrect.py: RuntimeError: cannot join thread before it is started #237

Closed · chrisamiller closed this issue 1 year ago

chrisamiller commented 1 year ago

I'm using the docker image brookslab/flair:1.7.0

After trying to run the whole flair correct pipeline and encountering an error, I isolated the failure to the ssCorrect step (the full command was helpfully printed in the error logs). When I try to run that step directly like this:

/usr/bin/python3 /usr/local/lib/python3.10/dist-packages/flair/ssCorrect.py -i RO-03958.bed -w 15 -p 4 -o RO-03958 --progress -f /storage1/fs1/bga/Active/gmsroot/gc2560/core/model_data/2887491634/build21f22873ebe0486c8e6f69c15435aa96/all_sequences.fa --correctStrand -g /storage1/fs1/bga/Active/gmsroot/gc2560/core/GRC-human-build38_human_95_38_U2AF1_fix/rna_seq_annotation/Homo_sapiens.GRCh38.95.gtf

Then I get errors about threading:

/usr/local/lib/python3.10/dist-packages/flair/ssCorrect.py:238: TqdmMonitorWarning: tqdm:disabling monitor support (monitor_interval = 0) due to:
can't start new thread
  for exonInfo in tqdm(txnList, total=len(txnList), desc="Step 1/5: Splitting junctions from GTF by chromosome", dynamic_ncols=True, position=1) if verbose else txnList:
Step 1/5: Splitting junctions from GTF by chromosome: 100%|██████████████████████████████████████████████████████████████████████████████████████| 206601/206601 [00:00<00:00, 230894.90it/s]
Step 3/5: Preparing annotated junctions to use for correction: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:00<00:00, 75.47it/s]
Reference sequence chr1_KI270706v1_random not found in annotations, skipping
Reference sequence chr1_KI270707v1_random not found in annotations, skipping
Reference sequence chr1_KI270708v1_random not found in annotations, skipping
Reference sequence chr1_KI270712v1_random not found in annotations, skipping
Reference sequence chr1_KI270714v1_random not found in annotations, skipping
. . . 
Reference sequence HLA-DQB1*03:01:01:01 not found in annotations, skipping
Reference sequence HLA-DQB1*03:02:01 not found in annotations, skipping
Reference sequence HLA-DRB1*04:03:01 not found in annotations, skipping
Step 4/5: Preparing reads for correction: 98422703it [02:00, 819648.31it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flair/ssCorrect.py", line 423, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/flair/ssCorrect.py", line 392, in main
    p = Pool(threads)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 232, in __init__
    self._worker_handler.start()
  File "/usr/lib/python3.10/threading.py", line 928, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Exception ignored in atexit callback: <bound method TMonitor.exit of <TMonitor(Thread-1, initial daemon)>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 44, in exit
    self.join()
  File "/usr/lib/python3.10/threading.py", line 1084, in join
    raise RuntimeError("cannot join thread before it is started")
RuntimeError: cannot join thread before it is started

I also tried running single-threaded with -p 1 and saw the same result.

Through a little debugging, I can isolate the failure to this Pool(threads) call: https://github.com/BrooksLabUCSC/flair/blob/master/src/flair/ssCorrect.py#L392
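
For what it's worth, Pool() spins up internal handler threads even for a single worker process, which would explain why -p 1 hits the same wall. A minimal sketch, independent of flair, that exercises the same call:

# Minimal repro sketch, outside of flair: multiprocessing.Pool starts
# handler threads (_worker_handler and friends) even for a one-process
# pool, so any thread/process cap raises the same RuntimeError.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    try:
        with Pool(1) as pool:
            print(pool.map(square, range(4)))
    except RuntimeError as exc:
        # in a constrained environment this prints: can't start new thread
        print(f"Pool failed: {exc}")
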

But I'm not sure where to go from there. Any advice would be appreciated!

chrisamiller commented 1 year ago

Additional information:

I'd still welcome any advice you can give on what the underlying issue might be, but I at least have a short-term workaround here...

chrisamiller commented 1 year ago

After pulling my hair out for a while, it turns out this was a weird issue with my compute environment. Closing it up.
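
For anyone who lands here later: "can't start new thread" typically means the process ran into a thread/process cap, e.g. a low ulimit -u or a cgroup pids limit in the container. A quick sanity-check sketch; the cgroup path is an assumption that varies between setups and cgroup versions:

# Rough diagnostic sketch for thread-cap issues (Linux-specific).
import resource

# per-user cap on processes/threads (the Python view of ulimit -u)
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"RLIMIT_NPROC: soft={soft} hard={hard}")  # -1 means unlimited

# cgroup v2 pids limit, if one is set; path is an assumption,
# cgroup v1 setups use /sys/fs/cgroup/pids/pids.max instead
try:
    with open("/sys/fs/cgroup/pids.max") as fh:
        print("pids.max:", fh.read().strip())
except OSError:
    print("no cgroup v2 pids.max visible from here")

If the cap comes from the container runtime itself, plain Docker can lift it with docker run --pids-limit=-1; cluster schedulers usually have their own knobs for this.
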

ghofrankothoum commented 1 year ago

Hello, I know this is closed, but I am having the same error in the same environment. Did you find any solution for running correct on that compute environment?

chrisamiller commented 1 year ago

We ended up building our own Docker image locally. I shot you a DM on the WUSTL Slack that may help.