ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)

BrokenProcessPool error when mapping single cell data #165

Closed: Qirongmao97 closed this issue 2 months ago

Qirongmao97 commented 4 months ago

Hi, when I ran IsoQuant, the following error showed up:

2024-03-11 03:24:09,459 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chrX
2024-03-11 03:24:48,988 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr4
2024-03-11 03:24:53,775 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr5
2024-03-11 03:25:09,036 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr6
2024-03-11 03:25:55,905 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr7
2024-03-11 03:26:18,707 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr2
2024-03-11 03:26:50,016 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr1
2024-03-11 08:15:38,546 - CRITICAL - IsoQuant failed with the following error, please, submit this issue to https://github.com/ablab/IsoQuant/issues
Traceback (most recent call last):
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/bin/isoquant.py", line 698, in <module>
    main(sys.argv[1:])
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/bin/isoquant.py", line 692, in main
    run_pipeline(args)
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/bin/isoquant.py", line 645, in run_pipeline
    dataset_processor.process_all_samples(args.input_data)
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/share/isoquant-3.3.1-0/src/dataset_processor.py", line 368, in process_all_samples
    self.process_sample(sample)
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/share/isoquant-3.3.1-0/src/dataset_processor.py", line 406, in process_sample
    self.process_assigned_reads(sample, saves_file)
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/share/isoquant-3.3.1-0/src/dataset_processor.py", line 539, in process_assigned_reads
    for read_stat_counter, tsc in results:
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/exports/archive/hg-exon-skip/Qirong/conda/envs/isoquant/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I think this is related to a memory problem? I tried lowering the threads from 16 to 2 and raised the RAM limit to 300 GB, but still got the same error. I also tried the --low_memory option, but it didn't help either. Do you have any ideas how to solve this problem? Thanks!

andrewprzh commented 4 months ago

Dear @Qirongmao97

Yes, it looks like a memory or I/O problem; unfortunately, it's not really possible to tell more from the stack trace. Could you try running in a single thread? I can also share a script with you that measures CPU load and RAM usage in real time, so you could check RAM consumption before the crash.
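
For illustration only, a minimal sketch of what such a monitor could look like (this is an assumption, not the actual performance_counter.py; it relies on psutil and a known IsoQuant PID):

#!/usr/bin/env python3
# Hypothetical RAM/CPU monitor sketch -- not the actual script shared in this thread.
# Assumes psutil is installed and the IsoQuant process ID is passed on the command line.
import sys
import time
import psutil

def monitor(pid, interval=5.0):
    proc = psutil.Process(pid)
    peak_rss = 0
    while proc.is_running():
        try:
            # Sum RSS over the main process and its workers (IsoQuant uses multiprocessing).
            procs = [proc] + proc.children(recursive=True)
            rss = sum(p.memory_info().rss for p in procs)
            cpu = sum(p.cpu_percent(interval=None) for p in procs)
        except psutil.NoSuchProcess:
            break
        peak_rss = max(peak_rss, rss)
        print(f"{time.strftime('%H:%M:%S')}  RSS {rss / 2**30:.2f} GB  CPU {cpu:.0f}%")
        time.sleep(interval)
    print(f"Peak RSS: {peak_rss / 2**30:.2f} GB")

if __name__ == "__main__":
    monitor(int(sys.argv[1]))  # usage: ./monitor_sketch.py <IsoQuant PID>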

Also, --low_memory is now a default behavior.

Best Andrey

Qirongmao97 commented 4 months ago

Hey Andrey, thanks for the quick response!

Would be great if you could share the script to my email: q.mao@lumc.nl

Much appreciated!

Best regards, Qirong

Qirongmao97 commented 3 months ago

Hi!

I tried running IsoQuant again with a single thread, monitored by the script you sent me:

./performance_counter.py --cmd "isoquant.py --output /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype \
  --reference /exports/archive/hg-exon-skip/Qirong/refdata-gex-mm10-2020-A/fasta/genome.fa \
  --genedb /exports/archive/hg-exon-skip/Qirong/gencode.vM23.primary_assembly.annotation.gtf \
  --bam /exports/archive/hg-exon-skip/Qirong/minimap2_output/wildtype/sorted.alignment.bam \
  --data_type nanopore --sqanti_output \
  --read_group file:/exports/archive/hg-exon-skip/Qirong/BLAZE_output/wildtype/putative_bc.csv:0:1:, \
  --clean_start -t 1" --output /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/

But I still got the same warnings, and the run was again killed because of the memory issue:

2024-03-13 22:19:07,422 - WARNING - Malformed input read information table, minimum, of 1 columns expected, file /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.read_group_JH584295.1, line: 2f46a094-1b5b-4c98-8c15-7b22d2629869
2024-03-13 22:19:07,422 - WARNING - Malformed input read information table, minimum, of 1 columns expected, file /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.read_group_JH584295.1, line: 5f9a6cbf-3dee-4d88-877a-dc41782f7f35
2024-03-13 22:19:07,422 - WARNING - Malformed input read information table, minimum, of 1 columns expected, file /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.read_group_JH584295.1, line: f0229bf4-adfe-4769-9731-58b7cc4e169d
2024-03-13 22:19:07,422 - WARNING - Malformed input read information table, minimum, of 1 columns expected, file /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.read_group_JH584295.1, line: f17e8d3f-ccd2-4436-9274-ae9424cf806f
2024-03-13 22:19:08,157 - INFO - Processing chromosome JH584295.1
2024-03-13 22:19:08,838 - INFO - Finished processing chromosome JH584295.1
2024-03-13 22:19:08,857 - INFO - Resolving multimappers
2024-03-13 22:27:20,909 - INFO - Finishing read assignment, total assignments 59373080, polyA percentage 97.0
2024-03-13 22:27:49,931 - INFO - Read assignments files saved to /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save*.
2024-03-13 22:27:49,932 - INFO - To keep these intermediate files for debug purposes use --keep_tmp flag
2024-03-13 22:27:52,760 - INFO - Total alignments processed: 59373080, polyA tail detected in 57585840 (97.0%)
2024-03-13 22:27:52,760 - INFO - Processing assigned reads OUT
2024-03-13 22:27:52,784 - INFO - Processing chromosome chr1
2024-03-13 22:28:24,469 - INFO - Loading read assignments from /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype/OUT/aux/OUT.save_chr1
Running isoquant.py --output /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype --reference /exports/archive/hg-exon-skip/Qirong/refdata-gex-mm10-2020-A/fasta/genome.fa --genedb /exports/archive/hg-exon-skip/Qirong/gencode.vM23.primary_assembly.annotation.gtf --bam /exports/archive/hg-exon-skip/Qirong/minimap2_output/wildtype/sorted.alignment.bam --data_type nanopore --sqanti_output --read_group file:/exports/archive/hg-exon-skip/Qirong/BLAZE_output/wildtype/putative_bc.csv:0:1:, --clean_start -t 1
  Max RSS: 319.096 GB
  CPU time: 22:51:28
  Wall clock time: 25:30:16

slurmstepd: error: Detected 1 oom_kill event in StepId=17704117.batch. Some of the step tasks have been OOM Killed.

The output of your script shows that the RSS just keeps rising during the whole run, so I am not sure where to set the maximum memory limit (right now it's 350 GB).

[attachment: ram_cpu_usage.zip]

It would be great if you have any ideas on how to solve this problem, thanks!

andrewprzh commented 3 months ago

@Qirongmao97

It seems like count grouping might be causing this. How many distinct barcodes are there in your putative_bc.csv?

Also, I see some warnings in the log; probably some reads lack a barcode?

As a test, you can run IsoQuant without the --read_group option.
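
For example, the earlier command with only the grouping option dropped (paths copied from the command shown above, purely as an illustration):

isoquant.py --output /exports/archive/hg-exon-skip/Qirong/IsoQuant_output/wildtype \
  --reference /exports/archive/hg-exon-skip/Qirong/refdata-gex-mm10-2020-A/fasta/genome.fa \
  --genedb /exports/archive/hg-exon-skip/Qirong/gencode.vM23.primary_assembly.annotation.gtf \
  --bam /exports/archive/hg-exon-skip/Qirong/minimap2_output/wildtype/sorted.alignment.bam \
  --data_type nanopore --sqanti_output --clean_start -t 1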

Andrey

Qirongmao97 commented 3 months ago

Yes, some of the reads are missing barcodes in the BLAZE output.
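
As an illustration only, a hypothetical pre-filter (assuming putative_bc.csv is comma-separated with the read id in column 0 and the barcode in column 1, as the file:...:0:1:, spec suggests, and has no header row) that drops rows without a barcode and counts the distinct barcodes before the table is passed to --read_group:

# filter_barcodes.py -- hypothetical helper, not part of IsoQuant or BLAZE.
# Keeps only rows that carry a barcode and reports the number of distinct barcodes.
import csv
import sys

def filter_table(in_csv, out_csv):
    barcodes = set()
    kept = dropped = 0
    with open(in_csv, newline="") as fin, open(out_csv, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            # column 0: read id, column 1: barcode (per the :0:1:, spec above)
            if len(row) >= 2 and row[1].strip():
                writer.writerow(row)
                barcodes.add(row[1].strip())
                kept += 1
            else:
                dropped += 1
    print(f"kept {kept} rows, dropped {dropped} without a barcode, "
          f"{len(barcodes)} distinct barcodes", file=sys.stderr)

if __name__ == "__main__":
    filter_table(sys.argv[1], sys.argv[2])  # usage: python filter_barcodes.py in.csv out.csv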

There are around 3,600 distinct barcodes.

I tried running IsoQuant treating the sample as bulk; the max RSS was only 41.120 GB.

So I was wondering whether you have any ideas on how to demultiplex the ONT single-cell data in a way that fits the IsoQuant pipeline?

andrewprzh commented 3 months ago

@Qirongmao97

3600 cells doesn't sound like too many; I recall running IsoQuant with more cells than that. I'll try to run some tests on the newer pre-release version. Would you be interested in trying the pre-release version, in case it shows better performance?

Best Andrey

Qirongmao97 commented 3 months ago

@andrewprzh

Hi Andrey! Yes, I'm working on Visium data, so cell (or spot) numbers will not be very high.

I would love to try the pre-release version; it would be great if you could show me where to find it, thanks!

andrewprzh commented 2 months ago

IsoQuant 3.4 is finally out. It has better performance than the previous version, so you can give it a try. However, single-cell data may still require some optimizations. Please re-open if you have further problems.
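
One possible way to install it, assuming the bioconda package is used as in the conda paths shown earlier, is to create a fresh environment, for example:

conda create -n isoquant-3.4 -c conda-forge -c bioconda isoquant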

Qirongmao97 commented 1 month ago

[attached image: plot_zoom_png (1)]

Hi,

I tried running the task with a memory limit of 250 GB, but it kept going over the limit.

I'm thinking the problem might be related to how I'm using the --read_group input. Right now, I'm using the putative_bc.csv file from BLAZE to assign barcodes. Do you know a better way to use the BLAZE barcode results with IsoQuant?