aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
186 stars 29 forks source link

export_pseudobulk Error append after pseudobulk_bed_files output produced #485

Closed zhenghuiwen-bgi closed 1 month ago

zhenghuiwen-bgi commented 1 month ago

Describe the bug Hi! I am following the pycistopic tutorial here. https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html. An error occurred when running export_pseudobulk. After all tsv files were generated in the output folder pseudobulk_bed_files, an error was reported. There was nothing in pseudobulk_bw_files. Looking forward to your answer! ThankU!

To Reproduce from pycisTopic.pseudobulk_peak_calling import export_pseudobulk os.makedirs(os.path.join(out_dir, "consensus_peak_calling"), exist_ok = True) os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"), exist_ok = True) os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"), exist_ok = True)

bw_paths, bed_paths = export_pseudobulk( input_data = cell_data, variable = "Main_cluster_name_re", sample_id_col = "Round4_Barcode", chromsizes = chromsizes, bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"), bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"), path_to_fragments = fragments_dict, n_cpu = 10, normalize_bigwig = True, temp_dir = "/tmp", split_pattern = "-" )

Error output UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak. warnings.warn( thread '' panicked at src/aggregate_fragments.rs:75:51: called Result::unwrap() on an Err value: ParseIntError { kind: InvalidDigit } note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

The above exception was the direct cause of the following exception:

PanicException Traceback (most recent call last) Cell In[56], line 7 3 os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"), exist_ok = True) 4 os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"), exist_ok = True) ----> 7 bw_paths, bed_paths = export_pseudobulk( 8 input_data = cell_data, 9 variable = "Main_cluster_name_re", 10 sample_id_col = "Round4_Barcode", 11 chromsizes = chromsizes, 12 bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"), 13 bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"), 14 path_to_fragments = fragments_dict, 15 n_cpu = 10, 16 normalize_bigwig = True, 17 temp_dir = "/tmp", 18 split_pattern = "-" 19 )

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:162, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir) 159 # For each sample, get fragments for each cell type 161 log.info("Splitting fragments by cell type.") --> 162 split_fragment_files_by_cell_type( 163 sample_to_fragment_file = path_to_fragments, 164 path_to_temp_folder = temp_dir, 165 path_to_output_folder = bed_path, 166 sample_to_cell_type_to_cell_barcodes = sample_to_cell_type_to_barcodes, 167 chromsizes = chromsizes_dict, 168 n_cpu = n_cpu, 169 verbose = False, 170 clear_temp_folder = True 171 ) 173 bed_paths = {} 174 for cell_type in cell_data[variable].unique():

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scatac_fragment_tools/library/split/split_fragments_by_cell_type.py:100, in split_fragment_files_by_cell_type(sample_to_fragment_file, path_to_temp_folder, path_to_output_folder, sample_to_cell_type_to_cell_barcodes, chromsizes, n_cpu, verbose, clear_temp_folder) 98 if verbose: 99 print("Merging fragments ...") --> 100 joblib.Parallel(n_jobs=n_cpu)( 101 joblib.delayed(_rust_scatac_fragment_tools.merge_fragment_files) 102 ( 103 path_to_fragment_files = cell_type_to_fragment_files[cell_type], 104 path_to_output_file = os.path.join(path_to_output_folder, f"{cell_type}.fragments.tsv.gz"), 105 number_of_threads = NUMBER_OF_WRITER_THREADS, 106 verbose = verbose 107 ) 108 for cell_type in cell_type_to_fragment_files 109 ) 111 # Check wether all files were create successfully 112 for cell_type in cell_type_to_fragment_files:

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1952, in Parallel.call(self, iterable) 1946 # The first item from the output is blank, but it makes the interpreter 1947 # progress until it enters the Try/Except block of the generator and 1948 # reach the first yield statement. This starts the aynchronous 1949 # dispatch of the tasks to the workers. 1950 next(output) -> 1952 return output if self.return_generator else list(output)

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1595, in Parallel._get_outputs(self, iterator, pre_dispatch) 1592 yield 1594 with self._backend.retrieval_context(): -> 1595 yield from self._retrieve() 1597 except GeneratorExit: 1598 # The generator has been garbage collected before being fully 1599 # consumed. This aborts the remaining tasks if possible and warn 1600 # the user if necessary. 1601 self._exception = True

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1699, in Parallel._retrieve(self) 1692 while self._wait_retrieval(): 1693 1694 # If the callback thread of a worker has signaled that its task 1695 # triggered an exception, or if the retrieval loop has raised an 1696 # exception (e.g. GeneratorExit), exit the loop and surface the 1697 # worker traceback. 1698 if self._aborting: -> 1699 self._raise_error_fast() 1700 break 1702 # If the next job is not ready for retrieval yet, we just wait for 1703 # async callbacks to progress.

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1734, in Parallel._raise_error_fast(self) 1730 # If this error job exists, immediatly raise the error by 1731 # calling get_result. This job might not exists if abort has been 1732 # called directly or if the generator is gc'ed. 1733 if error_job is not None: -> 1734 error_job.get_result(self.timeout)

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:736, in BatchCompletionCallBack.get_result(self, timeout) 730 backend = self.parallel._backend 732 if backend.supports_retrieve_callback: 733 # We assume that the result has already been retrieved by the 734 # callback thread, and is stored internally. It's just waiting to 735 # be returned. --> 736 return self._return_or_raise() 738 # For other backends, the main thread needs to run the retrieval step. 739 try:

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:754, in BatchCompletionCallBack._return_or_raise(self) 752 try: 753 if self.status == TASK_ERROR: --> 754 raise self._result 755 return self._result 756 finally:

PanicException: called Result::unwrap() on an Err value: ParseIntError { kind: InvalidDigit } ​ Expected behavior A clear and concise description of what you expected to happen.

Screenshots If applicable, add screenshots to help explain your problem or show the format of the input data for the command/s.

Version (please complete the following information):

Additional context Add any other context about the problem here.

yojetsharma commented 1 month ago

How does your cistopic_obj.cell_data() looks like?

ghuls commented 1 month ago

Maybe your default temp dir is too small, so incomplete temporary files are written. Specify a temp_dir location that has enough free space.

zhenghuiwen-bgi commented 1 month ago

How does your cistopic_obj.cell_data() looks like? cell_data like this

WechatIMG796

and the fregments file like this

image

I tried some things and checked the pseudobulk_bed_files file after the error message. Some of the cell type folders were empty. I deleted these empty cell types from cell_data and it worked. But when I added other samples, the cell types with empty folders were different.

zhenghuiwen-bgi commented 1 month ago

How does your cistopic_obj.cell_data() looks like? I downloaded ATAC data from a public database, but I didn't know the mm9 genome used for aligment, so this error message appeared. The problem has been solved.