Closed: zhenghuiwen-bgi closed this issue 1 month ago.
What does your `cistopic_obj.cell_data()` look like?
Maybe your default temp dir is too small, so incomplete temporary files are written. Specify a temp_dir location that has enough free space.
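If disk space is the suspect, a quick check before rerunning could look like the sketch below. This is a minimal sketch only: the scratch path is hypothetical, and the idea is simply to pass a roomier location as `temp_dir` to `export_pseudobulk` instead of the default `/tmp`.

```python
import os
import shutil

# Check how much room the default temp location actually has.
free_gb = shutil.disk_usage("/tmp").free / 1e9
print(f"/tmp free space: {free_gb:.1f} GB")

# If /tmp is tight, create a scratch directory on a larger volume
# (hypothetical path) and pass it as temp_dir to export_pseudobulk.
scratch_tmp = "/DATA/scratch/pycistopic_tmp"
os.makedirs(scratch_tmp, exist_ok=True)
```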
> What does your `cistopic_obj.cell_data()` look like?

My cell_data looks like this, and the fragments file looks like this.
I tried some things and checked the pseudobulk_bed_files output after the error message. Some of the cell-type folders were empty. I deleted these empty cell types from cell_data and it worked, but when I added other samples, the cell types with empty folders were different.
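For reference, a minimal sketch of that workaround, assuming `cell_data`, `out_dir` and the cell-type column `"Main_cluster_name_re"` from the report, and assuming the per-cell-type outputs are written as `<cell_type>.fragments.tsv.gz` inside pseudobulk_bed_files (naming taken from the traceback below; adjust if your layout differs):

```python
import os

bed_dir = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files")

# Collect cell types whose pseudobulk output is missing or empty.
empty_cell_types = []
for ct in cell_data["Main_cluster_name_re"].unique():
    bed_file = os.path.join(bed_dir, f"{ct}.fragments.tsv.gz")
    if not os.path.exists(bed_file) or os.path.getsize(bed_file) == 0:
        empty_cell_types.append(ct)
print("Cell types without usable pseudobulk fragments:", empty_cell_types)

# Drop those cell types from the annotation before rerunning export_pseudobulk.
cell_data_filtered = cell_data[
    ~cell_data["Main_cluster_name_re"].isin(empty_cell_types)
].copy()
```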
> What does your `cistopic_obj.cell_data()` look like?

I downloaded the ATAC data from a public database and did not realize it had been aligned to the mm9 genome, which is why this error message appeared. The problem has been solved.
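For anyone hitting the same error, a genome mismatch like this can often be spotted by comparing the chromosome names in the fragments file with the chromsizes table passed to `export_pseudobulk`. A minimal sketch, assuming a standard fragments.tsv.gz layout (chrom, start, end, barcode, count); the fragments path is hypothetical and the `"Chromosome"` column name follows the tutorial's chromsizes table:

```python
import gzip

fragments_file = "sample_1.fragments.tsv.gz"  # hypothetical path

# Collect chromosome names from a sample of the fragments file.
frag_chroms = set()
with gzip.open(fragments_file, "rt") as fh:
    for i, line in enumerate(fh):
        if line.startswith("#"):      # skip header/comment lines
            continue
        frag_chroms.add(line.split("\t", 1)[0])
        if i > 100_000:               # a sample of the file is enough
            break

# Chromosome names expected by the chromsizes table used in the tutorial.
ref_chroms = set(chromsizes["Chromosome"])
print("In fragments but not in chromsizes:", sorted(frag_chroms - ref_chroms))
```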
Describe the bug

Hi! I am following the pycisTopic tutorial here: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html. An error occurred when running export_pseudobulk: after all the tsv files were generated in the output folder pseudobulk_bed_files, an error was reported, and nothing was written to pseudobulk_bw_files. Looking forward to your answer, thank you!
To Reproduce

```python
import os
from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

os.makedirs(os.path.join(out_dir, "consensus_peak_calling"), exist_ok = True)
os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"), exist_ok = True)
os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"), exist_ok = True)

bw_paths, bed_paths = export_pseudobulk(
    input_data = cell_data,
    variable = "Main_cluster_name_re",
    sample_id_col = "Round4_Barcode",
    chromsizes = chromsizes,
    bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
    bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
    path_to_fragments = fragments_dict,
    n_cpu = 10,
    normalize_bigwig = True,
    temp_dir = "/tmp",
    split_pattern = "-"
)
```
Error output

```
UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(

thread '<unnamed>' panicked at src/aggregate_fragments.rs:75:51:
called `Result::unwrap()` on an `Err` value: ParseIntError { kind: InvalidDigit }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The above exception was the direct cause of the following exception:

PanicException                            Traceback (most recent call last)
Cell In[56], line 7
      3 os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"), exist_ok = True)
      4 os.makedirs(os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"), exist_ok = True)
----> 7 bw_paths, bed_paths = export_pseudobulk(
      8     input_data = cell_data,
      9     variable = "Main_cluster_name_re",
     10     sample_id_col = "Round4_Barcode",
     11     chromsizes = chromsizes,
     12     bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
     13     bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
     14     path_to_fragments = fragments_dict,
     15     n_cpu = 10,
     16     normalize_bigwig = True,
     17     temp_dir = "/tmp",
     18     split_pattern = "-"
     19 )

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:162, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
    159 # For each sample, get fragments for each cell type
    161 log.info("Splitting fragments by cell type.")
--> 162 split_fragment_files_by_cell_type(
    163     sample_to_fragment_file = path_to_fragments,
    164     path_to_temp_folder = temp_dir,
    165     path_to_output_folder = bed_path,
    166     sample_to_cell_type_to_cell_barcodes = sample_to_cell_type_to_barcodes,
    167     chromsizes = chromsizes_dict,
    168     n_cpu = n_cpu,
    169     verbose = False,
    170     clear_temp_folder = True
    171 )
    173 bed_paths = {}
    174 for cell_type in cell_data[variable].unique():

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scatac_fragment_tools/library/split/split_fragments_by_cell_type.py:100, in split_fragment_files_by_cell_type(sample_to_fragment_file, path_to_temp_folder, path_to_output_folder, sample_to_cell_type_to_cell_barcodes, chromsizes, n_cpu, verbose, clear_temp_folder)
     98 if verbose:
     99     print("Merging fragments ...")
--> 100 joblib.Parallel(n_jobs=n_cpu)(
    101     joblib.delayed(_rust_scatac_fragment_tools.merge_fragment_files)
    102     (
    103         path_to_fragment_files = cell_type_to_fragment_files[cell_type],
    104         path_to_output_file = os.path.join(path_to_output_folder, f"{cell_type}.fragments.tsv.gz"),
    105         number_of_threads = NUMBER_OF_WRITER_THREADS,
    106         verbose = verbose
    107     )
    108     for cell_type in cell_type_to_fragment_files
    109 )
    111 # Check wether all files were create successfully
    112 for cell_type in cell_type_to_fragment_files:

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1952, in Parallel.__call__(self, iterable)
   1946 # The first item from the output is blank, but it makes the interpreter
   1947 # progress until it enters the Try/Except block of the generator and
   1948 # reach the first `yield` statement. This starts the aynchronous
   1949 # dispatch of the tasks to the workers.
   1950 next(output)
-> 1952 return output if self.return_generator else list(output)

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1595, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1592     yield
   1594     with self._backend.retrieval_context():
-> 1595         yield from self._retrieve()
   1597 except GeneratorExit:
   1598     # The generator has been garbage collected before being fully
   1599     # consumed. This aborts the remaining tasks if possible and warn
   1600     # the user if necessary.
   1601     self._exception = True

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1699, in Parallel._retrieve(self)
   1692 while self._wait_retrieval():
   1693
   1694     # If the callback thread of a worker has signaled that its task
   1695     # triggered an exception, or if the retrieval loop has raised an
   1696     # exception (e.g. `GeneratorExit`), exit the loop and surface the
   1697     # worker traceback.
   1698     if self._aborting:
-> 1699         self._raise_error_fast()
   1700         break
   1702     # If the next job is not ready for retrieval yet, we just wait for
   1703     # async callbacks to progress.

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:1734, in Parallel._raise_error_fast(self)
   1730 # If this error job exists, immediatly raise the error by
   1731 # calling get_result. This job might not exists if abort has been
   1732 # called directly or if the generator is gc'ed.
   1733 if error_job is not None:
-> 1734     error_job.get_result(self.timeout)

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:736, in BatchCompletionCallBack.get_result(self, timeout)
    730 backend = self.parallel._backend
    732 if backend.supports_retrieve_callback:
    733     # We assume that the result has already been retrieved by the
    734     # callback thread, and is stored internally. It's just waiting to
    735     # be returned.
--> 736     return self._return_or_raise()
    738 # For other backends, the main thread needs to run the retrieval step.
    739 try:

File /DATA/User/zhenghuiwen/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py:754, in BatchCompletionCallBack._return_or_raise(self)
    752 try:
    753     if self.status == TASK_ERROR:
--> 754         raise self._result
    755     return self._result
    756 finally:

PanicException: called `Result::unwrap()` on an `Err` value: ParseIntError { kind: InvalidDigit }
```
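As a side note on the panic itself: ParseIntError { kind: InvalidDigit } suggests the Rust fragment code hit a field it expected to be an integer (the start, end, or duplicate-count column of a fragments line) but could not parse, which is consistent with a malformed or mismatched fragments file. A hedged sketch for locating such lines, assuming a standard 5-column layout and a hypothetical file path:

```python
import gzip

fragments_file = "sample_1.fragments.tsv.gz"  # hypothetical path

bad_lines = []
with gzip.open(fragments_file, "rt") as fh:
    for lineno, line in enumerate(fh, start=1):
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # start, end and (if present) the count column should all be integers
        if len(fields) < 3 or not all(f.isdigit() for f in fields[1:3] + fields[4:5]):
            bad_lines.append((lineno, line.rstrip("\n")))
        if len(bad_lines) >= 10:      # the first few offenders are enough
            break

for lineno, line in bad_lines:
    print(lineno, line)
```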