aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Unable to run pbmc tutorial (export_pseudobulk) #302

Closed Tu4n-ph4m closed 7 months ago

Tu4n-ph4m commented 7 months ago

Describe the bug

Hi there, I'm trying to re-run the pbmc tutorial, but it doesn't work on my end.

The command I tried:

    from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

    bw_paths, bed_paths = export_pseudobulk(
        input_data = cell_data,
        variable = 'celltype',  # variable by which to generate pseudobulk profiles; here we want pseudobulks per cell type
        sample_id_col = 'sample_id',
        chromsizes = chromsizes,
        bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  # where pseudobulk_bed_files should be stored
        bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),  # where pseudobulk_bw_files should be stored
        path_to_fragments = fragments_dict,  # location of fragment files
        n_cpu = 8,  # number of cores to use; we use ray for multiprocessing
        normalize_bigwig = True,
        temp_dir = os.path.join(tmp_dir, 'ray_spill'),
        split_pattern = '-')

Here's the error I have:

RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

  1. A list of each of chromosomes, start positions, end positions and values.
  2. A list of each of start positions and values. Also, a chromosome and span must be specified.
  3. A list values, in which case a single chromosome, start position, span and step must be specified.

I have tried getting the index file and commenting out `remove_duplicates`, but neither seems to help.

Full error message is as follows:

    2024-02-20 10:15:09,203 cisTopic INFO Splitting fragments by cell type.
    2024-02-20 10:15:42,588 cisTopic INFO generating bigwig files

    _RemoteTraceback                          Traceback (most recent call last)
    _RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
        r = call_item()
      File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
        return self.fn(*self.args, **self.kwargs)
      File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 620, in __call__
        return self.func(*args, **kwargs)
      File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/parallel.py", line 288, in __call__
        return [func(*args, **kwargs)
      File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/parallel.py", line 288, in <listcomp>
        return [func(*args, **kwargs)
      File "/users/tpham43/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 33, in _generate_bigwig
        fragments_to_bw(
      File "/users/tpham43/.conda/envs/scenicplus/lib/python3.8/site-packages/scatac_fragment_tools/library/bigwig/fragments_to_bigwig.py", line 566, in fragments_to_bw
        fragments_to_bw_with_pybigwig(
      File "/users/tpham43/.conda/envs/scenicplus/lib/python3.8/site-packages/scatac_fragment_tools/library/bigwig/fragments_to_bigwig.py", line 464, in fragments_to_bw_with_pybigwig
        bw.addEntries(chroms=chroms, starts=starts, ends=ends, values=values)
    RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

  1. A list of each of chromosomes, start positions, end positions and values.
  2. A list of each of start positions and values. Also, a chromosome and span must be specified.
  3. A list values, in which case a single chromosome, start position, span and step must be specified.

"""

The above exception was the direct cause of the following exception:

    RuntimeError                              Traceback (most recent call last)
    Cell In[26], line 2
          1 from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
    ----> 2 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
          3     variable = 'celltype', # variable by which to generate pseudobulk profiles, in this case we want pseudobulks per celltype
          4     sample_id_col = 'sample_id',
          5     chromsizes = chromsizes,
          6     bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'), # specify where pseudobulk_bed_files should be stored
          7     bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
          8     path_to_fragments = fragments_dict, # location of fragment files
          9     n_cpu = 8, # specify the number of cores to use, we use ray for multi processing
         10     normalize_bigwig = True,
         11     # remove_duplicates = True,
         12     temp_dir = os.path.join(tmp_dir, 'ray_spill'),
         13     split_pattern = '-')

    File ~/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:178, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
        175         log.warning(f"Missing fragments for {cell_type}!")
        177 log.info("generating bigwig files")
    --> 178 joblib.Parallel(n_jobs=n_cpu)(
        179     joblib.delayed(_generate_bigwig)
        180     (
        181         path_to_fragments = bed_paths[cell_type],
        182         chromsizes = chromsizes_dict,
        183         normalize_bigwig = normalize_bigwig,
        184         bw_filename = os.path.join(bigwig_path, f"{_santize_string_for_filename(cell_type)}.bw"),
        185         log = log
        186     )
        187     for cell_type in bed_paths.keys()
        188 )
        189 bw_paths = {}
        190 for cell_type in cell_data[variable].unique():

    File ~/.local/lib/python3.8/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
       1095     self._iterating = False
       1097 with self._backend.retrieval_context():
    -> 1098     self.retrieve()
       1099 # Make sure that we get a last message telling us we are done
       1100 elapsed_time = time.time() - self._start_time

    File ~/.local/lib/python3.8/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
        973 try:
        974     if getattr(self._backend, 'supports_timeout', False):
    --> 975         self._output.extend(job.get(timeout=self.timeout))
        976     else:
        977         self._output.extend(job.get())

    File ~/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
        564 """Wrapper for Future.result to implement the same behaviour as
        565 AsyncResults.get from multiprocessing."""
        566 try:
    --> 567     return future.result(timeout=timeout)
        568 except CfTimeoutError as e:
        569     raise TimeoutError from e

    File ~/.conda/envs/scenicplus/lib/python3.8/concurrent/futures/_base.py:444, in Future.result(self, timeout)
        442     raise CancelledError()
        443 elif self._state == FINISHED:
    --> 444     return self.__get_result()
        445 else:
        446     raise TimeoutError()

    File ~/.conda/envs/scenicplus/lib/python3.8/concurrent/futures/_base.py:389, in Future.__get_result(self)
        387 if self._exception:
        388     try:
    --> 389         raise self._exception
        390     finally:
        391         # Break a reference cycle with the exception in self._exception
        392         self = None

RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

  1. A list of each of chromosomes, start positions, end positions and values.
  2. A list of each of start positions and values. Also, a chromosome and span must be specified.
  3. A list values, in which case a single chromosome, start position, span and step must be specified.
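
One thing that may be worth checking, offered as an assumption and not a diagnosis confirmed in this thread: the traceback ends in `bw.addEntries(chroms=..., starts=..., ends=..., values=...)`, and pyBigWig accepts numpy arrays for those arguments only when it was built with numpy support. Whether the arguments here are numpy arrays has not been verified. pyBigWig reports its build status through the module attribute `pyBigWig.numpy`; the helper name `pybigwig_numpy_support` below is hypothetical and only wraps that lookup.

```python
import importlib


def pybigwig_numpy_support():
    """Report whether the installed pyBigWig was built with numpy support.

    Returns True or False, or None when pyBigWig cannot be imported.
    (Hypothetical helper; not part of pycisTopic or scenicplus.)
    """
    try:
        pbw = importlib.import_module("pyBigWig")
    except ImportError:
        return None
    # pyBigWig sets the module attribute `numpy` to 1 when compiled with numpy support.
    return bool(getattr(pbw, "numpy", 0))


print("pyBigWig numpy support:", pybigwig_numpy_support())
```

If this prints `False`, reinstalling pyBigWig in an environment where numpy is already present is one thing to try; whether that resolves this particular error is not established here.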
Tu4n-ph4m commented 7 months ago

I moved this issue to https://github.com/aertslab/scenicplus/issues/303 as I accidentally closed it