aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
163 stars 27 forks source link

RuntimeError from pbmc tutorial ( export_pseudobulk) #303

Closed Tu4n-ph4m closed 3 months ago

Tu4n-ph4m commented 4 months ago

Sorryq for the duplicate threads. I wasn't sure where I should post the issue so I just went back and forth between scenicplus and pycistopic but then I think sceniplus might be of help for more people so here it is.

Describe the bug Hi there, I'm trying to re-run the pbmc tutorial but it doesn't seem to work on my end.

The command I tried:


from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
variable = 'celltype', # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
sample_id_col = 'sample_id',
chromsizes = chromsizes,
bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'), # specify where pseudobulk_bed_files should be stored
bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
path_to_fragments = fragments_dict, # location of fragment fiels
n_cpu = 8, # specify the number of cores to use, we use ray for multi processing
normalize_bigwig = True,
temp_dir = os.path.join(tmp_dir, 'ray_spill'),
split_pattern = '-')

Here's the error I have:


RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

A list of each of chromosomes, start positions, end positions and values.
A list of each of start positions and values. Also, a chromosome and span must be specified.
A list values, in which case a single chromosome, start position, span and step must be specified.
I have tried getting the index file, commenting out remove duplicates but they don't seem to work

Full error message is as follows:


2024-02-20 10:15:09,203 cisTopic INFO Splitting fragments by cell type.
2024-02-20 10:15:42,588 cisTopic INFO generating bigwig files
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
r = call_item()
File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 275, in call
return self.fn(*self.args, **self.kwargs)
File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 620, in call
return self.func(*args, **kwargs)
File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/parallel.py", line 288, in call
return [func(*args, **kwargs)
File "/users/tpham43/.local/lib/python3.8/site-packages/joblib/parallel.py", line 288, in
return [func(*args, **kwargs)
File "/users/tpham43/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 33, in _generate_bigwig
fragments_to_bw(
File "/users/tpham43/.conda/envs/scenicplus/lib/python3.8/site-packages/scatac_fragment_tools/library/bigwig/fragments_to_bigwig.py", line 566, in fragments_to_bw
fragments_to_bw_with_pybigwig(
File "/users/tpham43/.conda/envs/scenicplus/lib/python3.8/site-packages/scatac_fragment_tools/library/bigwig/fragments_to_bigwig.py", line 464, in fragments_to_bw_with_pybigwig
bw.addEntries(chroms=chroms, starts=starts, ends=ends, values=values)
RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

A list of each of chromosomes, start positions, end positions and values.
A list of each of start positions and values. Also, a chromosome and span must be specified.
A list values, in which case a single chromosome, start position, span and step must be specified.
"""

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
Cell In[26], line 2
1 from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
----> 2 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
3 variable = 'celltype', # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
4 sample_id_col = 'sample_id',
5 chromsizes = chromsizes,
6 bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'), # specify where pseudobulk_bed_files should be stored
7 bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
8 path_to_fragments = fragments_dict, # location of fragment fiels
9 n_cpu = 8, # specify the number of cores to use, we use ray for multi processing
10 normalize_bigwig = True,
11 # remove_duplicates = True,
12 temp_dir = os.path.join(tmp_dir, 'ray_spill'),
13 split_pattern = '-')

File ~/.conda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:178, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
175 log.warning(f"Missing fragments for {cell_type}!")
177 log.info("generating bigwig files")
--> 178 joblib.Parallel(n_jobs=n_cpu)(
179 joblib.delayed(_generate_bigwig)
180 (
181 path_to_fragments = bed_paths[cell_type],
182 chromsizes = chromsizes_dict,
183 normalize_bigwig = normalize_bigwig,
184 bw_filename = os.path.join(bigwig_path, f"{_santize_string_for_filename(cell_type)}.bw"),
185 log = log
186 )
187 for cell_type in bed_paths.keys()
188 )
189 bw_paths = {}
190 for cell_type in cell_data[variable].unique():

File ~/.local/lib/python3.8/site-packages/joblib/parallel.py:1098, in Parallel.call(self, iterable)
1095 self._iterating = False
1097 with self._backend.retrieval_context():
-> 1098 self.retrieve()
1099 # Make sure that we get a last message telling us we are done
1100 elapsed_time = time.time() - self._start_time

File ~/.local/lib/python3.8/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
973 try:
974 if getattr(self._backend, 'supports_timeout', False):
--> 975 self._output.extend(job.get(timeout=self.timeout))
976 else:
977 self._output.extend(job.get())

File ~/.local/lib/python3.8/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
564 """Wrapper for Future.result to implement the same behaviour as
565 AsyncResults.get from multiprocessing."""
566 try:
--> 567 return future.result(timeout=timeout)
568 except CfTimeoutError as e:
569 raise TimeoutError from e

File ~/.conda/envs/scenicplus/lib/python3.8/concurrent/futures/_base.py:444, in Future.result(self, timeout)
442 raise CancelledError()
443 elif self._state == FINISHED:
--> 444 return self.__get_result()
445 else:
446 raise TimeoutError()

File ~/.conda/envs/scenicplus/lib/python3.8/concurrent/futures/_base.py:389, in Future.__get_result(self)
387 if self._exception:
388 try:
--> 389 raise self._exception
390 finally:
391 # Break a reference cycle with the exception in self._exception
392 self = None

RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

A list of each of chromosomes, start positions, end positions and values.
A list of each of start positions and values. Also, a chromosome and span must be specified.
A list values, in which case a single chromosome, start position, span and step must be specified.
SeppeDeWinter commented 4 months ago

Hi @Tu4n-ph4m

No worries.

Can you show the head of each file in scATAC/consensus_peak_calling/pseudobulk_bed_files/ using


zcat <FILENAME>.fragments.tsv.gz | head

All the best,

Seppe

Tu4n-ph4m commented 4 months ago

Here it is: Screenshot 2024-02-26 at 12 25 48 PM

ghuls commented 3 months ago

Could you try running the commandline version: scatac_fragment_tools bigwig on your fragment files? https://github.com/aertslab/scatac_fragment_tools

# Try the following:

# Write normalized bigwig with pybigwig
scatac_fragment_tools bigwig \
    -i <FRAGMENTS_FILENAME> \
    -c <CHROM_SIZES_FILENAME> \
    -o <BIGWIG_FILENAME> \
    -w pybigwig \
    -n \
    -v

# Write normalized bigwig with pybigtools
scatac_fragment_tools bigwig \
    -i <FRAGMENTS_FILENAME> \
    -c <CHROM_SIZES_FILENAME> \
    -o <BIGWIG_FILENAME> \
    -w pybigtools
    -n \
    -v 

Report the output it prints.

Tu4n-ph4m commented 3 months ago

I got this error while running the code. The fragment file is downloaded from the tutorial for pbmc and the chrom size file is downloaded from 'http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes'

Screenshot 2024-03-13 at 1 36 14 PM
SeppeDeWinter commented 3 months ago

Hi @Tu4n-ph4m

I just downloaded the fragments file again and ran this code and I was not able to reproduce your error.

Can you make sure your fragment file is still intact? Maybe try redownloading it.

All the best,

Seppe

Tu4n-ph4m commented 3 months ago

Hi Seppe, I just redownloaded it and got the same error. I looked into the file with zcat it seems about right to me. I also double checked that I installed the "scatac_fragment_tools" tool and it seems to do alright. So I'm a bit confused...

Screenshot 2024-03-14 at 12 44 39 PM Screenshot 2024-03-14 at 12 37 08 PM
SeppeDeWinter commented 3 months ago

Hi @Tu4n-ph4m

What version of pybigwig do you have?


import pybigwig
pybigwig.__version__

All the best,

Seppe

Tu4n-ph4m commented 3 months ago

I'm running version 0.3.22

Screenshot 2024-03-15 at 10 24 22 AM
SeppeDeWinter commented 3 months ago

Hi @Tu4n-ph4m

With version 0.3.22 it did work for me ... Really not sure what is going on here.

For my test I used python 3.11, can you try that? Could you also try in a new environment with only scatac_fragment_tools installed?

All the best,

Seppe

Tu4n-ph4m commented 3 months ago

Hi @SeppeDeWinter , sorry forgot to update. It was because I installed the tbi file in the wrong place so it wasn't able to run successfully. Thank you so much for your help!