aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

RunTimeError with export_pseudobulk #284

Closed: mairamirza closed this issue 8 months ago

mairamirza commented 8 months ago

Dear SCENIC+ team, I encountered an error while running the export_pseudobulk step: 'DataFrame' object has no attribute 'chromosome'. Based on #277, I fixed this error with the following change:

In the _to_bigwig() function:

    if not divide:
        gr = self

But now, when I run export_pseudobulk, I get the following RuntimeError:

RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following:

  1. A list of each of chromosomes, start positions, end positions and values.
  2. A list of each of start positions and values. Also, a chromosome and span must be specified.
  3. A list values, in which case a single chromosome, start position, span and step must be specified.

I am getting this error despite using the exact dataset and following the tutorial here: https://scenicplus.readthedocs.io/en/latest/pbmc_multiome_tutorial.html

I also tried following the pycisTopic tutorial for the mouse cortex dataset (https://pycistopic.readthedocs.io/en/latest/Cortex_pycisTopic.html) but got the same error.

Session_info:

pycisTopic 1.0.3.dev21+ge9b0e1a
pyranges 0.0.127
pyscenic 0.12.1+6.g31d51a1
ray 2.9.0
scanpy 1.9.5
scenicplus 1.0.1.dev6+ge5ba6fc
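A version list like the one above can be collected programmatically when reporting an issue. The following is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8). The helper name collect_versions is hypothetical, and the distribution names passed to it are assumed to match the names shown in the session info.

```python
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version string.

    Packages that are not installed are reported as "not installed"
    instead of raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions taken from the session info above.
    names = ["pycisTopic", "pyranges", "pyscenic", "ray", "scanpy", "scenicplus"]
    for name, version in collect_versions(names).items():
        print(name, version)
```

Reporting missing packages as "not installed" keeps the output complete even in a partially set-up environment.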

Error attached:

import os
from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

bw_paths, bed_paths = export_pseudobulk(
    input_data=cell_data,
    variable='celltype',  # variable by which to generate pseudobulk profiles; here, one pseudobulk per cell type
    sample_id_col='sample_id',
    chromsizes=chromsizes,
    bed_path=os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),    # where the pseudobulk BED files are stored
    bigwig_path=os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),  # where the pseudobulk bigWig files are stored
    path_to_fragments=fragments_dict,  # location of the fragment files
    n_cpu=1,  # number of cores to use; ray is used for multiprocessing
    normalize_bigwig=True,
    remove_duplicates=True,
    _temp_dir=os.path.join(tmp_dir, 'ray_spill'),
    split_pattern='-',
)

File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:186, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, remove_duplicates, split_pattern, use_polars, **kwargs)
    184     ray.shutdown()
    185 else:
--> 186     [
    187         export_pseudobulk_one_sample(
    188             cell_data,
    189             group,
    190             fragments_df_dict,
    191             chromsizes,
    192             bigwig_path,
    193             bed_path,
    194             sample_id_col,
    195             normalize_bigwig,
    196             remove_duplicates,
    197             split_pattern,
    198         )
    199         for group in groups
    200     ]
    202 return bw_paths, bed_paths

File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:187, in <listcomp>(.0)
    184     ray.shutdown()
    185 else:
    186     [
--> 187         export_pseudobulk_one_sample(
    188             cell_data,
    189             group,
    190             fragments_df_dict,
    191             chromsizes,
    192             bigwig_path,
    193             bed_path,
    194             sample_id_col,
    195             normalize_bigwig,
    196             remove_duplicates,
    197             split_pattern,
    198         )
    199         for group in groups
    200     ]
    202 return bw_paths, bed_paths

File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:285, in export_pseudobulk_one_sample(cell_data, group, fragments_df_dict, chromsizes, bigwig_path, bed_path, sample_id_col, normalize_bigwig, remove_duplicates, split_pattern)
    283 bigwig_path_group = os.path.join(bigwig_path, str(group) + ".bw")
    284 if remove_duplicates:
--> 285     group_pr.to_bigwig(
    286         path=bigwig_path_group,
    287         chromosome_sizes=chromsizes,
    288         rpm=normalize_bigwig,
    289     )
    290 else:
    291     group_pr.to_bigwig(
    292         path=bigwig_path_group,
    293         chromosome_sizes=chromsizes,
    294         rpm=normalize_bigwig,
    295         value_col="Score",
    296     )

File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/pyranges/pyranges_main.py:5506, in PyRanges.to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun, chain)
   5503 if chromosome_sizes is None:
   5504     chromosome_sizes = pr.data.chromsizes()
-> 5506 result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
   5508 if dryrun:
   5509     return result

File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/pyranges/out.py:238, in _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
    235 ends = df.End.tolist()
    236 values = df.Score.tolist()
--> 238 bw.addEntries(chromosomes, starts, ends=ends, values=values)

RuntimeError: You must provide a valid set of entries. These can be comprised of any of the following: 
1. A list of each of chromosomes, start positions, end positions and values.
2. A list of each of start positions and values. Also, a chromosome and span must be specified.
3. A list values, in which case a single chromosome, start position, span and step must be specified.
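The message lists the argument shapes that addEntries accepts. For the first form, which is the one used in the traceback above, one thing worth checking is the element types of the four lists. The sketch below is a hypothetical pre-flight helper: find_entry_problems is not part of pyBigWig or pyranges. It assumes that chromosomes must be str, starts and ends must be int, and values must be float. The float requirement in particular is an assumption about pyBigWig's type checking and is not confirmed in this thread.

```python
def find_entry_problems(chromosomes, starts, ends, values):
    """Return a list of problems found in the four entry lists.

    An empty list means the inputs look consistent with the
    "chromosomes, starts, ends, values" form of the error message.
    The type rules checked here are assumptions, not pyBigWig's own code.
    """
    lengths = {len(chromosomes), len(starts), len(ends), len(values)}
    if len(lengths) != 1:
        return ["lists differ in length"]
    if len(chromosomes) == 0:
        return ["lists are empty"]

    problems = []
    if not all(isinstance(c, str) for c in chromosomes):
        problems.append("chromosomes must be str")
    if not all(isinstance(p, int) for p in list(starts) + list(ends)):
        problems.append("starts and ends must be int")
    if not all(isinstance(v, float) for v in values):
        problems.append("values must be float")
    return problems


# Integer scores (for example raw counts) are flagged under the assumption above.
print(find_entry_problems(["chr1", "chr1"], [0, 100], [50, 150], [3, 7]))
# prints ['values must be float']
```

Running such a check on the lists built just before the addEntries call could help narrow down which argument is being rejected.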

Can you help me solve this issue? Many Thanks!

acihanckr commented 8 months ago

I ran into the same errors, followed the same steps, and got this RuntimeError as well. Replacing the code in pycisTopic.pseudobulk_peak_calling.py with the version from the branch suggested in the comment linked below solved the issue.

https://github.com/aertslab/scenicplus/issues/277#issuecomment-1880305230

I hope it helps!

mairamirza commented 8 months ago

Thanks @acihanckr for the suggestion, I will check this out!

SeppeDeWinter commented 8 months ago

Hi @mairamirza and @acihanckr

I just pushed some changes to the main branch that should solve this issue completely, see https://github.com/aertslab/pycisTopic/commit/1afbd1d71dd9caf2f8f53d4c752240089b182bc9.

Can you test whether this indeed solves your issue?

All the best,

Seppe