aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
167 stars 27 forks source link

Cannot export pseudobulk files #61

Closed Abigail575 closed 1 year ago

Abigail575 commented 1 year ago

Hello, I don't receive an error when executing the following, but when comparing the output to that on the Scenic+ tutorial I can see the pseudobulk files haven't been created. Do you know why this is? Thanks!

from pycisTopic.pseudobulk_peak_calling import export_pseudobulk
bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
             variable = 'celltype',                                                                     
             sample_id_col = 'sample_id',
             chromsizes = chromsizes,
             bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  
             bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),
             path_to_fragments = fragments_dict,                                                        
             n_cpu = 8,                                                                                
             normalize_bigwig = True,
             remove_duplicates = True,
             _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
             split_pattern = '-')

2022-11-03 16:07:53,397 cisTopic INFO Reading fragments from pbmc_tutorial/data/atac_fragments.tsv.gz 2022-11-03 16:09:26,040 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265/ (export_pseudobulk_ray pid=703) 2022-11-03 16:09:35,151 cisTopic INFO Creating pseudobulk for endocrine (export_pseudobulk_ray pid=702) 2022-11-03 16:09:35,151 cisTopic INFO Creating pseudobulk for ductal

Specifically, I cannot see "export_pseudobulk_ray...cisTopic INFO ductal done! etc

Abigail575 commented 1 year ago

I have since tried running this starting with my H5 file and not an anndata file. Here, there is a specific error that I think relates to a non-standard chromosome format in my fragments file from the mouse reference genome used to generate the dataset (mm10 I believe). Any idea how to fix this? Thanks!

Error message:

2022-11-03 16:31:47,862 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265/ 2022-11-03 16:32:22,125 ERROR worker.py:399 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::export_pseudobulk_ray() (pid=829, ip=127.0.0.1) File "/Users/Abigail/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 345, in export_pseudobulk_ray export_pseudobulk_one_sample( File "/Users/Abigail/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py", line 284, in export_pseudobulk_one_sample group_pr.to_bigwig( File "/Users/Abigail/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/pyranges.py", line 5339, in to_bigwig result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun) File "/Users/benjaminisaacson/Desktop/Abigail/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/out.py", line 192, in _to_bigwig header = [(c, int(chromosome_sizes[c])) for c in unique_chromosomes] File "/Users/Abigail/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/out.py", line 192, in header = [(c, int(chromosome_sizes[c])) for c in unique_chromosomes] KeyError: 'GL456210.1'

SeppeDeWinter commented 1 year ago

Hi @Abigail575

Sorry for the late reply.

You're getting this error because there is no entry for GL456210.1 in your chromsizes file (probably also not for the other non-standard chromosomes).

You can solve this by either using a chromsizes file containing these chromosomes or by removing fragments coming from these chromosomes from your fragments file. The former is probably easier.

Hope this helps!

Best,

Seppe

Abigail575 commented 1 year ago

Thanks Seppe! It worked when I completely removed the GL (and also JH) locations from my TSV file. I will try your method so I can include these data in my analysis. Thanks for all your help! Abigail