aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

pycisTopic: temporary fragment file cannot be found #458

Open: yrsong001 opened this issue 2 months ago

yrsong001 commented 2 months ago

Describe the bug

Hi! I am following the pycisTopic tutorial here: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html. It raises the error `ValueError: Fragment file ./temp/age_2y_s1/BCell1.fragments.tsv.gz does not exist.`, but I believe that file is generated during the run itself. Can you help with debugging this? Thank you!

To Reproduce

```python
fragments_dict = {'age_2y_s1': '/proj/liulab/users/yrsong/aging/Dataset_Creation/run_cellranger_atac/1-ATAC/outs/fragments.tsv.gz',
                 'age_2y_s2': './run_cellranger_atac/2-ATAC/outs/fragments.tsv.gz',
                 'age_1y_s1': './3-ATAC/outs/fragments.tsv.gz',
                 'age_1y_s2': './4-ATAC/outs/fragments.tsv.gz',
                 'age_3m_s1': './5-ATAC/outs/fragments.tsv.gz',
                 'age_3m_s2': './6-ATAC/outs/fragments.tsv.gz'}
```
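The dictionary mixes one absolute path with several relative ones, and the relative ones use two different prefixes (`./run_cellranger_atac/2-ATAC/...` versus `./3-ATAC/...`). Relative paths are resolved against the notebook's current working directory, so ruling out a bad input path is a cheap first check. A minimal sketch, assuming nothing beyond the standard library; `find_missing_fragment_files` is a hypothetical helper, not part of pycisTopic:

```python
import os
from typing import Dict, List, Tuple


def find_missing_fragment_files(fragments: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (sample_id, resolved_path) for every entry that is not an existing file.

    Relative paths are resolved against the current working directory,
    which is also how os.path.exists() treats a relative path.
    """
    missing = []
    for sample_id, path in fragments.items():
        resolved = os.path.abspath(path)
        if not os.path.isfile(resolved):
            missing.append((sample_id, resolved))
    return missing
```

Calling `find_missing_fragment_files(fragments_dict)` from the same notebook (and printing `os.getcwd()`) shows where each relative path actually points; an empty list means every input file was found.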

```python
import os

from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

# cell_data, chromsizes and out_dir are defined in earlier tutorial steps.
bw_paths, bed_paths = export_pseudobulk(
    input_data = cell_data,
    variable = "celltype",
    sample_id_col = "sample_id",
    chromsizes = chromsizes,
    bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
    bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
    path_to_fragments = fragments_dict,
    n_cpu = 10,
    normalize_bigwig = True,
    temp_dir = "./temp",
    split_pattern = None
)
```
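A second check, assuming (this thread does not confirm it) that a per-cell-type temp file is only written when some fragment's barcode matches that cell type's barcodes: compare the barcodes in `cell_data` with those in each fragment file. The helpers `read_fragment_barcodes` and `barcode_overlap` are hypothetical; they rely on bgzip output being gzip-compatible and read the whole file, so they can be slow on large samples.

```python
import gzip
from typing import Iterable, Set


def read_fragment_barcodes(path: str) -> Set[str]:
    """Collect the barcodes (4th column) of a fragments.tsv.gz file.

    Lines starting with '#' (header comments) are skipped.
    """
    barcodes = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 4:
                barcodes.add(fields[3])
    return barcodes


def barcode_overlap(cell_barcodes: Iterable[str], fragment_barcodes: Set[str]) -> float:
    """Fraction of the given cell barcodes that also occur in the fragments file."""
    cells = set(cell_barcodes)
    if not cells:
        return 0.0
    return len(cells & fragment_barcodes) / len(cells)
```

Pass the barcodes your `cell_data` uses for one sample (for example its index, or a barcode column if it has one) together with that sample's entry in `fragments_dict`. An overlap near 0 would point to a naming mismatch, such as a missing `-1` suffix or a sample prefix added to the barcodes.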

**Error output.**

```text
2024-09-04 00:04:14,953 cisTopic     INFO     Splitting fragments by cell type.

ValueError                                Traceback (most recent call last)
Cell In[12], line 8
      6 ray.shutdown()
      7 from pycisTopic.pseudobulk_peak_calling import *
----> 8 bw_paths, bed_paths = export_pseudobulk(
      9     input_data = cell_data,
     10     variable = "celltype",
     11     sample_id_col = "sample_id",
     12     chromsizes = chromsizes,
     13     bed_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bed_files"),
     14     bigwig_path = os.path.join(out_dir, "consensus_peak_calling/pseudobulk_bw_files"),
     15     path_to_fragments = fragments_dict,
     16     n_cpu = 10,
     17     normalize_bigwig = True,
     18     temp_dir = "./temp",
     19     split_pattern = None
     20 )

File /proj/liulab/users/yrsong/aging/Dataset_Creation/SCENIC_plus_Analysis/scplus_pipeline/Snakemake/config/pycisTopic/src/pycisTopic/pseudobulk_peak_calling.py:162, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
    159 # For each sample, get fragments for each cell type
    161 log.info("Splitting fragments by cell type.")
--> 162 split_fragment_files_by_cell_type(
    163     sample_to_fragment_file = path_to_fragments,
    164     path_to_temp_folder = temp_dir,
    165     path_to_output_folder = bed_path,
    166     sample_to_cell_type_to_cell_barcodes = sample_to_cell_type_to_barcodes,
    167     chromsizes = chromsizes_dict,
    168     n_cpu = n_cpu,
    169     verbose = False,
    170     clear_temp_folder = True
    171 )
    173 bed_paths = {}
    174 for cell_type in cell_data[variable].unique():

File ~/.conda/envs/scenicplus/lib/python3.11/site-packages/scatac_fragment_tools/library/split/split_fragments_by_cell_type.py:92, in split_fragment_files_by_cell_type(sample_to_fragment_file, path_to_temp_folder, path_to_output_folder, sample_to_cell_type_to_cell_barcodes, chromsizes, n_cpu, verbose, clear_temp_folder)
     90 path_to_fragment_file = os.path.join(path_to_temp_folder, sample, f"{cell_type_sanitized}.fragments.tsv.gz")
     91 if not os.path.exists(path_to_fragment_file):
---> 92     raise ValueError(f"Fragment file {path_to_fragment_file} does not exist.")
     93 if cell_type_sanitized not in cell_type_to_fragment_files:
     94     cell_type_to_fragment_files[cell_type_sanitized] = []
```

**ValueError: Fragment file ./temp/age_2y_s1/BCell1.fragments.tsv.gz does not exist.**
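The traceback shows the exact check that fails: `split_fragment_files_by_cell_type` builds `os.path.join(path_to_temp_folder, sample, f"{cell_type_sanitized}.fragments.tsv.gz")` and raises if that path does not exist. The sketch below mirrors that path construction to list every missing temp file, which can help tell a whole-sample problem (nothing at all under `./temp/age_2y_s1`) apart from a single sparse cell type. The helper names are hypothetical, and cell type names are used as given because the library's sanitization rule is not shown here.

```python
import os
from typing import Dict, Iterable, List


def expected_temp_fragment_files(
    temp_dir: str, sample_to_cell_types: Dict[str, Iterable[str]]
) -> List[str]:
    """Build the per-sample, per-cell-type paths that the failing check tests.

    Cell type names are NOT sanitized here; pass the already-sanitized names.
    """
    paths = []
    for sample, cell_types in sample_to_cell_types.items():
        for cell_type in cell_types:
            paths.append(os.path.join(temp_dir, sample, f"{cell_type}.fragments.tsv.gz"))
    return paths


def missing_temp_fragment_files(
    temp_dir: str, sample_to_cell_types: Dict[str, Iterable[str]]
) -> List[str]:
    """Return the expected temp paths that do not exist, in input order."""
    return [
        path
        for path in expected_temp_fragment_files(temp_dir, sample_to_cell_types)
        if not os.path.exists(path)
    ]
```

For example, `missing_temp_fragment_files("./temp", {"age_2y_s1": ["BCell1"]})` returns the single path from the error message as long as that file is still absent.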


SeppeDeWinter commented 2 months ago

Hi @yrsong001

This seems similar to these two issues: https://github.com/aertslab/scenicplus/issues/360, https://github.com/aertslab/scenicplus/issues/314.

Can you check whether the solutions proposed there work for you?

All the best,

Seppe