aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
162 stars 27 forks source link

export_pseudobulk error: fragment file does not exist #360

Closed TemiLeke closed 2 months ago

TemiLeke commented 2 months ago

Describe the bug Hello team,

When I try to run the export_pseudobulk function, I get the following error ValueError: Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist." }

To Reproduce

bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
                 variable = 'celltype',                                                                     # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
                 sample_id_col = 'sample_id',
                 chromsizes = chromsizes,
                 bed_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
                 bigwig_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bw_files/'), # specify where pseudobulk_bw_files should be stored
                 path_to_fragments = fragments_dict,                                                        # location of fragment fiels
                 n_cpu = 4,                                                                                 # specify the number of cores to use, we use ray for multi processing
                 normalize_bigwig = True,
                 #remove_duplicates = True,
                 temp_dir = os.path.join(tmp_dir, 'ray_spill'),
                 split_pattern = '.')

Error output

{
    "name": "ValueError",
    "message": "Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist.",
    "stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
      2                  variable = 'celltype',                                                                     # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
      3                  sample_id_col = 'sample_id',
      4                  chromsizes = chromsizes,
      5                  bed_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
      6                  bigwig_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bw_files/'), # specify where pseudobulk_bw_files should be stored
      7                  path_to_fragments = fragments_dict,                                                        # location of fragment fiels
      8                  n_cpu = 4,                                                                                 # specify the number of cores to use, we use ray for multi processing
      9                  normalize_bigwig = True,
     10                  #remove_duplicates = True,
     11                  temp_dir = os.path.join(tmp_dir, 'ray_spill'),
     12                  split_pattern = '.')

File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:162, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
    159 # For each sample, get fragments for each cell type
    161 log.info(\"Splitting fragments by cell type.\")
--> 162 split_fragment_files_by_cell_type(
    163     sample_to_fragment_file = path_to_fragments,
    164     path_to_temp_folder = temp_dir,
    165     path_to_output_folder = bed_path,
    166     sample_to_cell_type_to_cell_barcodes = sample_to_cell_type_to_barcodes,
    167     chromsizes = chromsizes_dict,
    168     n_cpu = n_cpu,
    169     verbose = False,
    170     clear_temp_folder = True
    171 )
    173 bed_paths = {}
    174 for cell_type in cell_data[variable].unique():

File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scatac_fragment_tools/library/split/split_fragments_by_cell_type.py:92, in split_fragment_files_by_cell_type(sample_to_fragment_file, path_to_temp_folder, path_to_output_folder, sample_to_cell_type_to_cell_barcodes, chromsizes, n_cpu, verbose, clear_temp_folder)
     90 path_to_fragment_file = os.path.join(path_to_temp_folder, sample, f\"{cell_type_sanitized}.fragments.tsv.gz\")
     91 if not os.path.exists(path_to_fragment_file):
---> 92     raise ValueError(f\"Fragment file {path_to_fragment_file} does not exist.\")
     93 if cell_type_sanitized not in cell_type_to_fragment_files:
     94     cell_type_to_fragment_files[cell_type_sanitized] = []

ValueError: Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist."
}

Expected behavior

I did some poking around and it seems that the code before the split_fragment_by_cell_type which I believe is writen in rust (?) is supposed to create the file but doesn't.

        print("Splitting fragments ...")
    joblib.Parallel(n_jobs=n_cpu)(
        joblib.delayed(_rust_scatac_fragment_tools.split_fragments_by_cell_barcode)
            (
                path_to_fragments = sample_to_fragment_file[sample],
                path_to_output_folder = os.path.join(path_to_temp_folder, sample),
                cell_type_to_cell_barcodes = sample_to_cell_type_to_cell_barcodes[sample],
                chromsizes = chromsizes,
                verbose = verbose
            )
            for sample in sample_to_cell_type_to_cell_barcodes
    )

Version:

TemiLeke commented 2 months ago

I fixed this already. I realized this was because the barcode in cell_data did not match those in the fragment file. So the sample_to_cell_type_to_cell_barcodes did not have any entries as a result.