SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
162
stars
27
forks
source link
export_pseudobulk error: fragment file does not exist #360
When I try to run the export_pseudobulk function, I get the following error ValueError: Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist." }
To Reproduce
bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
variable = 'celltype', # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
sample_id_col = 'sample_id',
chromsizes = chromsizes,
bed_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bed_files/'), # specify where pseudobulk_bed_files should be stored
bigwig_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bw_files/'), # specify where pseudobulk_bw_files should be stored
path_to_fragments = fragments_dict, # location of fragment fiels
n_cpu = 4, # specify the number of cores to use, we use ray for multi processing
normalize_bigwig = True,
#remove_duplicates = True,
temp_dir = os.path.join(tmp_dir, 'ray_spill'),
split_pattern = '.')
Error output
{
"name": "ValueError",
"message": "Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist.",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 1
----> 1 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data,
2 variable = 'celltype', # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
3 sample_id_col = 'sample_id',
4 chromsizes = chromsizes,
5 bed_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bed_files/'), # specify where pseudobulk_bed_files should be stored
6 bigwig_path = os.path.join(save_dir, 'consensus_peak_calling/pseudobulk_bw_files/'), # specify where pseudobulk_bw_files should be stored
7 path_to_fragments = fragments_dict, # location of fragment fiels
8 n_cpu = 4, # specify the number of cores to use, we use ray for multi processing
9 normalize_bigwig = True,
10 #remove_duplicates = True,
11 temp_dir = os.path.join(tmp_dir, 'ray_spill'),
12 split_pattern = '.')
File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/pseudobulk_peak_calling.py:162, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, split_pattern, temp_dir)
159 # For each sample, get fragments for each cell type
161 log.info(\"Splitting fragments by cell type.\")
--> 162 split_fragment_files_by_cell_type(
163 sample_to_fragment_file = path_to_fragments,
164 path_to_temp_folder = temp_dir,
165 path_to_output_folder = bed_path,
166 sample_to_cell_type_to_cell_barcodes = sample_to_cell_type_to_barcodes,
167 chromsizes = chromsizes_dict,
168 n_cpu = n_cpu,
169 verbose = False,
170 clear_temp_folder = True
171 )
173 bed_paths = {}
174 for cell_type in cell_data[variable].unique():
File ~/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scatac_fragment_tools/library/split/split_fragments_by_cell_type.py:92, in split_fragment_files_by_cell_type(sample_to_fragment_file, path_to_temp_folder, path_to_output_folder, sample_to_cell_type_to_cell_barcodes, chromsizes, n_cpu, verbose, clear_temp_folder)
90 path_to_fragment_file = os.path.join(path_to_temp_folder, sample, f\"{cell_type_sanitized}.fragments.tsv.gz\")
91 if not os.path.exists(path_to_fragment_file):
---> 92 raise ValueError(f\"Fragment file {path_to_fragment_file} does not exist.\")
93 if cell_type_sanitized not in cell_type_to_fragment_files:
94 cell_type_to_fragment_files[cell_type_sanitized] = []
ValueError: Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist."
}
Expected behavior
I did some poking around and it seems that the code before the split_fragment_by_cell_type which I believe is writen in rust (?) is supposed to create the file but doesn't.
I fixed this already. I realized this was because the barcode in cell_data did not match those in the fragment file. So the sample_to_cell_type_to_cell_barcodes did not have any entries as a result.
Describe the bug Hello team,
When I try to run the
export_pseudobulk
function, I get the following errorValueError: Fragment file /Volumes/Seagate/SEA-AD/MTG/ATACseq/temp_files/ray_spill/1142534944/Astrocyte.fragments.tsv.gz does not exist." }
To Reproduce
Error output
Expected behavior
I did some poking around and it seems that the code before the
split_fragment_by_cell_type
which I believe is writen in rust (?) is supposed to create the file but doesn't.Version: