Closed anan81 closed 2 years ago
Hi @anan81 !
Can you provide the exact command you are running? What does cell_data look like (can you post the head)?
Cheers!
Carmen
Hi Carmen, this is the command I used:
import os
import sys

import ray
from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

ray.shutdown()
sys.stderr = open(os.devnull, "w")  # silence stderr
bw_paths, bed_paths = export_pseudobulk(
    input_data=cell_data_ATAC,
    variable='cluster',  # variable by which to generate pseudobulk profiles, in this case we want pseudobulks per celltype
    sample_id_col='sample_id',
    chromsizes=chromsizes,
    bed_path=os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'),  # specify where pseudobulk_bed_files should be stored
    bigwig_path=os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),  # specify where pseudobulk_bw_files should be stored
    path_to_fragments=fragments_dict,  # location of fragment files
    n_cpu=8,  # specify the number of cores to use, we use ray for multi processing
    normalize_bigwig=True,
    remove_duplicates=True,
    _temp_dir=os.path.join(tmp_dir, 'ray_spill'),
    split_pattern='-',
)
sys.stderr = sys.__stderr__  # unsilence stderr
And my cell data looks like this:
Hi @anan81 !
I think I see the problem. Can you rename the cell_id column to barcode? By default it will look for a column called barcode; if it is not present, it will take the index of the dataframe (which in your case is not set, and causes it to crash).
You can find further explanations on how it works here: https://pycistopic.readthedocs.io/en/latest/Single_sample_workflow-RTD.html
C
Thank you, Carmen. It worked after I renamed the cell_id column to barcode.
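For anyone landing here with the same problem, the rename discussed above can be sketched with pandas as follows. The barcodes, sample names, and cluster labels are made up for illustration, since the real table was not posted as text; only the column names cell_id, barcode, sample_id, and cluster come from this thread.

```python
import pandas as pd

# Hypothetical stand-in for the cell metadata in this thread; the values
# below are invented for illustration only.
cell_data_ATAC = pd.DataFrame(
    {
        "cell_id": ["AAACGAAAGTCGATAA-1", "AAACGAACATTGCGTA-1"],
        "sample_id": ["sample_1", "sample_1"],
        "cluster": ["B_cell", "T_cell"],
    }
)

# Rename the barcode column so it carries the name that is looked up by default.
cell_data_ATAC = cell_data_ATAC.rename(columns={"cell_id": "barcode"})

# Sanity checks worth running before calling export_pseudobulk:
# the column exists and every entry is a string.
assert "barcode" in cell_data_ATAC.columns
assert cell_data_ATAC["barcode"].map(lambda x: isinstance(x, str)).all()
```

The two checks at the end are optional, but they catch a missing column or non-string barcodes before the long-running export starts.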
Hello, I got the following error when running export_pseudobulk() to generate pseudobulk ATAC-seq profiles. My data is separate scRNA-seq and scATAC-seq from different cells but the same sample. All cell types have been annotated. The data types of the "sample_id" and "cluster" columns are string. Do you have any idea how to solve this issue? Many thanks in advance.
TypeError Traceback (most recent call last)
Input In [39], in <cell line: 5>()
3 ray.shutdown()
4 sys.stderr = open(os.devnull, "w") # silence stderr
----> 5 bw_paths, bed_paths = export_pseudobulk(input_data = cell_data_ATAC,
6 variable = 'cluster', # variable by which to generate pseubulk profiles, in this case we want pseudobulks per celltype
7 sample_id_col = 'sample_id',
8 chromsizes = chromsizes,
9 bed_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bed_files/'), # specify where pseudobulk_bed_files should be stored
10 bigwig_path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/pseudobulk_bw_files/'),# specify where pseudobulk_bw_files should be stored
11 path_to_fragments = fragments_dict, # location of fragment files
12 n_cpu = 8, # specify the number of cores to use, we use ray for multi processing
13 normalize_bigwig = True,
14 remove_duplicates = True,
15 _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
16 split_pattern = '-')
17 sys.stderr = sys.__stderr__
File /vsc-hard-mounts/leuven-data/328/vsc32848/miniconda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/pseudobulk_peak_calling.py:128, in export_pseudobulk(input_data, variable, chromsizes, bed_path, bigwig_path, path_to_fragments, sample_id_col, n_cpu, normalize_bigwig, remove_duplicates, split_pattern, use_polars, **kwargs)
122 fragments_df = fragments_df.loc[
123 fragments_df["Name"].isin(cell_data["barcode"].tolist())
124 ]
125 else:
126 fragments_df = fragments_df.loc[
127 fragments_df["Name"].isin(
--> 128 prepare_tag_cells(cell_data.index.tolist(), split_pattern)
129 )
130 ]
131 fragments_df_dict[sample_id] = fragments_df
133 # Set groups
File /vsc-hard-mounts/leuven-data/328/vsc32848/miniconda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/utils.py:183, in prepare_tag_cells(cell_names, split_pattern)
181 def prepare_tag_cells(cell_names, split_pattern="___"):
182 if split_pattern == "-":
--> 183 new_cell_names = [
184 re.findall(r"^[ACGT]-[0-9]+-", x)[0].rstrip("-")
185 if len(re.findall(r"^[ACGT]-[0-9]+-", x)) != 0
186 else x
187 for x in cell_names
188 ]
189 new_cell_names = [
190 re.findall(r"^\w-[0-9]", new_cell_names[i])[0].rstrip("-")
191 if (len(re.findall(r"^\w-[0-9]", new_cell_names[i])) != 0)
(...)
194 for i in range(len(new_cell_names))
195 ]
196 else:
File /vsc-hard-mounts/leuven-data/328/vsc32848/miniconda/envs/scenicplus/lib/python3.8/site-packages/pycisTopic/utils.py:185, in <listcomp>(.0)
181 def prepare_tag_cells(cell_names, split_pattern="___"):
182 if split_pattern == "-":
183 new_cell_names = [
184 re.findall(r"^[ACGT]-[0-9]+-", x)[0].rstrip("-")
--> 185 if len(re.findall(r"^[ACGT]-[0-9]+-", x)) != 0
186 else x
187 for x in cell_names
188 ]
189 new_cell_names = [
190 re.findall(r"^\w-[0-9]", new_cell_names[i])[0].rstrip("-")
191 if (len(re.findall(r"^\w-[0-9]", new_cell_names[i])) != 0)
(...)
194 for i in range(len(new_cell_names))
195 ]
196 else:
File /vsc-hard-mounts/leuven-data/328/vsc32848/miniconda/envs/scenicplus/lib/python3.8/re.py:241, in findall(pattern, string, flags)
233 def findall(pattern, string, flags=0):
234 """Return a list of all non-overlapping matches in the string.
235
236 If one or more capturing groups are present in the pattern, return
(...)
239
240 Empty matches are included in the result."""
--> 241 return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
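The last frame of the traceback can be reproduced in isolation. The sketch below is not pycisTopic code. It reuses only the regular expression shown in the traceback and assumes that, with no index set on the dataframe, cell_data.index.tolist() returns integers rather than barcode strings. The helper name first_pattern_hits is hypothetical.

```python
import re

# Assumption: with no index set, cell_data.index.tolist() yields integers
# like these (a default integer index), not barcode strings.
cell_names = [0, 1, 2]


def first_pattern_hits(names):
    """Apply the regex from the first list comprehension in the traceback."""
    return [re.findall(r"^[ACGT]-[0-9]+-", x) for x in names]


try:
    first_pattern_hits(cell_names)
    error_message = None
except TypeError as exc:
    # re.findall only accepts str or bytes-like input, so an int raises here.
    error_message = str(exc)

print(error_message)

# String barcodes pass through the same call without raising.
string_hits = first_pattern_hits(["AAACGAAAGTCGATAA-1"])
```

Under that assumption, the error comes from the type of the cell names being passed, not from the "sample_id" or "cluster" columns.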