aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
178 stars 28 forks source link

AssertionError: The dataframe does not have all the columns Chromosome, Start and End. #209

Open crossworlds423 opened 1 year ago

crossworlds423 commented 1 year ago

I am currently trying to replicate the steps presented in the Tutorial: 10x multiome pbmc

However, I ran into an AssertionError: The dataframe does not have all the columns Chromosome, Start and End

Here is the output shown in the terminal:

(scenicplus) userLab@instance-4:~$ python example.py
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'celltype' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
2023-08-21 07:37:53,258 cisTopic     INFO     Reading fragments from pbmc_tutorial/data/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz
2023-08-21 07:38:19,934 cisTopic     INFO     Creating pseudobulk for B_cells
2023-08-21 07:38:25,408 cisTopic     INFO     B_cells done!
2023-08-21 07:38:25,408 cisTopic     INFO     Creating pseudobulk for B_cells_1
2023-08-21 07:38:56,783 cisTopic     INFO     B_cells_1 done!
2023-08-21 07:38:56,784 cisTopic     INFO     Creating pseudobulk for B_cells_2
2023-08-21 07:39:07,328 cisTopic     INFO     B_cells_2 done!
2023-08-21 07:39:07,328 cisTopic     INFO     Creating pseudobulk for CD14_Monocytes
2023-08-21 07:41:20,535 cisTopic     INFO     CD14_Monocytes done!
2023-08-21 07:41:20,536 cisTopic     INFO     Creating pseudobulk for CD4_T_cells
2023-08-21 07:44:00,022 cisTopic     INFO     CD4_T_cells done!
2023-08-21 07:44:00,023 cisTopic     INFO     Creating pseudobulk for CD8_T_cells
2023-08-21 07:44:28,040 cisTopic     INFO     CD8_T_cells done!
2023-08-21 07:44:28,042 cisTopic     INFO     Creating pseudobulk for FCGR3A_Monocytes
2023-08-21 07:44:42,860 cisTopic     INFO     FCGR3A_Monocytes done!
2023-08-21 07:44:42,860 cisTopic     INFO     Creating pseudobulk for NK_cells
2023-08-21 07:44:52,445 cisTopic     INFO     NK_cells done!
2023-08-21 07:44:52,448 cisTopic     INFO     Calling peaks for B_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/B_cells.bed.gz --name B_cells  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:45:00,660 cisTopic     INFO     B_cells done!
2023-08-21 07:45:00,660 cisTopic     INFO     Calling peaks for B_cells_1 with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/B_cells_1.bed.gz --name B_cells_1  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:45:22,829 cisTopic     INFO     B_cells_1 done!
2023-08-21 07:45:22,829 cisTopic     INFO     Calling peaks for B_cells_2 with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/B_cells_2.bed.gz --name B_cells_2  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:45:38,648 cisTopic     INFO     B_cells_2 done!
2023-08-21 07:45:38,648 cisTopic     INFO     Calling peaks for CD14_Monocytes with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/CD14_Monocytes.bed.gz --name CD14_Monocytes  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:46:34,801 cisTopic     INFO     CD14_Monocytes done!
2023-08-21 07:46:34,801 cisTopic     INFO     Calling peaks for CD4_T_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/CD4_T_cells.bed.gz --name CD4_T_cells  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:47:32,765 cisTopic     INFO     CD4_T_cells done!
2023-08-21 07:47:32,765 cisTopic     INFO     Calling peaks for CD8_T_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/CD8_T_cells.bed.gz --name CD8_T_cells  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:47:51,510 cisTopic     INFO     CD8_T_cells done!
2023-08-21 07:47:51,510 cisTopic     INFO     Calling peaks for FCGR3A_Monocytes with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/FCGR3A_Monocytes.bed.gz --name FCGR3A_Monocytes  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:48:12,298 cisTopic     INFO     FCGR3A_Monocytes done!
2023-08-21 07:48:12,298 cisTopic     INFO     Calling peaks for NK_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/NK_cells.bed.gz --name NK_cells  --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:48:24,263 cisTopic     INFO     NK_cells done!
2023-08-21 07:48:24,475 cisTopic     INFO     Extending and merging peaks per class
Traceback (most recent call last):
  File "example.py", line 310, in <module>
    consensus_peaks=get_consensus_peaks(narrow_peaks_dict, peak_half_width, chromsizes=chromsizes, path_to_blacklist=path_to_blacklist)    
  File "/home/userLab/.local/lib/python3.8/site-packages/pycisTopic/iterative_peak_calling.py", line 65, in get_consensus_peaks
    center_extended_peaks = [
  File "/home/userLab/.local/lib/python3.8/site-packages/pycisTopic/iterative_peak_calling.py", line 67, in <listcomp>
    calculate_peaks_and_extend(
  File "/home/userLab/.local/lib/python3.8/site-packages/pycisTopic/iterative_peak_calling.py", line 142, in calculate_peaks_and_extend
    blacklist = pr.read_bed(path_to_blacklist)
  File "/home/userLab/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/readers.py", line 98, in read_bed
    return PyRanges(df)
  File "/home/userLab/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/pyranges_main.py", line 178, in __init__
    _init(self, df, chromosomes, starts, ends, strands, copy_df)
  File "/home/userLab/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/methods/init.py", line 120, in _init
    assert all(
AssertionError: The dataframe does not have all the columns Chromosome, Start and End.

I believe the error is associated with the type of ### hg38-blacklist.v2.bed i downloaded. I downloaded the file from here: https://github.com/igordot/reference-genomes/blob/master/hg38/blacklist.v2.bed

I don't know if downloaded the right version of the hg38-blacklist.v2.bed file because I couldn't find it on the scenic website. All the other files referenced in the tutorial were downloaded. I am using the data files as described in the tutorial, not my own data.

Here is the line of code for line 310 the error is mentioning:

# wget -O hg38-blacklist.v2.bed https://github.com/igordot/reference-genomes/blob/master/hg38/blacklist.v2.bed

path_to_blacklist= '/home/userLab/pycisTopic/blacklist/hg38-blacklist.v2.bed'
# Get consensus peaks
consensus_peaks=get_consensus_peaks(narrow_peaks_dict, peak_half_width, chromsizes=chromsizes, path_to_blacklist=path_to_blacklist)    

consensus_peaks.to_bed(
    path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/consensus_regions.bed'),
    keep=True,
    compression='infer',
    chain=False)

Does anyone know how to address this error? Where did people get the blacklist.v2.bed file?

SeppeDeWinter commented 1 year ago

Hi @crossworlds423

We host the blacklist regions that were used for the tutorial on the pycisTopic repo: https://github.com/aertslab/pycisTopic/blob/master/blacklist/hg38-blacklist.v2.bed

Let me know if this fixes your issue.

Best,

Seppe