aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
165 stars 27 forks source link

Error whilst Returning normalized TSS coverage matrix per barcode #264

Closed MozzyCow closed 6 months ago

MozzyCow commented 7 months ago

Hi,

Thanks for creating such an indepth analysis suite for multiome. I'm running into an issue with running compute_qc_stats some mouse multiome.

The pbmc tutorial datasets run correctly and pseduobulk, peak calling and get_consensus_peaks appear to run correctly. I had to modify the chromsizes with the addition of a .1 to the end of non-standard chromosomes to match the cell-arc format from the goldenpath chrom.sizes mm10 files to get export_pseudobulk to work.

I'm at a loss on how to solve it.

To Reproduce


if not os.path.exists(os.path.join(work_dir, 'scATAC/quality_control')):
    os.makedirs(os.path.join(work_dir, 'scATAC/quality_control'))

metadata_bc, profile_data_dict = compute_qc_stats(
                fragments_dict = fragments_dict,
                tss_annotation = annot,
                stats=['barcode_rank_plot', 'duplicate_rate', 'insert_size_distribution', 'profile_tss', 'frip'],
                label_list = None,
                path_to_regions = path_to_regions,
                n_cpu = 1,
                valid_bc = None,
                n_frag = 100,
                n_bc = None,
                tss_flank_window = 1000,
                tss_window = 50,
                tss_minimum_signal_window = 100,
                tss_rolling_window = 10,
                remove_duplicates = True,
                _temp_dir = os.path.join(tmp_dir + 'ray_spill'))

**Error output**
INFO:cisTopic:Reading DACE437PT
INFO:cisTopic:Computing barcode rank plot for DACE437PT
INFO:cisTopic:Counting fragments
INFO:cisTopic:Marking barcodes with more than 100
INFO:cisTopic:Returning plot data
INFO:cisTopic:Returning valid barcodes
INFO:cisTopic:Computing duplicate rate plot for DACE437PT
INFO:cisTopic:Return plot data
INFO:cisTopic:Computing insert size distribution for DACE437PT
INFO:cisTopic:Counting fragments
INFO:cisTopic:Returning plot data
INFO:cisTopic:Computing TSS profile for DACE437PT
INFO:cisTopic:Formatting annnotation
INFO:cisTopic:Creating coverage matrix
INFO:cisTopic:Coverage matrix done
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[51], line 4
      1 if not os.path.exists(os.path.join(work_dir, 'scATAC/quality_control')):
      2     os.makedirs(os.path.join(work_dir, 'scATAC/quality_control'))
----> 4 metadata_bc, profile_data_dict = compute_qc_stats(
      5                 fragments_dict = fragments_dict,
      6                 tss_annotation = annot,
      7                 stats=['barcode_rank_plot', 'duplicate_rate', 'insert_size_distribution', 'profile_tss', 'frip'],
      8                 label_list = None,
      9                 path_to_regions = path_to_regions,
     10                 n_cpu = 1,
     11                 valid_bc = None,
     12                 n_frag = 100,
     13                 n_bc = None,
     14                 tss_flank_window = 1000,
     15                 tss_window = 50,
     16                 tss_minimum_signal_window = 100,
     17                 tss_rolling_window = 10,
     18                 remove_duplicates = True,
     19                 _temp_dir = os.path.join(tmp_dir + 'ray_spill'))

File /data/lowe/dmh/miniconda3/envs/single_cell_perturb_v1/lib/python3.9/site-packages/pycisTopic/qc.py:1033, in compute_qc_stats(fragments_dict, tss_annotation, stats, label_list, path_to_regions, n_cpu, partition, valid_bc, n_frag, n_bc, tss_flank_window, tss_window, tss_minimum_signal_window, tss_rolling_window, min_norm, check_for_duplicates, remove_duplicates, use_polars, **kwargs)
   1031     ray.shutdown()
   1032 else:
-> 1033     qc_stats = [
   1034             compute_qc_stats_single(
   1035                 fragments_list[i],
   1036                 tss_annotation=tss_annotation,
   1037                 stats=stats,
   1038                 label=label_list[i],
   1039                 path_to_regions=path_to_regions[i],
   1040                 valid_bc=valid_bc,
   1041                 n_frag=n_frag,
   1042                 n_bc=n_bc,
   1043                 tss_flank_window=tss_flank_window,
   1044                 tss_window=tss_window,
   1045                 tss_minimum_signal_window=tss_minimum_signal_window,
   1046                 tss_rolling_window=tss_rolling_window,
   1047                 min_norm=min_norm,
   1048                 partition=partition,
   1049                 check_for_duplicates=check_for_duplicates,
   1050                 remove_duplicates=remove_duplicates,
   1051                 use_polars=use_polars,
   1052             )
   1053             for i in range(len(fragments_list))
   1054         ]
   1055 metadata_dict = {
   1056     key: x[key] for x in list(list(zip(*qc_stats))[0]) for key in x.keys()
   1057 }
   1058 profile_data_dict = {
   1059     key: x[key] for x in list(list(zip(*qc_stats))[1]) for key in x.keys()
   1060 }

File /data/lowe/dmh/miniconda3/envs/single_cell_perturb_v1/lib/python3.9/site-packages/pycisTopic/qc.py:1034, in <listcomp>(.0)
   1031     ray.shutdown()
   1032 else:
   1033     qc_stats = [
-> 1034             compute_qc_stats_single(
   1035                 fragments_list[i],
   1036                 tss_annotation=tss_annotation,
   1037                 stats=stats,
   1038                 label=label_list[i],
   1039                 path_to_regions=path_to_regions[i],
   1040                 valid_bc=valid_bc,
   1041                 n_frag=n_frag,
   1042                 n_bc=n_bc,
   1043                 tss_flank_window=tss_flank_window,
   1044                 tss_window=tss_window,
   1045                 tss_minimum_signal_window=tss_minimum_signal_window,
   1046                 tss_rolling_window=tss_rolling_window,
   1047                 min_norm=min_norm,
   1048                 partition=partition,
   1049                 check_for_duplicates=check_for_duplicates,
   1050                 remove_duplicates=remove_duplicates,
   1051                 use_polars=use_polars,
   1052             )
   1053             for i in range(len(fragments_list))
   1054         ]
   1055 metadata_dict = {
   1056     key: x[key] for x in list(list(zip(*qc_stats))[0]) for key in x.keys()
   1057 }
   1058 profile_data_dict = {
   1059     key: x[key] for x in list(list(zip(*qc_stats))[1]) for key in x.keys()
   1060 }

File /data/lowe/dmh/miniconda3/envs/single_cell_perturb_v1/lib/python3.9/site-packages/pycisTopic/qc.py:1217, in compute_qc_stats_single(fragments, tss_annotation, stats, label, path_to_regions, valid_bc, n_frag, n_bc, tss_flank_window, tss_window, tss_minimum_signal_window, tss_rolling_window, min_norm, partition, check_for_duplicates, remove_duplicates, use_polars)
   1214 if "profile_tss" in stats:
   1215     # TSS
   1216     log.info("Computing TSS profile for " + label)
-> 1217     profile_tss_metrics = profile_tss(
   1218         fragments=fragments_df,
   1219         annotation=tss_annotation,
   1220         valid_bc=valid_bc,
   1221         plot=False,
   1222         n_cpu=1,
   1223         partition=partition,
   1224         flank_window=tss_flank_window,
   1225         tss_window=tss_window,
   1226         minimum_signal_window=tss_minimum_signal_window,
   1227         rolling_window=tss_rolling_window,
   1228         min_norm=min_norm,
   1229         return_TSS_enrichment_per_barcode=True,
   1230         return_TSS_coverage_matrix_per_barcode=True,
   1231         return_plot_data=True,
   1232     )
   1233     if profile_tss_metrics is not None:
   1234         metrics["profile_tss"] = profile_tss_metrics

File /data/lowe/dmh/miniconda3/envs/single_cell_perturb_v1/lib/python3.9/site-packages/pycisTopic/qc.py:557, in profile_tss(fragments, annotation, valid_bc, plot, plot_data, n_cpu, partition, flank_window, tss_window, minimum_signal_window, rolling_window, min_norm, color, save, return_TSS_enrichment_per_barcode, return_TSS_coverage_matrix_per_barcode, return_plot_data, use_polars)
    555     TSS_matrix = get_tss_matrix(fragments, flank_window, tss_space_annotation)
    556 log.info("Coverage matrix done")
--> 557 if not TSS_matrix.columns.tolist() == list(range(2 * flank_window + 1)):
    558     missing_values = list(
    559         set(TSS_matrix.columns.tolist()).symmetric_difference(
    560             list(range(2 * flank_window + 1))
    561         )
    562     )
    563     for x in missing_values:

AttributeError: 'NoneType' object has no attribute 'columns'
MozzyCow commented 7 months ago

Running without profile_tss finishes without errors

MozzyCow commented 7 months ago

Upon further inspection/debugging, I had retained my non-standard chromosomes earlier in the analysis which was resulting in this line in get_tss_matrix returning nothing:

overlap_with_TSS = fragments.join(tss_space_annotation, nb_cpu=1).df if len(overlap_with_TSS) == 0: return

I'm wondering now if a better method is to remove all non-standard chromosome peaks from atac_fragments. Removing the #filter = annot['Chromosome/scaffold name'].str.contains('CHR|GL|JH|MT')

annot = annot[~filter] lines from the annotation dataframe preprocessing fixed the immediate issue

MozzyCow commented 7 months ago

Unfortunately there are still issues, not removing the those from annotations results in very low TSS enrichment, which doesn't match the cell arc output