aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.

export_pseudobulk not saving bed/bw files, and not printing "done!" message #103

Closed AmosFong1 closed 5 months ago

AmosFong1 commented 10 months ago

I am running export_pseudobulk with a cell_data pandas DataFrame that contains my pseudobulk grouping variable, a barcode column matching the barcodes in my fragment files, and a sample ID column (sample_id_col) matching the keys of my fragments dictionary. The function runs without errors, but no bigWig or BED files are saved. I have read a few similar issues suggesting that the barcode syntax in cell_data differs from that in the fragment files, but I don't think that is the case here because the two sets of barcodes intersect. Can you suggest a fix?

Below is my code:

import os

from pycisTopic.pseudobulk_peak_calling import export_pseudobulk

# create fragments dictionary (defined before the export_pseudobulk call that uses it)
fragments_dict = {
    "CLC03313": project_dir + "data/pycisTopic/atac_fragments_CLC03313.tsv.gz",
    "CLC03314": project_dir + "data/pycisTopic/atac_fragments_CLC03314.tsv.gz",
    "CLC03315": project_dir + "data/pycisTopic/atac_fragments_CLC03315.tsv.gz",
    "CLC03316": project_dir + "data/pycisTopic/atac_fragments_CLC03316.tsv.gz",
    "CLC03328": project_dir + "data/pycisTopic/atac_fragments_CLC03328.tsv.gz",
    "CLC03329": project_dir + "data/pycisTopic/atac_fragments_CLC03329.tsv.gz",
    "CLC03330": project_dir + "data/pycisTopic/atac_fragments_CLC03330.tsv.gz",
    "CLC03331": project_dir + "data/pycisTopic/atac_fragments_CLC03331.tsv.gz",
    "CLC03447": project_dir + "data/pycisTopic/atac_fragments_CLC03447.tsv.gz",
    "CLC03448": project_dir + "data/pycisTopic/atac_fragments_CLC03448.tsv.gz",
    "CLC03449": project_dir + "data/pycisTopic/atac_fragments_CLC03449.tsv.gz",
    "CLC03450": project_dir + "data/pycisTopic/atac_fragments_CLC03450.tsv.gz",
    "CLC03459": project_dir + "data/pycisTopic/atac_fragments_CLC03459.tsv.gz",
    "CLC03460": project_dir + "data/pycisTopic/atac_fragments_CLC03460.tsv.gz",
    "CLC03461": project_dir + "data/pycisTopic/atac_fragments_CLC03461.tsv.gz",
    "CLC03462": project_dir + "data/pycisTopic/atac_fragments_CLC03462.tsv.gz",
    "CLC03471": project_dir + "data/pycisTopic/atac_fragments_CLC03471.tsv.gz",
    "CLC03472": project_dir + "data/pycisTopic/atac_fragments_CLC03472.tsv.gz",
    "CLC03473": project_dir + "data/pycisTopic/atac_fragments_CLC03473.tsv.gz",
    "CLC03474": project_dir + "data/pycisTopic/atac_fragments_CLC03474.tsv.gz"
}

# cell_data, chromsizes, project_dir and temp_dir are assumed to be defined earlier in the notebook
bw_paths, bed_paths = export_pseudobulk(
    input_data=cell_data,
    variable="DHITsig_Call",
    sample_id_col="orig.ident",
    chromsizes=chromsizes,
    bed_path=os.path.join(project_dir, "data/pycisTopic/pseudobulk_bed_files/"),
    bigwig_path=os.path.join(project_dir, "data/pycisTopic/pseudobulk_bw_files/"),
    path_to_fragments=fragments_dict,
    n_cpu=20,
    normalize_bigwig=True,
    remove_duplicates=True,
    _temp_dir=temp_dir,
    split_pattern="___",
    use_polars=True
)

Example of a fragment file: [screenshot]

Example of cell_data columns for sample ID (orig.ident) and barcode: [screenshot]
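
For reference, the barcode comparison described above can be sketched roughly as follows. This is a minimal sketch, not part of pycisTopic: the helper names shared_barcodes and overlap_per_sample are hypothetical, the fragment files are assumed to follow the usual five-column layout (chrom, start, end, barcode, count) with optional "#" header lines, and the barcode column of cell_data is assumed to be called "barcode".

```python
import pandas as pd

# Assumed five-column fragment file layout (hypothetical column names).
FRAGMENT_COLUMNS = ["chrom", "start", "end", "barcode", "count"]


def shared_barcodes(fragments_file, cell_barcodes):
    """Return the barcodes found both in a fragments file and in cell_barcodes."""
    fragment_barcodes = pd.read_csv(
        fragments_file,
        sep="\t",
        comment="#",          # skip "#" header lines if present
        header=None,
        names=FRAGMENT_COLUMNS,
        usecols=["barcode"],  # only the barcode column is needed
        dtype={"barcode": str},
    )["barcode"]
    return set(fragment_barcodes) & set(cell_barcodes)


def overlap_per_sample(cell_data, fragments_dict,
                       sample_id_col="orig.ident", barcode_col="barcode"):
    """Map each sample to (number of shared barcodes, number of cells in cell_data)."""
    result = {}
    for sample, fragments_file in fragments_dict.items():
        barcodes = cell_data.loc[cell_data[sample_id_col] == sample, barcode_col]
        shared = shared_barcodes(fragments_file, barcodes)
        result[sample] = (len(shared), len(barcodes))
    return result
```

A sample whose first number is 0, or far below the second, would point to a barcode or sample ID mismatch for that sample.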

Below is my Jupyter notebook log:

2024-01-13 23:21:27,591 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03313.tsv.gz
2024-01-13 23:22:07,609 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03314.tsv.gz
2024-01-13 23:23:09,939 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03315.tsv.gz
2024-01-13 23:23:56,134 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03316.tsv.gz
2024-01-13 23:24:56,093 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03328.tsv.gz
2024-01-13 23:25:26,070 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03329.tsv.gz
2024-01-13 23:26:06,107 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03330.tsv.gz
2024-01-13 23:27:14,705 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03331.tsv.gz
2024-01-13 23:28:20,442 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03447.tsv.gz
2024-01-13 23:28:50,178 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03448.tsv.gz
2024-01-13 23:29:04,993 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03449.tsv.gz
2024-01-13 23:29:40,523 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03450.tsv.gz
2024-01-13 23:29:40,798 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03459.tsv.gz
2024-01-13 23:29:49,566 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03460.tsv.gz
2024-01-13 23:30:48,727 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03461.tsv.gz
2024-01-13 23:30:52,853 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03462.tsv.gz
2024-01-13 23:30:52,983 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03471.tsv.gz
2024-01-13 23:31:02,134 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03472.tsv.gz
2024-01-13 23:31:56,579 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03473.tsv.gz
2024-01-13 23:32:02,520 cisTopic     INFO     Reading fragments from /projects/dscott_prj/amfong/Multiome_DZ/data/pycisTopic/atac_fragments_CLC03474.tsv.gz
2024-01-13 23:33:04,924 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
(export_pseudobulk_ray pid=26703) 2024-01-13 23:33:29,265 cisTopic     INFO     Creating pseudobulk for NEG
(export_pseudobulk_ray pid=26688) 2024-01-13 23:33:29,283 cisTopic     INFO     Creating pseudobulk for POS
(export_pseudobulk_ray pid=26701) 2024-01-13 23:33:33,689 cisTopic     INFO     Creating pseudobulk for UNCLASS
(export_pseudobulk_ray pid=26701) /home/amfong/pycisTopic/pycisTopic/pseudobulk_peak_calling.py:274: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
(export_pseudobulk_ray pid=26701)   group_fragments = group_fragments_list[0].append(group_fragments_list[1:])
(export_pseudobulk_ray pid=26703) /home/amfong/pycisTopic/pycisTopic/pseudobulk_peak_calling.py:274: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(export_pseudobulk_ray pid=26703)   group_fragments = group_fragments_list[0].append(group_fragments_list[1:]) [repeated 2x across cluster]
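
The FutureWarning in the log is a deprecation notice and does not by itself stop execution. As a minimal sketch with toy data (the tables below are hypothetical stand-ins, and this is not necessarily how the library was later changed), the flagged append call can be written with pandas.concat like this:

```python
import pandas as pd

# Toy stand-ins for the per-sample fragment tables (hypothetical data).
group_fragments_list = [
    pd.DataFrame({"Chromosome": ["chr1"], "Start": [10], "End": [50]}),
    pd.DataFrame({"Chromosome": ["chr1", "chr2"], "Start": [60, 5], "End": [90, 40]}),
]

# Deprecated form that triggers the FutureWarning:
#   group_fragments = group_fragments_list[0].append(group_fragments_list[1:])
# Equivalent form with pandas.concat (rows are stacked in list order):
group_fragments = pd.concat(group_fragments_list)
```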

SeppeDeWinter commented 10 months ago

Hi @AmosFong1

I just updated this function, see https://github.com/aertslab/pycisTopic/commit/1afbd1d71dd9caf2f8f53d4c752240089b182bc9.

Does this fix your issue?

All the best,

Seppe.
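
One way to check whether a run actually wrote its files is to test the returned paths on disk. The sketch below assumes that bw_paths and bed_paths are dictionaries mapping each group name to an output path; missing_outputs is a hypothetical helper, not a pycisTopic function.

```python
import os


def missing_outputs(paths):
    """Return the groups whose output file is absent or empty."""
    return [
        group
        for group, path in paths.items()
        if not (os.path.isfile(path) and os.path.getsize(path) > 0)
    ]

# Example usage after export_pseudobulk has returned (assumed dict outputs):
#   print("BED files missing for:", missing_outputs(bed_paths))
#   print("bigWig files missing for:", missing_outputs(bw_paths))
```

An empty list for both dictionaries would indicate that every group (here NEG, POS and UNCLASS) has a non-empty output file.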