LHXqwq commented 8 months ago

Traceback (most recent call last):
  File "predict_eval.py", line 7, in <module>
    from loop_calling import GnnLoopCaller
  File "/home/lihaoxing/scGSLoop/loop_calling.py", line 8, in <module>
    from torch_geometric.nn import VGAE
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/__init__.py", line 13, in <module>
    import torch_geometric.datasets
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/datasets/__init__.py", line 101, in <module>
    from .explainer_dataset import ExplainerDataset
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/datasets/explainer_dataset.py", line 9, in <module>
    from torch_geometric.explain import Explanation
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/explain/__init__.py", line 3, in <module>
    from .algorithm import *  # noqa
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/explain/algorithm/__init__.py", line 1, in <module>
    from .base import ExplainerAlgorithm
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/explain/algorithm/base.py", line 14, in <module>
    from torch_geometric.nn import MessagePassing
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
    from .sequential import Sequential
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/nn/sequential.py", line 9, in <module>
    from torch_geometric.template import module_from_template
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/template.py", line 7, in <module>
    from jinja2 import Environment, FileSystemLoader
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/jinja2/__init__.py", line 12, in <module>
    from .environment import Environment
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/jinja2/environment.py", line 25, in <module>
    from .defaults import BLOCK_END_STRING
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/jinja2/defaults.py", line 3, in <module>
    from .filters import FILTERS as DEFAULT_FILTERS  # noqa: F401
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/jinja2/filters.py", line 13, in <module>
    from markupsafe import soft_unicode
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/markupsafe/__init__.py)

The error occurred when I ran python predict_eval.py. I have installed version 2.1.5 of markupsafe.

LHXqwq commented 8 months ago

python -m pip install markupsafe==2.0.1
python -m pip install werkzeug==2.0.2

After installing the old versions of these two modules, python predict_eval.py can run normally.
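For context, `soft_unicode` was removed in MarkupSafe 2.1.0, which is why a Jinja2 release that still imports it fails against 2.1.5 but works against 2.0.1. A minimal sketch of that version check follows; the helper name `has_soft_unicode` is hypothetical, and the parsing assumes a plain dotted version string.

```python
def has_soft_unicode(version: str) -> bool:
    """Return True if this MarkupSafe version still exports soft_unicode.

    Assumes a plain dotted version such as "2.0.1" (no pre-release suffix).
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    # soft_unicode was removed in MarkupSafe 2.1.0.
    return (major, minor) < (2, 1)


for candidate in ("2.0.1", "2.1.5"):
    status = "exports" if has_soft_unicode(candidate) else "does not export"
    print(f"markupsafe {candidate} {status} soft_unicode")
```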

However, when running the second step, python hub_discover.py, a new issue arises: what file should be provided for 'consensus_path: Path to the consensus loop list', and how to obtain it?

wangfuzhou110 commented 8 months ago

Hi @LHXqwq consensus_path refers to the path of the file that was generated in the previous step "Consensus loop list". Have a look here. Please feel free to ask if you have any other questions.

LHXqwq commented 8 months ago

I have successfully run the entire workflow in my environment, thank you very much. I have one more question: if I want to obtain loop structures from my own data, such as pig single-cell Hi-C data, which model should I use, or do I need to train one myself? Is there a requirement for the number of cells and the amount of contacts per cell?

wangfuzhou110 commented 8 months ago

Hi @LHXqwq, thank you for using our software. You don't need to train a new model. You can choose either model (i.e., mES_k3_GNNFINE or hpc_k3_GNNFINE) for your own data. We have tested our models in a cross-species configuration, so using them on a dataset originating from pig should not be a problem. In our experiments, we have successfully predicted on datasets with as few as 10 cells. If your dataset has fewer than 10 cells, the model is still suitable for use, but I would recommend setting IMPUTE to False in the configs file.

LHXqwq commented 8 months ago

The information is very helpful. I am currently preparing my own data as input, and I would like to know if there are any tools available to quickly obtain 'xxx.10kb.kmer.csv' files for other species. I'm not entirely sure what the numbers corresponding to several base pairs in the example data such as mm10.10kb.kmer.csv mean.

wangfuzhou110 commented 8 months ago

That's a good point. The script for this purpose is feature_engineering.py. It is not easy to use, however. I will upload a handy script for feature generation this weekend. Until then, you can have a look at feature_engineering.py and see if you can do it yourself. Basically, the function create_kmer_input_file generates xxx.10kb.kmer.csv, and create_motif_input_file creates CTCF_xxx.10kb.input.csv. Here are explanations of the parameters of the two functions:

def create_motif_input_file(chrom_size_path, motif_bed_path, out_csv_path, chroms, resolution)

chrom_size_path: the path to the chromosome size file, e.g., mm9.chrom.sizes. For pigs, I believe it should be something like susScr11.chrom.sizes.

motif_bed_path: the result of a FIMO scan. The threshold of the FIMO scan needs to be set as 1e-6. Use FIMO to scan the genome assembly with JASPAR motif profile MA0139.1. After obtaining the FIMO .tsv result, make the format look like this:

seqnames    start   end width   strand  name    score   pvalue  qvalue  sequence
chr1    4516733 4516751 19  +   MA0139.1    20.541  7.16e-08    0.0655  TTGCCAATAGGTGGCGCTA
chr1    4770056 4770074 19  +   MA0139.1    25.2787 4e-10   0.0164  TGGCCACCAGGGGGCAGTC
chr1    5323801 5323819 19  +   MA0139.1    17.2295 7.83e-07    0.183   AGGCCACCAGGGGTCAGCT
chr1    6083634 6083652 19  +   MA0139.1    19.3934 1.75e-07    0.0922  agaccagaagagggcacca

For more information on this format, see here.

out_csv_path: the output path where you want to save CTCF_xxx.10kb.input.csv.

chroms: A list containing the chromosome names that you want to detect loops on.

resolution: Always set to 10000
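The reformatting of the FIMO result described above could be sketched as follows. This assumes the standard `fimo.tsv` column names written by recent MEME Suite releases (`motif_id`, `sequence_name`, `start`, `stop`, `strand`, `score`, `p-value`, `q-value`, `matched_sequence`); the function name and the tab-separated output are illustrative and not part of scGSLoop.

```python
import csv
import io

OUT_COLUMNS = ["seqnames", "start", "end", "width", "strand",
               "name", "score", "pvalue", "qvalue", "sequence"]


def fimo_to_motif_table(fimo_tsv_text: str, max_pvalue: float = 1e-6) -> str:
    """Convert FIMO .tsv text into the tab-separated layout shown above."""
    # fimo.tsv ends with blank lines and '#' comment lines; drop them first.
    data_lines = [line for line in fimo_tsv_text.splitlines()
                  if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(OUT_COLUMNS)
    for row in reader:
        if float(row["p-value"]) > max_pvalue:
            continue  # keep only hits that pass the scan threshold
        start, stop = int(row["start"]), int(row["stop"])
        writer.writerow([row["sequence_name"], start, stop, stop - start + 1,
                         row["strand"], row["motif_id"], row["score"],
                         row["p-value"], row["q-value"],
                         row["matched_sequence"]])
    return out.getvalue()


example = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\t"
    "p-value\tq-value\tmatched_sequence\n"
    "MA0139.1\tCTCF\tchr1\t4516733\t4516751\t+\t20.541\t7.16e-08\t0.0655\t"
    "TTGCCAATAGGTGGCGCTA\n"
    "\n"
    "# FIMO (Find Individual Motif Occurrences)\n"
)
print(fimo_to_motif_table(example))
```

The `width` column is derived as `stop - start + 1`, which matches the 19 bp hits in the example table.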

def create_kmer_input_file(chrom_size_path, assembly_path, out_csv_path, chroms, resolution)

assembly_path: The path to the genome assembly, e.g., susScr11.fa

out_csv_path: the output path where you want to save xxx.10kb.kmer.csv.

Other arguments are the same.
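To give a rough sense of what a per-bin k-mer feature is, the sketch below counts k-mers in fixed-size bins of a sequence. It is an illustration only: the actual columns, the value of k, and any normalization used by create_kmer_input_file are not described in this thread, so `k`, `bin_size`, and the function name here are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_counts_per_bin(sequence: str, bin_size: int, k: int):
    """Yield (start, end, counts) per bin; counts covers all 4**k k-mers."""
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    for start in range(0, len(sequence), bin_size):
        end = min(start + bin_size, len(sequence))
        window = sequence[start:end].upper()
        counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
        # K-mers containing N or other symbols are never looked up below,
        # so they are effectively ignored. K-mers spanning two bins are
        # not counted in this simplified version.
        yield start, end, [counts.get(kmer, 0) for kmer in all_kmers]


# Toy example: 2-mers in bins of 8 bp (the real files use 10 kb bins).
for start, end, counts in kmer_counts_per_bin("ACGTACGTNNACGT", bin_size=8, k=2):
    print(start, end, counts)
```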

wangfuzhou110 commented 8 months ago

Hi @LHXqwq I have uploaded a new version of feature_engineering.py to create features for other species. Type python feature_engineering.py -h to see the usage. Let me know if there are any problems when you run the program.

LHXqwq commented 8 months ago

Thank you very much for your help! I have successfully obtained pig.10kb.kmer.csv and CTCF_pig.10kb.input.csv using the feature_engineering.py script.

I have another question about merging multiple single-cell cool files into a scool file. I have two types of cells, A cells with 50 files and B cells also with 50 files. Should I merge all 100 cells into one scool file, or should I get two scool files separately and perform loop calling on each? What are the theoretical differences between these two approaches? What is your suggestion?

wangfuzhou110 commented 8 months ago

No problem! @LHXqwq Both approaches you mentioned are OK, but make sure that at the consensus stage the predictions of different cell types are stored in separate directories. The theoretical difference lies in the data enhancement step of scGSLoop. If the input file contains multiple cell types, the algorithm may use information across different cell types to enhance the data. Therefore, if your purpose is to compare loops between cell types, I would suggest that you do not merge them.

LHXqwq commented 8 months ago

Processing...
  0%| | 0/20 [00:00<?, ?it/s]
/home/lihaoxing/scGSLoop/nn_data.py:231: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:275.)
  graph = Data(x=None, num_nodes=mat.shape[0], edge_index=torch.tensor([mat.row, mat.col], dtype=torch.long))
100%|██████████| 20/20 [00:12<00:00, 1.62it/s]
Done!
Imputing...
100%|██████████| 20/20 [04:06<00:00, 12.35s/it]
Done
Creating .scool from imputed coolers...
100%|██████████| 20/20 [00:19<00:00, 1.04it/s]
Coarsening data...
100%|██████████| 20/20 [00:22<00:00, 1.12s/it]
100%|██████████| 20/20 [00:08<00:00, 2.47it/s]
sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
Processing...
Done!
Done!
Traceback (most recent call last):
  File "predict_eval.py", line 194, in <module>
    predict_on_other_dataset(
  File "predict_eval.py", line 59, in predict_on_other_dataset
    gnn_caller = GnnLoopCaller(training_run_id, chroms, gnn_path, graph_dataset.num_features)
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 149, in num_features
    return self.num_node_features
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 134, in num_node_features
    data = self[0]
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 292, in __getitem__
    data = self.get(self.indices()[idx])
  File "/home/lihaoxing/scGSLoop/nn_data.py", line 328, in get
    data = self._process_item(idx)
  File "/home/lihaoxing/scGSLoop/nn_data.py", line 314, in _process_item
    graph = self.pre_transform(graph)
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/transforms/base_transform.py", line 32, in __call__
    return self.forward(copy.copy(data))
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 24, in forward
    data = transform(data)
  File "/home/lihaoxing/scGSLoop/nn_data.py", line 58, in __call__
    assert len(current_df) == data.num_nodes
AssertionError

Sorry to bother you again. When I run python predict_eval.py on my own dataset, I encounter the above error. Could you please tell me what could be the possible reasons?

wangfuzhou110 commented 8 months ago

I'm happy to help. @LHXqwq It looks like a mismatch between the input files. Could you have a look at the chromosome list of your cooler files (use this API), as well as the chromosome names in the assembly.sizes file? I guess the chromosome names in the cooler files might be something like '1', '2', '3'..., while those in the assembly size file might be 'chr1', 'chr2', 'chr3'...

If this is the case, please rename the chromosomes in the cooler files using this function and re-create the .scool files. Also make sure that the chromosome names in configs.py start with chr. In this way, the chromosome names in different entries are made consistent.

Note that this is only one possible reason that could cause the problem. If this is not the case, please feel free to post here (with a little more information) and I will have a further look.
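The consistency check suggested above could be sketched like this. The helper name is hypothetical; in practice the first list might come from `cooler.Cooler(uri).chromnames` and the second from the first column of the chromosome sizes file.

```python
def compare_chrom_names(cooler_names, sizes_names):
    """Report names missing on either side and whether a 'chr' prefix explains it."""
    cooler_set, sizes_set = set(cooler_names), set(sizes_names)

    def strip_chr(name):
        return name[3:] if name.startswith("chr") else name

    return {
        "only_in_cooler": sorted(cooler_set - sizes_set),
        "only_in_sizes": sorted(sizes_set - cooler_set),
        # True when the sets differ but agree once a leading 'chr' is removed.
        "prefix_mismatch": cooler_set != sizes_set
        and {strip_chr(n) for n in cooler_set} == {strip_chr(n) for n in sizes_set},
    }


report = compare_chrom_names(["1", "2", "X"], ["chr1", "chr2", "chrX"])
print(report)
```

An empty `only_in_cooler` and `only_in_sizes` would suggest the names already agree and that the cause lies elsewhere.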

LHXqwq commented 8 months ago

bins_selector = cooler.Cooler("/home/lihaoxing/scGSLoop/pig_project/data/scool/pig_10k_MII.scool::/cells/MII-1.cool")

print(bins_selector.bins()[:10])

  chrom  start     end
0     1      0   10000
1     1  10000   20000
2     1  20000   30000
3     1  30000   40000
4     1  40000   50000
5     1  50000   60000
6     1  60000   70000
7     1  70000   80000
8     1  80000   90000
9     1  90000  100000

head CTCF_pig.10kb.input.csv

chrom  start  end    pos_count  neg_count
1      0      10000  0.0        0.0
1      10000  20000  0.0        0.0
1      20000  30000  0.0        0.0
1      30000  40000  0.0        0.0
1      40000  50000  0.0        0.0
1      50000  60000  0.0        1.0
1      60000  70000  0.0        0.0
1      70000  80000  0.0        0.0
1      80000  90000  0.0        0.0

awk '{print$1,$2,$3}' pig.10kb.kmer.csv|head

chrom start end
1 0 10000
1 10000 20000
1 20000 30000
1 30000 40000
1 40000 50000
1 50000 60000
1 60000 70000
1 70000 80000
1 80000 90000

cat Sus_scrofa.chrom.size

1 274330532
2 151935994
3 132848913
4 130910915
5 104526007
6 170843587
7 121844099
8 138966237
9 139512083
10 69359453
11 79169978
12 61602749
13 208334590
14 141755446
15 140412725
16 79944280
17 63494081
18 55982971
X 125939595
Y 43547828
MT 16613

configs.py

CHROMOSOMES = [str(i) for i in range(1, 19)]

I have examined these files, and it seems that their chromosome names consist only of numeric digits without 'chr', which appears to be matching. Are there any other potential areas that might have been overlooked for modification, or should I consider adding 'chr' to all of them?

wangfuzhou110 commented 8 months ago

@LHXqwq I have uploaded a new version that supports numeric chromosome names. Please pull the repo and try it!

LHXqwq commented 8 months ago

Thank you very much for your help. I have run the complete process on my own dataset and got normal results.

There are a few minor issues. I solved them myself while running the process so I didn't bother you. If you have time, you may consider whether it is necessary to perform some optimizations.

1. predict_eval.py, line 182: the path of data/placeholder.bedpe needs to be modified.

2. My running environment is on the CPU. I have made slight modifications in the following two places to ensure that the program will not report errors.
   nn_data.py, line 256: "data = torch.load(path.join(self.processed_dir, '{}.{}.pt'.format(cell_name.split('/')[-1], chrom_name)))" is replaced by "data = torch.load(path.join(self.processed_dir, '{}.{}.pt'.format(cell_name.split('/')[-1], chrom_name)), map_location=torch.device('cpu'))"
   train_utils.py, line 94: "checkpoint = torch.load(path)" is replaced by "checkpoint = torch.load(path, map_location=torch.device('cpu'))"

3. Numeric chromosome names need to be modified in the following two places when performing consensus.py.
   predict_eval.py, line 90: I added "chrom=str(chrom)" to ensure that "mat_shape = get_bin_count(chrom_sizes[chrom], res)" can read the correct chromosome number.
   predict_eval.py, line 92: I added int(), otherwise the chromosome number will report an error: "chrom_df = df[df['chrom1'] == int(chrom)]"
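One way to avoid the `str`/`int` mismatch described above is to normalise chromosome names to strings wherever they are read. The sketch below parses a chromosome sizes file with string keys and computes the number of bins per chromosome; `read_chrom_sizes` and `bin_count` are illustrative names, not the functions in the repository.

```python
def read_chrom_sizes(text: str) -> dict:
    """Parse 'name<whitespace>size' lines into a dict keyed by string names."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, size = line.split()[:2]
        sizes[str(name)] = int(size)
    return sizes


def bin_count(chrom_size: int, resolution: int) -> int:
    """Number of fixed-width bins needed to cover a chromosome (ceiling division)."""
    return -(-chrom_size // resolution)


sizes = read_chrom_sizes("1\t274330532\n2\t151935994\nX\t125939595\n")
for chrom in range(1, 3):
    key = str(chrom)  # normalise once, then use the same key everywhere
    print(key, bin_count(sizes[key], 10000))
```

When a table is read with Pandas, passing a string dtype for the chromosome columns (for example `dtype={'chrom1': str}` in `read_csv`) should keep comparisons consistent in the same way.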

wangfuzhou110 commented 8 months ago

@LHXqwq Great! I will look into these issues and release an improved version later. We have also tested the program on CPU but did not encounter the problem you spotted. If it isn't too much trouble, could you please provide me with the detailed error information of your run? Thank you very much.

LHXqwq commented 8 months ago

Processing...
  0%| | 0/20 [00:00<?, ?it/s]
/home/lihaoxing/scGSLoop/nn_data.py:233: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:275.)
  graph = Data(x=None, num_nodes=mat.shape[0], edge_index=torch.tensor([mat.row, mat.col], dtype=torch.long))
100%|██████████| 20/20 [00:11<00:00, 1.70it/s]
Done!
Traceback (most recent call last):
  File "predict_eval.py", line 200, in <module>
    imputer.load_model()
  File "/home/lihaoxing/scGSLoop/imputation.py", line 144, in load_model
    self.model, self.optimizer, *_ = load_model(self.model, self.optimizer, self.model_path)
  File "/home/lihaoxing/scGSLoop/train_utils.py", line 95, in load_model
    checkpoint = torch.load(path)
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 1026, in load
    return _load(opened_zipfile,
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 1438, in _load
    result = unpickler.load()
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 1408, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 1382, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 391, in default_restore_location
    result = fn(storage, location)
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 266, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/home/lihaoxing/miniconda3/envs/scloop/lib/python3.8/site-packages/torch/serialization.py", line 250, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

The above error occurred when I ran the predict_eval.py script with the original code, so I added "map_location=torch.device('cpu')".
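A device-agnostic variant of this change, as a sketch: choose the target device at runtime so that the same code runs with or without a GPU. `pick_device_name` and `load_checkpoint` are hypothetical helpers, not functions from the repository.

```python
def pick_device_name(cuda_available: bool) -> str:
    """Map storages to the GPU only when one is actually usable."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path):
    """Load a checkpoint onto whichever device this machine provides."""
    import torch  # imported lazily so the helper above has no dependencies

    device = torch.device(pick_device_name(torch.cuda.is_available()))
    return torch.load(path, map_location=device)


print(pick_device_name(False))
```

Hard-coding `torch.device('cpu')` also works on a CPU-only machine; the runtime check only matters if the same script is later run on a GPU.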

LHXqwq commented 6 months ago

Sorry to bother you, but for the loop results obtained from individual cells, each loop has a probability value. I want to obtain some loops with higher credibility as much as possible. Do you have any suggestions for the threshold?

wangfuzhou110 commented 6 months ago

Hi, sorry for the delayed reply. Do you mean that you want to obtain single-cell loops with higher probability? If that's the case, taking the loops with a probability larger than a fixed threshold using Pandas would be fine.
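A minimal sketch of such a filter, written with the standard library so it runs without extra packages. The column name `prob` and the threshold 0.9 are placeholders, since the thread does not specify the output columns or a recommended cutoff; with Pandas the same filter would be a boolean mask such as `df[df["prob"] > threshold]`.

```python
import csv
import io


def filter_loops(rows, prob_column: str = "prob", threshold: float = 0.9):
    """Keep rows whose probability is strictly greater than the threshold."""
    return [row for row in rows if float(row[prob_column]) > threshold]


# Toy loop table; real coordinates and column names will differ.
table = io.StringIO(
    "chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tprob\n"
    "1\t100000\t110000\t1\t400000\t410000\t0.97\n"
    "1\t200000\t210000\t1\t250000\t260000\t0.55\n"
)
loops = list(csv.DictReader(table, delimiter="\t"))
confident = filter_loops(loops)
print(len(confident), "of", len(loops), "loops kept")
```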

fzbio / scGSLoop

ImportError: cannot import name 'soft_unicode' from 'markupsafe' #1
