BlackPianoCat closed this issue 2 years ago.
So I have a .bedpe file whose head looks like the lines below. I wrote code to add the orientation as well, which gives the last three columns, but unfortunately it still does not work for me. The columns are tab-separated (\t), as required.
chr1 869398 870595 chr1 904618 906401 5 . + -
chr1 869398 870595 chr1 937699 942959 13 . + -
chr1 869398 870595 chr1 979636 987730 2 . + +
chr1 869398 870595 chr1 1001366 1003470 5 . + -
chr1 869398 870595 chr1 1058440 1061403 2 . + +
chr1 869398 870595 chr1 1118816 1123474 2 . + +
chr1 869398 870595 chr1 1250309 1252884 2 . + -
chr1 869398 870595 chr1 1290219 1292623 2 . + -
chr1 904618 906401 chr1 914193 915144 5 . + +
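The columns must be tab-separated, so a quick per-row sanity check can rule out formatting problems before running cLoops. Below is a minimal Python sketch. It assumes the common 10-column BEDPE layout (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2). The function name `check_bedpe_line` is hypothetical and not part of cLoops.

```python
VALID_STRANDS = {"+", "-", "."}


def check_bedpe_line(line):
    """Return a list of problems found in one BEDPE line (empty list = looks OK)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        # A space-separated line ends up here as a single field.
        return [f"expected at least 6 tab-separated columns, got {len(fields)}"]
    problems = []
    for label, (s, e) in (("anchor 1", fields[1:3]), ("anchor 2", fields[4:6])):
        try:
            start, end = int(s), int(e)
        except ValueError:
            problems.append(f"{label}: non-integer coordinates {s!r}, {e!r}")
            continue
        if start < 0 or start >= end:
            problems.append(f"{label}: invalid interval {start}-{end}")
    if len(fields) >= 10:
        # Columns 9 and 10 hold the strand of each anchor.
        for label, strand in (("strand 1", fields[8]), ("strand 2", fields[9])):
            if strand not in VALID_STRANDS:
                problems.append(f"{label}: unexpected value {strand!r}")
    return problems


if __name__ == "__main__":
    example = "chr1\t869398\t870595\tchr1\t904618\t906401\t5\t.\t+\t-"
    print(check_bedpe_line(example))  # []
```

Running it over every line of the file and printing the non-empty results quickly reveals rows that were written with spaces or with swapped coordinates.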
The command I am trying to run is something like this:
cLoops -f GM12878WT_ChIAPET_SMC1A_B1S4B2S2B3S2_2.bedpe.gz -o cLoops_out -minPts 20,30 -eps 2500,5000,7500,10000 -hic -s -j -c chr21
as suggested in the documentation. My goal is to run cLoops and then use its output to find stripes. The error I get is:
2022-02-09 18:05:06,608 INFO Command line: cLoops -f GM12878WT_ChIAPET_SMC1A_B1S4B2S2B3S2_2.bedpe.gz -o cLoops_out -m 0 -eps 2500,5000,7500,10000 -minPts 20,30 -p 1 -w False -j True -s True -c chr21 -hic True -cut 0 -plot False -max_cut False
2022-02-09 18:05:06,632 INFO mode:0 eps:[2500, 5000, 7500, 10000] minPts:[30, 20] hic:True
2022-02-09 18:05:06,632 INFO Parsing PETs from GM12878WT_ChIAPET_SMC1A_B1S4B2S2B3S2_2.bedpe.gz, requiring initial distance cutoff > 0
300000 PETs processed from GM12878WT_ChIAPET_SMC1A_B1S4B2S2B3S2_2.bedpe.gz()
2022-02-09 18:05:07,933 INFO Totaly 333808 PETs from GM12878WT_ChIAPET_SMC1A_B1S4B2S2B3S2_2.bedpe.gz, in which 3535 cis PETs
Clustering chr21 and chr21 using eps as 2500, minPts as 30,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:07,960 INFO ERROR: no inter-ligation PETs detected for eps 2500 minPts 30,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 2500, minPts as 20,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:07,978 INFO ERROR: no inter-ligation PETs detected for eps 2500 minPts 20,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 5000, minPts as 30,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:07,996 INFO ERROR: no inter-ligation PETs detected for eps 5000 minPts 30,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 5000, minPts as 20,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:08,015 INFO ERROR: no inter-ligation PETs detected for eps 5000 minPts 20,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 7500, minPts as 30,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:08,034 INFO ERROR: no inter-ligation PETs detected for eps 7500 minPts 30,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 7500, minPts as 20,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:08,052 INFO ERROR: no inter-ligation PETs detected for eps 7500 minPts 20,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 10000, minPts as 30,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:08,070 INFO ERROR: no inter-ligation PETs detected for eps 10000 minPts 30,can't model the distance cutoff,continue anyway
Clustering chr21 and chr21 using eps as 10000, minPts as 20,pre-set distance cutoff as > 0
Clustering chr21 and chr21 finished. Estimated 0 self-ligation reads and 0 inter-ligation reads
2022-02-09 18:05:08,089 INFO ERROR: no inter-ligation PETs detected for eps 10000 minPts 20,can't model the distance cutoff,continue anyway
Traceback (most recent call last):
  File "/home/blackpianocat/anaconda3/envs/cLoops/bin/cLoops", line 11, in <module>
    load_entry_point('cLoops==0.93', 'console_scripts', 'cLoops')()
  File "build/bdist.linux-x86_64/egg/cLoops/pipe.py", line 349, in main
  File "build/bdist.linux-x86_64/egg/cLoops/pipe.py", line 280, in pipe
  File "/home/blackpianocat/anaconda3/envs/cLoops/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2618, in amin
    initial=initial)
  File "/home/blackpianocat/anaconda3/envs/cLoops/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
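One reading of the traceback: every clustering run reports 0 self-ligation and 0 inter-ligation reads. By the time pipe.py (line 280) calls NumPy's `amin`, there is probably nothing left to reduce, and NumPy refuses to take the minimum of an empty array. The snippet below only reproduces that NumPy behaviour. `loop_distances` is a hypothetical stand-in, not the variable cLoops actually uses.

```python
import numpy as np

# Hypothetical stand-in for the array cLoops reduces at pipe.py line 280:
# with 0 self-ligation and 0 inter-ligation PETs, it ends up empty.
loop_distances = np.array([])

try:
    np.amin(loop_distances)
except ValueError as err:
    # "minimum" has no identity element, so an empty reduction is an error.
    print("ValueError:", err)

# Defensive alternatives: pass an explicit `initial` (NumPy >= 1.15)
# or check the size before reducing.
safe_min = np.amin(loop_distances, initial=np.inf)
print(safe_min)  # inf
if loop_distances.size == 0:
    print("nothing to model; the input likely has too few PETs for these -eps/-minPts")
```

So the crash is most likely a symptom. The underlying issue is that no PETs survived clustering on chr21 with these settings.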
I also tried your newer tool, cLoops2, but I still have problems: after preprocessing, it gives me empty files.
Hi dear user, could you please share a small chromosome, such as chr21, so that I can take a closer look? Please send it to caoyaqiang0410@gmail.com. It seems your data are ChIA-PET data, so I would suggest running with -eps 1000 -minPts 10 for an initial trial. Best, Yaqiang
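One way to prepare such a subset is to keep only the PETs whose two anchors both lie on the chromosome of interest. Below is a minimal Python sketch. It assumes a tab-separated BEDPE file, optionally gzip-compressed, with chromosome names in columns 1 and 4. `extract_chrom` is a hypothetical helper, not part of cLoops or cLoops2.

```python
import gzip


def extract_chrom(in_path, out_path, chrom="chr21"):
    """Copy PETs with both anchors on `chrom` to out_path; return how many were kept."""
    # Choose gzip or plain text from the file extension.
    open_in = gzip.open if in_path.endswith(".gz") else open
    open_out = gzip.open if out_path.endswith(".gz") else open
    kept = 0
    with open_in(in_path, "rt") as fin, open_out(out_path, "wt") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 6 and fields[0] == chrom and fields[3] == chrom:
                fout.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    return kept
```

For example, `extract_chrom("GM12878WT_ChIAPET_SMC1A_B1S4B2S2B3S2_2.bedpe.gz", "chr21.bedpe.gz")` would write the chr21 cis PETs to a new file and report how many there are. That count is a useful number to check before sharing the file.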
Good morning,
Thank you for your quick answer. I have emailed you the data. I also tried running with the parameters you suggested, but I still get the same error.
Hi, the file can be processed by cLoops2 pre. I converted it with cLoops2 dump -washU, and loops are hard to see in the genome browser because there are too few PETs. To my knowledge, ChIA-PET data should ideally have more than 20 million PETs. I am not sure how many you have, or whether the library passed quality control. Best, Yaqiang
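Counting the PETs is a quick way to compare a library against this rule of thumb. Below is a minimal Python sketch that assumes a tab-separated BEDPE(.gz) file. `pet_summary` and `depth_fraction` are hypothetical helpers. For reference, the cLoops log in the original post reports 333,808 PETs in total, which is roughly 1.7% of 20 million.

```python
import gzip
from collections import Counter

# Rule of thumb quoted in the reply above for ChIA-PET libraries.
RECOMMENDED_PETS = 20_000_000


def pet_summary(path):
    """Count total, cis (same chromosome) and trans PETs in a BEDPE(.gz) file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as fh:
        for line in fh:
            fields = line.split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            counts["total"] += 1
            counts["cis" if fields[0] == fields[3] else "trans"] += 1
    return counts


def depth_fraction(total, recommended=RECOMMENDED_PETS):
    """Fraction of the recommended sequencing depth that a library reaches."""
    return total / recommended


if __name__ == "__main__":
    print(f"{depth_fraction(333808):.1%}")  # 1.7%
```

Running `pet_summary` before and after any filtering step also shows how many PETs that step discarded.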
To create this file, I filtered the PETs by CTCF motif orientation: my script keeps only the lines in which it finds these motifs and discards all the others. So I probably need to change some parameter in my script to keep more PETs. Thank you!
Good morning. Unfortunately, I have not managed to resolve my issue. I used a script to find the CTCF motifs and fill in the strand columns with + and -, but I am not sure whether it works or whether this is a correct procedure (I am still new to bioinformatics).
The other thing I tried was your hicpropairs2bedpe.py script, which is supposed to convert a .hic file to .bedpe. So I started from a .hic file, and I still get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 40: invalid start byte
So, if it is easy for you, I would like to ask one simple question: what input data does your algorithm work with? What kind of data should I start from, and what preprocessing should I do?
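This UnicodeDecodeError is what one would expect when a text parser is given a binary file. A .hic file is binary, so decoding it as UTF-8 fails at the first byte that is not valid UTF-8 (here 0x8d). Below is a minimal Python sketch for checking what a file is before feeding it to a text-based converter. It assumes the Juicer .hic convention that files begin with the null-terminated magic string "HIC". `sniff_format` is a hypothetical helper.

```python
import codecs


def sniff_format(path, n_bytes=4096):
    """Guess a file's type from its first bytes: 'hic', 'gzip', 'text' or 'binary'."""
    with open(path, "rb") as fh:
        head = fh.read(n_bytes)
    if head[:4] == b"HIC\x00":
        # Juicer .hic files start with the null-terminated magic string "HIC".
        return "hic"
    if head[:2] == b"\x1f\x8b":
        # gzip magic number, e.g. a .bedpe.gz or a compressed pairs file.
        return "gzip"
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the chunk, but still rejects invalid bytes such as 0x8d.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "binary"
    return "text"
```

A result of "hic" or "binary" means the file has to be converted with a tool that understands that format before any text-based script can read it.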
hicpropairs2bedpe.py converts a HiC-Pro .allValidPairs file to a BEDPE file. .hic files are not supported.
For Hi-C data, preprocess with HiC-Pro, then use hicpropairs2bedpe.py to convert its output to a BEDPE file as input for cLoops2. For ChIA-PET data, process the PETs into BEDPE with the preprocessing tools of Mango or ChIA-PET Tools. Test data are also provided at https://github.com/YaqiangCao/cLoops/tree/master/examples.
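To illustrate the Hi-C route, the sketch below converts HiC-Pro validPairs lines into 10-column BEDPE. It is not hicpropairs2bedpe.py itself. It assumes the documented HiC-Pro column order (read name, chr1, pos1, strand1, chr2, pos2, strand2, ...), treats each read position as a point, and pads it by a hypothetical `ext` bp on both sides to form an anchor interval. `validpair_to_bedpe` and `convert_validpairs` are hypothetical names.

```python
def validpair_to_bedpe(line, ext=50):
    """Convert one HiC-Pro validPairs line to a 10-column BEDPE line.

    `ext` is a hypothetical padding (bp) around each read position.
    """
    fields = line.rstrip("\n").split("\t")
    name, chr1, pos1, strand1, chr2, pos2, strand2 = fields[:7]
    pos1, pos2 = int(pos1), int(pos2)
    out = [
        chr1, max(0, pos1 - ext), pos1 + ext,  # anchor 1 interval, clipped at 0
        chr2, max(0, pos2 - ext), pos2 + ext,  # anchor 2 interval, clipped at 0
        name, ".",                             # BEDPE name and (unused) score
        strand1, strand2,
    ]
    return "\t".join(map(str, out))


def convert_validpairs(in_path, out_path, ext=50):
    """Convert a whole validPairs file; return the number of PETs written."""
    written = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(validpair_to_bedpe(line, ext) + "\n")
                written += 1
    return written
```

For real analyses the maintained script is the safer choice. The sketch only shows which validPairs columns map onto which BEDPE columns.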
Yes, I have checked the test files, thank you for the information. In the end, I converted the ChIA-PET data to BEDPE with straw. That works, but I still see many false-positive loops and I cannot detect stripes. I believe this is related to parameter tuning. I will also check the preprocessing software you suggested. Thank you again!