YaqiangCao / cLoops

Accurate and flexible loops calling tool for 3D genomic data.
https://yaqiangcao.github.io/cLoops/
MIT License

adjust parameters to increase/decrease stringency #13

Closed rikrdo89 closed 5 years ago

rikrdo89 commented 5 years ago

Hi, I am using cLoops to call loops on my Hi-C datasets (>200M valid pairs/sample), and I am trying to tune the parameters to increase or decrease the number of called loops. Could you explain how to use eps and minPts to increase the sensitivity of the calls? Also, I noticed that the majority of the calls are less than a given distance apart. Is there a way to increase the distance to detect longer-range interactions?

I have used -m 3 (equals -eps 5000,7500,10000 -minPts 20,30,40,50 -hic) but I am not sure how this works. Thank you.

YaqiangCao commented 5 years ago

Dear user,

Thanks for your interest in cLoops. Here are my suggestions for each of your questions.



  1. Could you explain how to use eps and minPts to increase the sensitivity of the calls?
    The more eps and minPts values you pass to cLoops, the more sensitive the calls will be. However, this also depends on how reliable your data is, and the more parameter values you assign, the longer the run takes. For more details, please check our manuscript.

  2. Also I noticed that the majority of the calls are less than a given distance apart... is there a way to increase the distance to detect longer-range interactions?
    Sure, use the -cut option. For example, with -cut 1000, only PETs with distance > 1 kb will be used to call loops. Also, the -max_cut option can filter close candidate loops based on your eps and minPts.

  3. I have used -m 3 (equals -eps 5000,7500,10000 -minPts 20,30,40,50 -hic) but I am not sure how this works.
    For your data, you mentioned > 200M valid pairs, but how many of them are intra-chromosomal PETs? I suggest -eps 5000,10000 -minPts 20 for a first trial on chr21, as follows:
    cLoops -f test.bedpe.gz -c chr21 -p 1 -eps 5000,10000 -minPts 20 -o test -j -s

With the tmp directory test generated by cLoops, you can use jd2juice to generate a .hic file and load it into Juicebox together with the called loops (the file with the suffix _juicebox.txt) to check the result visually. If too many loops seem false, increase minPts; if no obvious peaks are detected, decrease minPts. If the majority of loops are visually captured by cLoops but some are missed, add more eps values, such as -eps 2500,5000,7500,10000 or -eps 2000,4000,6000,8000,10000. You can always tune your parameters on a small chromosome such as chr21, and then use those parameters for all the data.
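The tuning loop described above (fix eps, vary minPts on chr21, inspect each result in Juicebox) could be scripted roughly as follows. This is only a sketch: it builds the command lines and prints them without running anything, it uses only the flags that appear in this thread, and the helper name `build_cloops_commands` and the `test_minPts<N>` output prefixes are hypothetical choices made for illustration.

```python
import shlex


def build_cloops_commands(bedpe, chrom, eps_values, min_pts_values, out_prefix):
    """Return one cLoops command line (as a string) per minPts value.

    Hypothetical helper: it only assembles strings, it does not run cLoops.
    """
    eps_arg = ",".join(str(e) for e in eps_values)
    commands = []
    for min_pts in min_pts_values:
        args = [
            "cLoops",
            "-f", bedpe,
            "-c", chrom,
            "-p", "1",
            "-eps", eps_arg,
            "-minPts", str(min_pts),
            # Assumed naming scheme so that each trial gets its own output.
            "-o", f"{out_prefix}_minPts{min_pts}",
            "-j",
            "-s",
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


commands = build_cloops_commands(
    "test.bedpe.gz", "chr21", [5000, 10000], [20, 30, 40], "test"
)
for cmd in commands:
    print(cmd)
```

Each printed line can be pasted into a shell. Keeping eps fixed while only minPts changes makes it easier to tell which parameter caused a difference when comparing the results in Juicebox.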



Please migrate this issue to the cLoops issue page; I will post another answer there in case other users are interested in this question. Happy loop-calling. If you find cLoops useful, please give us a star at the cLoops GitHub repo and cite our paper. Best, Yaqiang

rikrdo89 commented 5 years ago

Thank you Yaqiang for this thorough response. I have run a few analyses, and the algorithm performs quite well.

I would like to understand a bit more about how the eps parameter affects loop calls. I understand that eps, as you define it, is the distance within which two PETs are considered neighbors. If the value of this parameter is large (say 10000), does this mean that the region containing these peaks/loops is also larger (i.e. a large square at the apex of a TAD), and if it is low (say 2500), that the area surrounding the peak of interacting PETs is smaller?

And for the significance test, are the permutations done around the peak, or within the peak region?

YaqiangCao commented 5 years ago

Yes, for larger eps, the final loop anchors will be larger.

The permutations were done around the loops.
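To illustrate why a larger eps gives larger anchors, here is a toy, textbook-style DBSCAN on made-up PET coordinates. It is not the cLoops implementation: the Chebyshev distance, the coordinates, the convention that minPts counts the point itself, and the helper names `dbscan` and `anchor_widths` are all assumptions chosen to keep the sketch short.

```python
from collections import deque


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on 2D points; returns clusters as lists of points."""

    def neighbors(i):
        xi, yi = points[i]
        # Assumed metric: Chebyshev distance; a point is its own neighbor.
        return [j for j, (x, y) in enumerate(points)
                if max(abs(x - xi), abs(y - yi)) <= eps]

    labels = {}  # point index -> cluster id, or -1 for noise
    cluster_id = 0
    for i in range(len(points)):
        if i in labels:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster_id
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels.get(j) == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if j in labels:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
        cluster_id += 1

    clusters = [[] for _ in range(cluster_id)]
    for i, c in labels.items():
        if c >= 0:
            clusters[c].append(points[i])
    return clusters


def anchor_widths(cluster):
    """Width in bp of the left and right anchors spanned by a cluster."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (max(xs) - min(xs), max(ys) - min(ys))


# Made-up PETs: a dense core plus two sparser PETs farther out.
offsets = [0, 1000, 2000, 3000, 4000, 9000, 14000]
pets = [(100000 + o, 300000 + o) for o in offsets]

widths = {}
for eps in (2500, 10000):
    clusters = dbscan(pets, eps=eps, min_pts=3)
    widths[eps] = [anchor_widths(c) for c in clusters]
    print(eps, widths[eps])
```

On this toy data, eps=2500 keeps only the dense core (anchors 4000 bp wide), while eps=10000 also absorbs the two sparser PETs (anchors 14000 bp wide), which matches the answer above that larger eps leads to larger final loop anchors.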

YaqiangCao commented 5 years ago

Please give us a star at the github repo and cite our cLoops paper if it helps. Thanks!

rikrdo89 commented 5 years ago

Thank you for your explanation. Yes, of course I will cite your paper. This tool is extremely helpful.