Dear developer,
I like your pipeline very much! It is easy to use and takes care of removing noise, poor regions, and so on.
I just have a few questions that I am not sure about.
Do you know whether pileup will be faster than the step-by-step commands if I run on a cluster?
I read the 4DN standard's description of replicates (biological and sequencing). It says to first merge all sequencing replicates (one library that for some reason was sequenced in different lanes), then remove PCR duplicates, and then merge the different biological replicates. Does your pipeline also treat sequencing replicates and biological replicates differently, or the same? I know the result is 99% the same (if I have two sequencing replicates and remove PCR duplicates separately before merging them, one locus has at most two PCR duplicates, which really doesn't matter). But I just want to know whether it matters if, for sequencing replicates, I write R1,R2,R3 or R1,R1,R1 in datasets.tsv.
I know this is not an important point and will not influence the results~
Looking forward to your reply~
Because the pileup subcommand uses exactly the same underlying functions as the step-by-step commands, I don't think it can run faster.
This is a very good point that I didn't explain well in the tutorial. Before version 0.8.4, runHiC used the same strategy suggested by 4DN, i.e., merging sequencing replicates before removing PCR duplicates. However, after weighing speed against precision, I decided to use the other strategy (removing PCR duplicates before merging) in 0.8.4 and later versions. As you mentioned (and according to my tests), the results are largely similar, while the former strategy runs much slower on large datasets (such as Rao 2014, GM12878). Therefore, the answer is: for versions before 0.8.4, it does make a difference whether you write "R1,R2,R3" or "R1,R1,R1"; starting from 0.8.4, they are the same in terms of removing PCR duplicates.
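To make the difference between the two strategies concrete, here is a minimal toy sketch in Python. It is not runHiC's code: the read pairs, the tuple layout, and the helper names (`dedup`, `merge_then_dedup`, `dedup_then_merge`) are all hypothetical and exist only for illustration. It assumes a PCR duplicate is simply a read pair with identical coordinates.

```python
from collections import Counter
from itertools import chain

def dedup(pairs):
    """Toy PCR-duplicate filter: keep the first occurrence of each read pair."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept

def merge_then_dedup(runs):
    """4DN-style order: pool all sequencing runs, then remove duplicates once."""
    return dedup(list(chain.from_iterable(runs)))

def dedup_then_merge(runs):
    """Other order: remove duplicates within each run, then pool the runs."""
    return list(chain.from_iterable(dedup(run) for run in runs))

# Hypothetical read pairs as (chrom1, pos1, chrom2, pos2), two lanes of one library.
lane1 = [("chr1", 100, "chr1", 5000),
         ("chr1", 100, "chr1", 5000),   # duplicate within lane 1
         ("chr2", 40, "chr2", 900)]
lane2 = [("chr1", 100, "chr1", 5000),   # same pair again, seen in lane 2
         ("chr3", 7, "chr3", 80)]

a = merge_then_dedup([lane1, lane2])
b = dedup_then_merge([lane1, lane2])
print(len(a), len(b))
```

In this toy input, deduplicating per lane leaves one extra copy of the pair that appears in both lanes, so with two lanes any pair survives at most twice. Pooling first means one filtering pass over the combined data, which is presumably part of why it is slower on very large datasets.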
Shu