Closed ChenFengling closed 6 years ago
Are you using the 40Kb resolution to generate the result? There is a "count_filter" option in the "run_HiCtrans.pl". This filter will remove the breakpoints below a threshold (generally a global inter-chromosomal mean count). You can change that based on your data and resolution.
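To pick a sensible threshold, the background level can be estimated from the data itself. The following is a minimal sketch, not HiCtrans code: it assumes the inter-chromosomal matrix is available as sparse `(bin_i, bin_j, count)` triplets, and it takes the mean over non-zero cells only, which is an assumption about how the "global inter-chromosomal mean count" is defined. The function names are hypothetical, and the filter is shown on matrix cells rather than on breakpoints.

```python
from statistics import mean


def mean_nonzero_count(triplets):
    """Mean contact count over the non-zero cells of a sparse matrix.

    `triplets` is an iterable of (bin_i, bin_j, count) tuples (assumed layout).
    """
    counts = [c for _, _, c in triplets if c > 0]
    return mean(counts) if counts else 0.0


def filter_by_count(triplets, threshold):
    """Keep only the cells whose count is at or above `threshold`."""
    return [(i, j, c) for i, j, c in triplets if c >= threshold]


# Toy data: three background cells and one strongly enriched cell.
cells = [(0, 5, 1), (0, 6, 2), (3, 9, 1), (4, 4, 12)]
threshold = mean_nonzero_count(cells)  # 4 for this toy data
kept = filter_by_count(cells, threshold)
print(threshold, kept)
```

On real data the threshold would be compared against the value passed as "count_filter" to judge whether it suits the resolution and sequencing depth.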
Also, can you please attach the translocation result file? I will have a look at the data.
In general, a translocation can have multiple breakpoints but not 942. There is some filtering issue here.
Just an update, I ran HiCtrans on T47D (https://www.encodeproject.org/experiments/ENCSR549MGQ/) chr1-chr3 file and it showed no translocation event within the pair.
I use the same dataset, but my preprocessing is different: I generate the contact map from a .hic file and reformat the sparse matrix. I ran HiCtrans on the 40 kb contact maps of T47D and the results are attached. I found that the filter value is 10, while the mean inter-chromosomal count is 1 to 2. All.chromosome.Translocation.zip
I guess when you are creating the hic file you are using the juicer "Pre" command. Can you check if it is normalizing the counts by some means before creating the hic file? HiCtrans expects raw contact counts.
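A quick way to spot normalized input is to check whether the counts are whole numbers. This is only a heuristic sketch with a hypothetical helper name: integer-valued counts do not prove the data is raw, but fractional counts strongly suggest some normalization was applied.

```python
def looks_like_raw_counts(counts, tol=1e-9):
    """Return True if every count is a non-negative whole number.

    Heuristic only: normalized matrices usually contain fractional values,
    whereas raw contact counts are integers.
    """
    return all(c >= 0 and abs(c - round(c)) <= tol for c in counts)


print(looks_like_raw_counts([1, 2, 60.0]))     # True
print(looks_like_raw_counts([1.37, 0.5, 2.0]))  # False
```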
In the paper, we used data processed with the HiC-Pro pipeline. Although the normalized counts may vary between pipelines, the raw counts should not differ. In the HiC-Pro processed file, the highest contact count for the chr1-chr3 pair is 7, while in your attached file there is an interaction with a count of 60.
I dumped the raw data from the .hic file and checked the validpairs file, and I found the 60 contacts between "chr3 144600000 144640000" and "chr1 207520000 207560000". I don't know why, but using a different processing pipeline makes even the raw data differ.
I checked the chr1_chr3 40 kb matrix and found that 274 cells have >3 contacts. These cells are sparsely distributed over the map, so let's consider them noise. It seems HiCtrans is very sensitive to this kind of "local noise" with high contact counts. Here I plot the translocation output together with the cells that have >3 contacts.
Although something must be wrong in my preprocessing step, it would be better if the method could avoid these false positives.
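The distinction between scattered high-count cells and a genuine local pattern can be made concrete. The sketch below is illustrative only and is not part of HiCtrans: it assumes sparse `(bin_i, bin_j, count)` triplets, uses the >3 cutoff mentioned above, and flags a high-count cell as isolated when none of its immediate neighbours is also high. The function names and the `radius` parameter are hypothetical.

```python
def high_count_cells(triplets, min_count=3):
    """Coordinates of cells whose count is strictly greater than `min_count`."""
    return {(i, j) for i, j, c in triplets if c > min_count}


def isolated_cells(cells, radius=1):
    """High-count cells with no other high-count cell within `radius` bins."""
    isolated = set()
    for i, j in cells:
        has_neighbour = any(
            (i + di, j + dj) in cells
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)
            if (di, dj) != (0, 0)
        )
        if not has_neighbour:
            isolated.add((i, j))
    return isolated


# Toy data: a small cluster near (10, 10) and one lone cell with 60 contacts.
triplets = [(10, 10, 5), (10, 11, 6), (11, 10, 4), (50, 80, 60), (7, 7, 2)]
high = high_count_cells(triplets)
print(len(high), sorted(isolated_cells(high)))  # 4 [(50, 80)]
```

A cell such as `(50, 80)` here, very high but without any supporting neighbours, is the kind of signal that is more likely an artifact than a breakpoint.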
I just checked that ~23 out of 25 translocations that we report in the paper are also present in the list. Most of the regions which didn't have any count in the HiC-Pro processed file are showing very high counts here (> 30/40). With a background level of ~2 counts, it is no surprise that it is picking that up.
If possible, I would request you to please upload this chr1-chr3 file here. I will have a look at it in detail.
Thanks for your kind and quick response! Attached is my file. chr1-chr3.zip
I found that this "noise" was caused by multiple-mapping events. When I used Juicer, it didn't filter out some multi-mapped reads, which resulted in extremely high contact counts in some cells of the map. I will fix this issue. However, I still think HiCtrans should include some statistical test to avoid these false positives, since a translocation shows up as a difference in the whole local pattern, not as a difference in a single cell.
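One common way to drop multi-mapped reads before building the contact map is to require a minimum mapping quality on both ends of each pair. The sketch below is a generic illustration, not a Juicer or HiCtrans option: the record layout (dictionaries with `mapq1` and `mapq2` keys) and the cutoff of 30 are assumptions made for this example.

```python
def keep_confident_pairs(pairs, min_mapq=30):
    """Keep read pairs whose two ends both reach `min_mapq`.

    Aligners typically give reads that map equally well to several places a
    very low mapping quality, so a MAPQ cutoff removes most of them.
    """
    return [
        p for p in pairs
        if p["mapq1"] >= min_mapq and p["mapq2"] >= min_mapq
    ]


# Toy records (hypothetical layout): only the first pair passes the cutoff.
pairs = [
    {"chr1": "chr1", "pos1": 207530000, "mapq1": 60,
     "chr2": "chr3", "pos2": 144610000, "mapq2": 60},
    {"chr1": "chr1", "pos1": 207531000, "mapq1": 0,
     "chr2": "chr3", "pos2": 144611000, "mapq2": 60},
    {"chr1": "chr1", "pos1": 207532000, "mapq1": 42,
     "chr2": "chr3", "pos2": 144612000, "mapq2": 3},
]
print(len(keep_confident_pairs(pairs)))  # 1
```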
Thanks for the file and for figuring out the mapping problem; it is an extremely important step to consider. There is a changepoint statistical test in the translocationFind.r script to avoid such instances, but we found that some translocations are made up of multiple breakpoints of different sizes (e.g. the NCIH460 translocation in the paper), and by applying too strict a condition we tend to lose such known translocations. But I agree this is a concern, and thanks for raising it. I will have a look at it and try to make the filtering closer to optimal.
FYI, I have updated the package to avoid this noise.
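For readers unfamiliar with the idea, a changepoint test looks for a position where the level of a signal shifts. The sketch below is a generic single mean-shift detector based on a least-squares split. It is not the test implemented in translocationFind.r, and the function name and toy profile are made up for illustration.

```python
def best_single_changepoint(values):
    """Find the split that best separates `values` into two constant segments.

    Returns (k, gain): the second segment starts at index k, and `gain` is the
    reduction in squared error compared with fitting a single mean.
    Returns (None, 0.0) when no split improves the fit.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")

    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    total = sse(values)
    best_k, best_cost = None, total
    for k in range(1, n):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, total - best_cost


# Toy contact profile along one chromosome: the level jumps at index 4.
profile = [1, 2, 1, 2, 9, 10, 9, 10]
print(best_single_changepoint(profile))  # (4, 128.0)
```

A stricter requirement on the gain rejects more noise but, as noted above, can also discard real translocations that consist of several breakpoints of different sizes.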
I used HiCtrans to call translocations in T47D, the cell line used in your Bioinformatics paper, but the results show many false positives. See the following results. The translocations are detected at the first two spots, which are correct, so I believe my input is correct,
but translocations are also reported at many false positions!
In fact, this is not a single case: I detected 942 translocation events in T47D instead of the 25 sites reported in your paper! This is confusing.