Closed sarmapar closed 4 years ago
Hello, thanks for writing in.
It would be useful to understand why this is occurring. Could you check the following:
These checks will let us know whether the discrepancy is due to peak calling, read counting or something Hi-C related. I suspect it could be related to 0 vs NaN in Hi-C matrices, but we should rule out everything else first.
The recalculated Neighborhoods/EnhancerList.txt has 230 more lines than the example, but the ones that are in common seem to have the same activity_base values.
The example and recalculated hic_contact_pl_scaled_adj values are identical, which can be seen in the following scatter plot.
I also just found the following warnings after the macs2 callpeak output,
ABC-Enhancer-Gene-Prediction-master/src/hic.py:136: RuntimeWarning: invalid value encountered in less
norms[norms < kr_cutoff] = np.nan
ABC-Enhancer-Gene-Prediction-master/src/hic.py:137: RuntimeWarning: invalid value encountered in greater_equal
norms[norms >= kr_cutoff] = 1
which seem to have to do with numpy trying to deal with NaN values.
I've attached the recalculated EnhancerList.txt if that is helpful. EnhancerListRecalculated.txt
Thank you for this analysis and the file. The difference in number of rows of EnhancerList.txt and the equivalence of the Hi-C data suggest this issue is not related to Hi-C.
I realize now that the script I used to generate the results in example_chr22/ had a minor difference to the example commands in the README. I will fix this for the next release of the codebase. Making this change explains the vast majority of the discrepancy between the file in the repository and EnhancerListRecalculated.txt.
It's not exactly the same though. There are still a small set of (~20) enhancer regions with slightly different bounds. I'd still like to understand these discrepancies, will require some further digging - I will get back to you.
I was unable to completely reproduce EnhancerListRecalculated.txt, even after using the version of MACS2 that you used.
I have however fixed the issue whereby the example data in example_chr22/ and the example commands on the README were slightly out of sync.
Hello, I am attempting to replicate the results using the chromosome 22 example to verify that the code is working properly before applying it to my own data. I am running the code on a linux-based cluster with slightly different versions of the necessary packages. I am using anaconda to load the necessary packages, and I am using the following versions:
I am not getting any warnings or errors, and I am getting results very similar to the results in the example folder, but my re-calculated results are consistently lower than the example which results in less enhancer-gene pairs, since the cutoff is 0.2. I have included a scatterplot of the ABC scores from the enhancer-gene pairs that both the example and my results have in common, and the IGV visualization of these results below. The mean absolute difference between each ABC score pair was about 0.0026.
Here is a visual comparison, looking at the EnhancerPreductions.bedpe output from the example_chr22/ABC_output/Predictions included here and the EnhancerPreductions.bedpe output from running the code with the exact specifications as in the README file.
Since the results are well correlated, I don't think this is a big issue, but I am not sure why these differences are occurring. It could be some rounding differences on the cluster, or some differences in the versions of the packages I am using, or the fact that I am running the packages through anaconda. Any thoughts?
Best regards