LegubeDNAREPAIR / compD_pipeline_python


Identification of D compartment in response to neuronal activation. #1

Open mujahida87 opened 6 months ago

mujahida87 commented 6 months ago

Hi Vincent, we would like to test whether neuronal activation in mice (bicuculline treatment) induces D-compartmentalization. We liked your approach to annotating the D compartment and followed your method as described below.

  1. We extracted a 100 kb resolution matrix (sequencing depth of 600 million reads per condition) for both the control and the BIC-treated sample from the juicer .hic file, using the HiCExplorer tool hicConvertFormat.
  2. We transformed each matrix to observed/expected using hicTransform.
  3. We compared the two matrices with hicCompareMatrices using the log2ratio operation.
  4. We balanced the result using cooler balance.
  5. Lastly, we calculated PC1 and PC2 using cooltools eigs-cis --n-eigs 2.

Outcome from cooltools: only 269 out of 27469 100 kb windows have PC values; the remaining windows are 'nan' or 0.00. We then computed Pearson's correlation between the differential ATAC-seq signal (BIC differential peaks) and PC1 (and also PC2) of log2(OE-BIC-100Kb.cool/OE-ctrl-100Kb.cool); the correlation value is 0.23.

Attachments: correlation_scatterplot.pdf and three screenshots (2024-05-06).

I could not use the Snakefile and the custom code for the PC1 calculation because the hicexplorer.yaml file is missing on our centralized cluster, so I used HiCExplorer and cooltools for the steps above.
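As an illustration of the check described above (how many windows carry a usable PC value, and the correlation with the ATAC-seq signal over those windows only), here is a minimal sketch in plain Python. It is not code from the pipeline or from cooltools; the function names `summarize_bins` and `pearson_finite` and the toy values are hypothetical, and it assumes both tracks have already been loaded as per-bin lists of floats in the same bin order.

```python
import math


def summarize_bins(values):
    """Count valid (finite, non-zero), zero, and missing (NaN/inf) bins."""
    counts = {"valid": 0, "zero": 0, "missing": 0}
    for v in values:
        if not math.isfinite(v):
            counts["missing"] += 1
        elif v == 0:
            counts["zero"] += 1
        else:
            counts["valid"] += 1
    return counts


def pearson_finite(x, y):
    """Pearson's r computed only over bins where both tracks are finite."""
    pairs = [(a, b) for a, b in zip(x, y)
             if math.isfinite(a) and math.isfinite(b)]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        # A constant track has no defined correlation.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Toy per-bin tracks (made-up numbers, for illustration only).
pc1 = [0.5, float("nan"), -0.2, 0.0, 0.8, float("nan")]
atac = [1.0, 0.3, -0.4, 0.1, 1.6, 2.0]

print(summarize_bins(pc1))
print(pearson_finite(pc1, atac))
```

If only a small fraction of bins is valid, a correlation computed this way rests on very few points, so the bin summary is worth reporting alongside the r value.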

Question: Is it normal for almost all windows of the differential Hi-C matrix to be empty (all except 269/27469), or am I making a mistake here? In your analysis, Extended Data Fig. 5a,b shows continuous PC values; in our case, too few windows have PC values, even when considering two completely different datasets: https://www.nature.com/articles/s41586-023-06635-y/figures/10. Lastly, may I ask how many D compartments you detected at 100 kb resolution? My apologies if this is not directly related to your pipeline or your area of study, and please let me know if the information I have given is unclear.

Thank you

rochevin commented 5 months ago

Hello, you balanced your matrix after computing the ratio. In my opinion you shouldn't, since the matrices are already balanced when you extract them from the .hic file (you need to request this when you use hicConvertFormat).

It's odd that cooltools eigs-cis yields no values at some positions, but I don't know exactly what their code does when it extracts the PCs. From what I remember, they use sklearn to compute the PCs, as I do in my script. In my case, I had continuous PC values.

I did a similar correlation with ATAC-seq to get the proper orientation of my PC1 signal, so that part should be fine.
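For readers unfamiliar with this orientation step: the sign of a principal component is arbitrary, so a common convention is to flip it so that it correlates positively with a reference track such as ATAC-seq. The sketch below shows that idea in plain Python; it is an assumption about the general technique, not the code used in this pipeline, and the function name `orient_pc` is hypothetical.

```python
import math


def orient_pc(pc, signal):
    """Return `pc` with its sign flipped if it is anti-correlated with `signal`.

    Only bins where both tracks are finite are used to decide the sign;
    missing (NaN) bins are passed through unchanged.
    """
    pairs = [(p, s) for p, s in zip(pc, signal)
             if math.isfinite(p) and math.isfinite(s)]
    if len(pairs) < 2:
        # Not enough data to decide; leave the orientation as is.
        return list(pc)
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_s = sum(s for _, s in pairs) / len(pairs)
    # The covariance has the same sign as Pearson's r.
    cov = sum((p - mean_p) * (s - mean_s) for p, s in pairs)
    sign = -1.0 if cov < 0 else 1.0
    return [sign * p for p in pc]


# Toy example (made-up numbers): PC1 is anti-correlated with the signal.
pc1 = [-1.0, 0.5, float("nan"), 2.0]
atac = [3.0, 1.0, 0.0, -2.0]
oriented = orient_pc(pc1, atac)
print(oriented)
```

The orientation only changes the sign convention of the component; it does not change which bins have values or the magnitude of the correlation.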

If you want to be able to use the environment described in hicexplorer.yaml, you need to specify --use-conda in your snakemake parameters; see the Snakemake documentation.

I'm sorry that my pipeline is poorly documented. If I have time, I'll write a README that explains how to launch the pipeline with snakemake. Maybe try again with --use-conda, as I suggested.

Let me know if it works, or maybe just try launching my Python script instead of cooltools eigs-cis to see whether it yields similar results.

Bests, Vincent

mujahida87 commented 5 months ago

Thank you very much for your reply and suggestions. Yes, it's a bit odd that cooltools doesn't compute the PCs except for a few of the windows. The other problem is that cooltools doesn't run without balancing the matrix. I am trying to run your Python script now; let's see. Thanks again.