Tian-Dechao / diffDomain

DiffDomain is a statistically sound method for detecting differential TADs between conditions
MIT License

Question about example 2 #21

Open Yecats77 opened 6 months ago

Yecats77 commented 6 months ago

Hi,

I tried your example 2. The commands and output results (adjusted_TADs2.txt_types.txt) are attached.

# ========= Example2: Testing TADs on all chromosomes ============= 

# <bed> data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt
# <ofile: output file> temp/temp.txt
python3 ${diffdomain_py3}diffdomains.py dvsd multiple ${hic0} ${hic1} ${diffdomain_home}data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt --ofile ${res_path}temp/temp.txt

# <input> temp/temp.txt
# <output> adjusted_TADs2.txt 
python ${diffdomain_py3}diffdomains.py adjustment fdr_bh ${res_path}temp/temp.txt ${res_path}adjusted_TADs2.txt 
# --filter true adj_pvalue<=0.05
# The final output is saved to <diff_res/reorganized_TADs_GM12878_K562.tsv>
python ${diffdomain_py3}diffdomains.py adjustment fdr_bh ${res_path}temp/temp.txt ${res_path}reorganized_TADs_GM12878_K562.tsv --filter true

cd ${res_path}
# -d diffDomain's output
# -t the other tadlist
# output diffDomain/diff_res/{}_types.txt
python ${diffdomain_py3}classification.py -d ${res_path}adjusted_TADs2.txt  -t ${diffdomain_home}data/GSE63525_K562_Arrowhead_domainlist.txt
python ${diffdomain_py3}classification.py -d ${res_path}reorganized_TADs_GM12878_K562.tsv  -t ${diffdomain_home}data/GSE63525_K562_Arrowhead_domainlist.txt

In the results, I observed that for some regions there are only condition 1 entries, whose type is loss, while for other regions the type of the condition 2 entries is always nan. I am not sure what happened. Did I make a mistake when using the diffDomain tools? Could you please help with this problem?

Thank you.

Best regards, Stacey LIU

Attachments: image, adjusted_TADs2.txt, adjusted_TADs2.txt_types.txt

Tian-Dechao commented 6 months ago

Hey

Your run of the example is correct. The subtype column is meant to further classify the type column. A missing value (nan) in the subtype column means that the type cannot be further divided into subgroups, as is the case for reorganized TADs whose type is loss in condition 2.

Yecats77 commented 6 months ago

Thanks for your reply!

I now understand what the subtype means.

So, in example 2, the results mean that no reorganized TAD types are identified under condition 2, since all entries in the type column for condition 2 are nan. Is this understanding correct?

And may I ask how you computed the number in "DiffDomain identifies that 30.771% of GM12878 TADs are reorganized in K562" from the results in adjusted_TADs2.txt_types.txt?

Thank you.

Best regards, Stacey LIU

Tian-Dechao commented 6 months ago

1. Yes. Please note that this does not mean that condition 2 has no TADs that are reorganized in condition 1. To identify condition 2 TADs that are reorganized in condition 1, swap the roles: treat condition 2 as the new condition 1 and condition 1 as the new condition 2, then rerun DiffDomain. In example 2, that means treating K562 as condition 1 and GM12878 as condition 2.

2. To get the proportion of TADs that are reorganized in condition 2, we need to count two numbers. 1) The number of reorganized TADs: the significant column in adjusted_TADs2.txt_types.txt indicates whether a TAD is significantly reorganized, with 1 standing for significant, 0 for not significant, and nan for condition 2 TADs. Counting the 1s gives the number of reorganized TADs. 2) The number of TADs: count the 'condition1' entries in the origin column. The proportion is the first number divided by the second.
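
The counting described in point 2 could be sketched roughly as follows. This is only an illustration, not code from the diffDomain repository: it assumes the types file is tab-separated with a header row containing columns named exactly `significant` and `origin`, and the function name `reorganized_fraction` and the tiny sample table are made up for the example.

```python
import csv
import io


def reorganized_fraction(handle, delimiter="\t"):
    """Count reorganized TADs and condition 1 TADs in a *_types.txt-like table.

    Assumes (hypothetically) a header row with 'significant' and 'origin' columns.
    Returns (n_reorganized, n_condition1, fraction).
    """
    n_reorganized = 0
    n_condition1 = 0
    for row in csv.DictReader(handle, delimiter=delimiter):
        # Denominator: every TAD that came from condition 1.
        if row["origin"].strip() == "condition1":
            n_condition1 += 1
        # Numerator: 'significant' is 1, 0, or nan; only 1 counts as reorganized.
        value = row["significant"].strip()
        if value and float(value) == 1:
            n_reorganized += 1
    fraction = n_reorganized / n_condition1 if n_condition1 else float("nan")
    return n_reorganized, n_condition1, fraction


# Made-up sample rows, only to show the expected layout (not real results).
sample = io.StringIO(
    "chr\tstart\tend\torigin\tsignificant\n"
    "1\t100\t200\tcondition1\t1\n"
    "1\t300\t400\tcondition1\t0\n"
    "1\t500\t600\tcondition1\t1.0\n"
    "1\t700\t800\tcondition1\t0\n"
    "1\t150\t250\tcondition2\tnan\n"
)
n_reorg, n_cond1, frac = reorganized_fraction(sample)
print(f"{n_reorg} of {n_cond1} condition 1 TADs reorganized ({frac:.1%})")
```

For a real run, the same function could be given an open file handle, for example `open("adjusted_TADs2.txt_types.txt", newline="")`, provided the column names and separator match the assumptions above.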