XiaoTaoWang / TADLib

A Library to Explore Chromatin Interaction Patterns for Topologically Associating Domains
GNU General Public License v3.0

TAD alignment problem between two samples #19

Open shenlinyong opened 2 years ago

shenlinyong commented 2 years ago

Thank you for developing such great software. I would like to use the tadlib.hitad.aligner.DomainSet class to find differential TADs. However, I don't understand what "en : str, Unique identifier for input domain set" means. How should I prepare en for my data? And is the domainlist I prepared correct: [['chr1', 150000, 360000, 0], ['chr1', 360000, 440000, 0], ['chr1', 440000, 860000, 0], ['chr1', 860000, 1200000, 0], ['chr1', 1200000, 1340000, 2], ['chr1', 1200000, 1590000, 1]]?

class tadlib.hitad.aligner.DomainSet(en, domainlist, res, hier=True) [[source]](https://xiaotaowang.github.io/TADLib/_modules/tadlib/hitad/aligner.html#DomainSet)
Parse and hold a hierarchical domain set.

Parameters
en : str
Unique identifier for input domain set.

domainlist : list
List of domains. See [tadlib.hitad.aligner.BoundSet](https://xiaotaowang.github.io/TADLib/hitad_api.html#tadlib.hitad.aligner.BoundSet) for details.

This is the Python script I ran:

import sys
sys.path.append('/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/')
from tadlib.hitad.aligner import DomainAligner as DA
import pandas as pds
***************************************************************************
Version 0.4.3 is out of date, Version 0.4.4 is available.

***************************************************************************
fat_tad = pds.read_table("./2022/hic/juicer/down_analysis/tad/tadlib/fat.txt")
print(fat_tad[0:6])
   chr1        0   150000  0
0  chr1   150000   360000    0
1  chr1   360000   440000    0
2  chr1   440000   860000    0
3  chr1   860000  1200000    0
4  chr1  1200000  1340000    2
5  chr1  1200000  1590000    1
fat_tad=fat_tad.apply(lambda x: list(x), axis=1).values.tolist()
print(fat_tad[0:6])
[['chr1', 150000, 360000, 0], ['chr1', 360000, 440000, 0], ['chr1', 440000, 860000, 0], ['chr1', 860000, 1200000, 0], ['chr1', 1200000, 1340000, 2], ['chr1', 1200000, 1590000, 1]]
fat_data = DA("fat", fat_tad, 10000)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_208882/2791293192.py in <module>
----> 1 fat_data = DA("fat", fat_tad, 10000)

~/anaconda3/envs/tadlib/lib/python3.7/site-packages/tadlib/hitad/aligner.py in __init__(self, *args)
    564         self.DomainSets = {}
    565         for domains in args:
--> 566             self.DomainSets[domains.Label] = domains
    567         self.Results = {}
    568 

AttributeError: 'str' object has no attribute 'Label'

Thanks again for your help!

XiaoTaoWang commented 2 years ago

Sorry for the late response, and thank you for your interest!

To align or compare two domain sets, first read the domain lists of the two samples with the readHierDomain function:

>>> from tadlib.hitad.aligner import readHierDomain, DomainSet, DomainAligner
>>> list1 = readHierDomain('sample1.txt')
>>> list2 = readHierDomain('sample2.txt')
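For orientation, here is a rough sketch of the list shape posted in the question: one `[chrom, start, end, level]` record per line of a four-column file. `parse_domain_lines` is a hypothetical stand-in written for this illustration, not TADLib's `readHierDomain`, and whether `readHierDomain` returns exactly this structure is not confirmed here. It also shows one difference from the `read_table` call in the question, which treated the first line as a header, so the chr1 0-150000 row never reached the list.

```python
import io


def parse_domain_lines(handle):
    """Hypothetical stand-in, not TADLib's readHierDomain.

    Turns whitespace-separated lines of 'chrom start end level'
    into [chrom, start, end, level] records.
    """
    domains = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, start, end, level = fields[:4]
        domains.append([chrom, int(start), int(end), int(level)])
    return domains


# Rows copied from the output printed in the question, including the
# first row that pandas consumed as a header.
example = io.StringIO(
    "chr1\t0\t150000\t0\n"
    "chr1\t150000\t360000\t0\n"
    "chr1\t1200000\t1340000\t2\n"
    "chr1\t1200000\t1590000\t1\n"
)
domains = parse_domain_lines(example)
print(domains[0])    # the first row is kept as data
print(len(domains))  # every input row becomes one record
```

If you prefer to keep using pandas, passing `header=None` to `read_table` should keep the first row as data as well.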

After that, pass these lists to DomainSet, which represents the hierarchical domains as trees:

>>> sample1 = DomainSet('sample1', list1, 10000) # assuming your domains were called at 10 kb resolution
>>> sample2 = DomainSet('sample2', list2, 10000)

Finally, perform the alignment using DomainAligner:

>>> test_align = DomainAligner(sample1, sample2)
>>> test_align.align('sample1', 'sample2')
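This also explains the AttributeError in the question. Judging from the traceback, the DomainAligner constructor loops over its arguments and reads a `Label` attribute from each one, so passing the identifier string, the raw list, and the resolution directly fails on the first argument. The sketch below copies only the constructor lines visible in the traceback; `SketchAligner` and `LabeledSet` are hypothetical names used for illustration and are not part of TADLib.

```python
class SketchAligner:
    """Hypothetical stand-in that mirrors only the constructor lines
    shown in the traceback; it is not TADLib's DomainAligner."""

    def __init__(self, *args):
        self.DomainSets = {}
        for domains in args:
            # Each argument must expose a Label attribute.
            self.DomainSets[domains.Label] = domains
        self.Results = {}


class LabeledSet:
    """Hypothetical placeholder for an object that carries a Label."""

    def __init__(self, label):
        self.Label = label


# Passing the identifier, list, and resolution directly, as in the question:
# the loop fails on the first argument, which is a plain string.
try:
    SketchAligner("fat", [["chr1", 150000, 360000, 0]], 10000)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing objects that carry a Label: each one is stored under its label.
aligner = SketchAligner(LabeledSet("sample1"), LabeledSet("sample2"))

print(error_message)
print(sorted(aligner.DomainSets))
```

Presumably the `Label` of a real DomainSet is the identifier given as `en`, which would be why the same strings are passed to `align` afterwards.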

The different types of domain-level alignments can then be accessed through this object:

>>> conserved = test_align.conserved('sample1', 'sample2') # Conserved TADs
>>> semi = test_align.inner_changed('sample1', 'sample2') # Semi-Conserved TADs
>>> merged = test_align.merged('sample1', 'sample2') # Merged TADs
>>> split = test_align.split('sample1', 'sample2') # Split TADs

Let me know if you have any further questions.