XiaoTaoWang / TADLib

A Library to Explore Chromatin Interaction Patterns for Topologically Associating Domains
GNU General Public License v3.0

Problems with running TADLib #16

Closed shenlinyong closed 2 years ago

shenlinyong commented 2 years ago

This is my run script:

hitad -O fat.txt --exclude chrW,chrM -d fat_meta_file --logFile fat.log

This is an error message:

tadlib.hitad.genomeLev    DEBUG   @ 07/21/22 00:36:49:   Cache Chrom object into /storage/SLY68/2022/hic/juicer/down_analysis/tad/tadlib/.hitad/tmp6zmfixrg20220721003649 ...
root                      INFO    @ 07/21/22 00:36:52: Done!
root                      DEBUG   @ 07/21/22 00:36:52: Learning HMM parameters for each dataset ...
tadlib.hitad.genomeLev    DEBUG   @ 07/21/22 00:36:52:   resolution: 1000, rep1
Traceback (most recent call last):
  File "/home/SLY68/anaconda3/envs/tadlib/bin/hitad", line 121, in run
    G.learning(cpu_core=args.cpu_core)
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/tadlib/hitad/genomeLev.py", line 199, in learning
    seqs = self.train_data(res, rep)
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/tadlib/hitad/genomeLev.py", line 175, in train_data
    tmpcache.minWindows(0, tmpcache.chromLen, tmpcache._dw)
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/tadlib/hitad/chromLev.py", line 282, in minWindows
    diff = up - down
ValueError: operands could not be broadcast together with shapes (0,) (1454,)
XiaoTaoWang commented 2 years ago

Hi, thanks for sharing your cool files (https://github.com/XiaoTaoWang/EagleC/issues/8). This error occurred because previous TADLib versions did not support chromosomes smaller than 2 Mb. I have fixed this in the new version (v0.4.3). Can you upgrade TADLib to this version (pip install -U TADLib) and try again?
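For illustration, the ValueError in the traceback is NumPy's generic broadcasting failure, and it can be reproduced outside TADLib. The sketch below is not the code in chromLev.py: the names `up` and `down` are borrowed from the traceback line `diff = up - down`, and the idea that a very short chromosome leaves one of the two arrays empty is only an assumption about the mechanism.

```python
import numpy as np

# Hypothetical stand-ins for the two arrays in `diff = up - down`.
# Assumption: on a very short chromosome one side ends up empty.
up = np.array([])        # shape (0,)
down = np.zeros(1454)    # shape (1454,)

try:
    diff = up - down
    message = None
except ValueError as err:
    # Neither length is 1 and the lengths differ, so NumPy refuses to broadcast.
    message = str(err)

print(message)
```

If the assumption holds, any chromosome short enough to produce an empty window array would trigger the same message, which is consistent with the fix targeting small chromosomes.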

By the way, it seems you mistyped the resolution in your previous run (it should be 10000 rather than 1000). Also, to exclude specific chromosomes, type --exclude chrW chrM, separating the names with a space rather than a comma. For your reference, here is my command:

$ hitad -O test.txt -d datasets -W RAW --exclude chrW chrM -p 6

Here I specified -W RAW because I noticed the original values in your cool files are already normalized (most of them are below 1).
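On the --exclude syntax in the command above, a small sketch may show why the comma matters. It assumes the option is declared as a space-separated list in an argparse-style parser; the parser below is a stand-in written for this example, not hitad's actual argument definition.

```python
import argparse

# Assumption: --exclude is a space-separated list option.
# This parser is a stand-in, not hitad's real argument definition.
parser = argparse.ArgumentParser(prog="hitad-sketch")
parser.add_argument("--exclude", nargs="*", default=[])

with_comma = parser.parse_args(["--exclude", "chrW,chrM"]).exclude
with_space = parser.parse_args(["--exclude", "chrW", "chrM"]).exclude

print(with_comma)   # a single name that matches no real chromosome
print(with_space)   # two separate chromosome names
```

Under that assumption, the comma form yields one string, "chrW,chrM", so neither chromosome would actually be excluded.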

And your metadata in datasets should look like this:

res:10000
  rep1:fat1_10000.cool
  rep2:fat2_10000.cool
  rep3:fat3_10000.cool
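To make the layout of this metadata file concrete, here is a minimal sketch of how such a file could be read into a nested dictionary keyed by resolution and replicate label. The function `parse_metadata` is hypothetical and written only for this example; TADLib's own parser may behave differently (for instance, it may resolve file names to absolute paths).

```python
def parse_metadata(text):
    """Parse 'res:<int>' headers followed by '<label>:<path>' lines."""
    datasets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so paths may contain colons.
        key, _, value = line.partition(":")
        if key == "res":
            current = int(value)
            datasets[current] = {}
        elif current is None:
            raise ValueError("replicate line before any 'res:' line: %r" % raw)
        else:
            datasets[current][key] = value
    return datasets


example = """res:10000
  rep1:fat1_10000.cool
  rep2:fat2_10000.cool
  rep3:fat3_10000.cool
"""
parsed = parse_metadata(example)
print(parsed)
```

The point of the sketch is that each `res:` line opens a group and every replicate below it belongs to that resolution, so three biological replicates at 10 kb are three entries under one key.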
shenlinyong commented 2 years ago

Thank you so much for your help! This is really great. Last time I merged the BAM files of three replicate samples to make a 1 kb cool file as input, because I did not know whether TADLib supports three biological replicates as input. Being able to supply rep1, rep2, and rep3 at the same time is perfect.

shenlinyong commented 2 years ago

Sorry to bother you again, but I have run into a new error:

root                      INFO    @ 07/28/22 18:14:22:
# ARGUMENT LIST:
# Output file name = fat.txt
# Hi-C datasets = {10000: {'rep1': '/home/SLY68/2022/hic/juicer/down_analysis/tad/tadlib/data/fat1_10000.cool', 'rep2': '/home/SLY68/2022/hic/juicer/down_analysis/tad/tadlib/data/fat2_10000.cool', 'rep3':
# Column for matrix balancing = RAW
# Excluded chromosomes = ['chrW', 'chrM']
# Maximum domain size = 4000000
# Column for DI track = DIs
# Number of processes used = 30
# Remove cache data = False
# Log file name = fat.log
......
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:30:   resolution: 10000, rep1
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:31:   Cache Chrom object into /storage/SLY68/2022/hic/juicer/down_analysis/tad/tadlib/.hitad/tmphvrq7khq20220728182031 ...
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:34:   resolution: 10000, rep2
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:34:   Cache Chrom object into /storage/SLY68/2022/hic/juicer/down_analysis/tad/tadlib/.hitad/tmp8fl3szm020220728182034 ...
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:37:   resolution: 10000, rep3
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:38:   Cache Chrom object into /storage/SLY68/2022/hic/juicer/down_analysis/tad/tadlib/.hitad/tmpn_0vc1wl20220728182038 ...
root                      INFO    @ 07/28/22 18:20:41: Done!
root                      DEBUG   @ 07/28/22 18:20:41: Learning HMM parameters for each dataset ...
tadlib.hitad.genomeLev    DEBUG   @ 07/28/22 18:20:41:   resolution: 10000, rep1
Traceback (most recent call last):
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/joblib/parallel.py", line 822, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/queue.py", line 167, in get
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/SLY68/anaconda3/envs/tadlib/bin/hitad", line 121, in run
    G.learning(cpu_core=args.cpu_core)
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/tadlib/hitad/genomeLev.py", line 201, in learning
    stop_threshold=1e-5, n_jobs=cpu_core, verbose=False)
  File "pomegranate/hmm.pyx", line 2576, in pomegranate.hmm.HiddenMarkovModel.fit
  File "pomegranate/hmm.pyx", line 2619, in genexpr
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/SLY68/anaconda3/envs/tadlib/lib/python3.7/site-packages/joblib/parallel.py", line 833, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "pomegranate/hmm.pyx", line 2619, in genexpr
TypeError: delayed() got an unexpected keyword argument 'check_pickle'
XiaoTaoWang commented 2 years ago

Perhaps this is a package version compatibility issue? Can you make sure your package versions are as close as possible to the following:

pomegranate 0.10.0
networkx 1.x
joblib 0.13.2
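A quick way to compare an environment against the versions listed above is sketched below. The helper names `compare` and `installed_versions` are made up for this example. The sketch assumes Python 3.8 or later, where `importlib.metadata` is in the standard library; on Python 3.7 the `importlib_metadata` backport offers the same interface.

```python
from importlib import metadata  # standard library from Python 3.8

# Versions suggested in this thread; "1." stands for any networkx 1.x release.
EXPECTED = {"pomegranate": "0.10.0", "networkx": "1.", "joblib": "0.13.2"}


def compare(installed, expected=EXPECTED):
    """Return {package: (found, wanted)} for packages that do not match."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found is None:
            ok = False
        elif wanted.endswith("."):
            ok = found.startswith(wanted)   # prefix match, e.g. any 1.x
        else:
            ok = found == wanted            # exact match
        if not ok:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(compare(installed_versions(EXPECTED)))
```

An empty dictionary means the environment matches the list; any entry shows the installed version next to the suggested one.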
shenlinyong commented 2 years ago

You are right, the joblib version is 0.13.3. It is working fine now.