3DGenomes / TADbit

TADbit is a complete Python library that covers all the steps needed to analyze, model and explore 3C-based data. With TADbit the user can map FASTQ files to obtain raw binned interaction matrices (Hi-C-like matrices), normalize and correct interaction matrices, identify and compare the so-called Topologically Associating Domains (TADs), build 3D models from the interaction matrices, and finally extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models.
GNU General Public License v3.0

model problem #343

Open Guoshuai1314 opened 3 years ago

Guoshuai1314 commented 3 years ago

When I ran the test data, I found that the Modelling module took far too long. It was expected to take 6 minutes, but it took 2334 minutes. When I tested it with my own data, it ran for more than 20 days without finishing. Is it my system? I use CentOS 7.

david-castillo commented 3 years ago

Hi,

That's not normal. Can you run this test inside the 'test' folder of TADbit?

python test_all.py 13

That will run a simple modelling that should take seconds to compute.

Regards

David

Guoshuai1314 commented 3 years ago

Yes, it works fine.

david-castillo commented 3 years ago

How did you generate your matrix with "tadbit bin"? It's better to generate the matrix only of the region to be modelled, otherwise TADbit might be trying to recreate the full Hi-C matrix just to take a small region afterwards.

Guoshuai1314 commented 3 years ago

I did not use "tadbit bin" to generate the matrix; I used JuicerTools dump directly. I then ran the modelling with the whole-chromosome matrix.
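As a rough illustration of restricting a dumped matrix to the region to be modelled, the sketch below converts sparse rows of the form "pos1 pos2 value" (genomic bin start positions in base pairs, which is an assumption about the dump output used here) into bin indices relative to the start of the region. The function name `juicer_dump_to_bins` and the sample rows are hypothetical and are not part of TADbit or JuicerTools.

```python
def juicer_dump_to_bins(lines, resolution, region_start, region_end):
    """Convert 'pos1 pos2 value' rows into region-relative (i, j, value) triplets.

    region_start and region_end are in base pairs; region_end is exclusive.
    Rows with either end outside the region are dropped.
    """
    first_bin = region_start // resolution
    last_bin = region_end // resolution  # exclusive upper bound, in bins
    triplets = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        pos1, pos2, value = int(fields[0]), int(fields[1]), float(fields[2])
        i = pos1 // resolution
        j = pos2 // resolution
        if first_bin <= i < last_bin and first_bin <= j < last_bin:
            # shift indices so the first bin of the region becomes 0
            triplets.append((i - first_bin, j - first_bin, value))
    return triplets


# Made-up rows, for illustration only (not real Hi-C data).
dump = [
    "0\t0\t8.0",
    "0\t50000\t1.0",
    "100000\t150000\t3.5",
    "5000000\t6000000\t2.0",
]
print(juicer_dump_to_bins(dump, 50000, 100000, 5100000))
```

Only the third row falls entirely inside the region 100000-5100000, so it is the only one kept, re-indexed from bin 0.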

david-castillo commented 3 years ago

Then I'm sure TADbit is trying to rebuild the full matrix, and that takes forever if your computer doesn't have a lot of memory and speed. Can you convert your data to the following format?

# CRM chr20 64444167
# chr20:2-102 resolution:50000
# MASKED 
0   0       8
9   0       1
...

So, the first line is the chromosome and its total size. The second line is the region contained in the file, in this case chromosome 20 at 50 kbp resolution from bin 2 to bin 102 (chr20:100000-5100000). The third line is used to mask columns that have no data; you can leave it blank. Then each line is an i, j pair and its value (values need to be normalized), starting with 0,0.

David
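The layout described in the comment above can be sketched as a small writer. This is only an illustration of the three header lines followed by i, j, value rows. The function name `write_region_matrix` is hypothetical, and the use of tabs as separators is an assumption, since the thread does not say which whitespace TADbit expects.

```python
import io


def write_region_matrix(handle, chrom, chrom_size, first_bin, last_bin,
                        resolution, triplets, masked=()):
    """Write header lines and (i, j, value) rows in the layout described above.

    Separator choice (tabs) is an assumption, not confirmed by the thread.
    """
    # Line 1: chromosome name and its total size in base pairs.
    handle.write(f"# CRM {chrom}\t{chrom_size}\n")
    # Line 2: region covered by the file, in bins, plus the resolution.
    handle.write(f"# {chrom}:{first_bin}-{last_bin} resolution:{resolution}\n")
    # Line 3: masked columns; empty when nothing is masked.
    handle.write("# MASKED " + " ".join(str(col) for col in masked) + "\n")
    # Remaining lines: one i, j, value triplet per row, indices starting at 0.
    for i, j, value in triplets:
        handle.write(f"{i}\t{j}\t{value}\n")


buffer = io.StringIO()
write_region_matrix(buffer, "chr20", 64444167, 2, 102, 50000,
                    [(0, 0, 8), (9, 0, 1)])
print(buffer.getvalue())
```

Writing to a file instead only requires passing an open file handle in place of the `io.StringIO` buffer.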

Guoshuai1314 commented 3 years ago

OK, I'll try it right away. Thank you for your help.

Guoshuai1314 commented 3 years ago

I have converted my data to the following format and reduced the matrix scope to 2M (target region).

# CRM Chr2 2000000
# Chr2:0-200 resolution:10000
# MASKED
0 0 1135.049
0 1 109.90044
1 1 837.5016
0 2 67.08255
1 2 149.96063
2 2 858.84674
0 3 14.311334
1 3 51.089424
2 3 161.17682
...

Then I ran it with the following command, but it has been running for 3 days and has not finished yet.

tadbit model -w test/both --input_matrix Chr2_2M.abc --noX --optimize --beg 1 --end 2000000 --reso 10000 --maxdist 400:500:100 --upfreq=-0.2:0:0.1 --lowfreq=-0.4:-0.2:0.1 --nmodels 20 --nkeep 20 -j 60 --cpu 60

I noticed in the background that the TADbit process was sleeping while the system had free memory and CPU. Why is that?
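To get a feel for how much work the command above requests, the sketch below expands the three `start:end:step` ranges and counts the parameter combinations. It assumes that the end point of each range is included, that every combination builds `--nmodels` models, and that no other parameter is scanned. None of these assumptions is confirmed in this thread, and the helper `expand_range` is hypothetical.

```python
from decimal import Decimal
from itertools import product


def expand_range(spec):
    """Expand 'start:end:step' into a list of floats, end point included.

    Decimal arithmetic avoids floating-point drift when stepping by 0.1.
    Only positive steps are handled in this sketch.
    """
    start, end, step = (Decimal(part) for part in spec.split(":"))
    values = []
    current = start
    while current <= end:
        values.append(float(current))
        current += step
    return values


# Ranges copied from the command line above.
maxdist = expand_range("400:500:100")
upfreq = expand_range("-0.2:0:0.1")
lowfreq = expand_range("-0.4:-0.2:0.1")

combinations = list(product(maxdist, upfreq, lowfreq))
nmodels = 20  # value of --nmodels in the command above

print(len(combinations), len(combinations) * nmodels)
```

Under these assumptions the grid has 18 combinations, or 360 models in total for a 200-bin region.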

david-castillo commented 3 years ago

Hi, I don't see the problem. Can you try to take out this part here:

-j 60 --cpu 60

You don't need any job id. I'll check on my computer to see if it's some kind of bug caused by the upgrade of dependencies. Did you use conda to install it?

Regards

David

Guoshuai1314 commented 3 years ago

No, I installed it from source code, like this:

cd tadbit-master
sudo python setup.py install
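With a source install, one quick sanity check is to see which copies of the relevant modules the interpreter actually picks up. The sketch below uses only the standard library. The helper `module_location` is hypothetical, and listing `pytadbit` and `IMP` as the modules to check is an assumption about which dependencies matter for modelling.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


# Module names assumed to be relevant for TADbit modelling.
for name in ("pytadbit", "IMP"):
    print(name, "->", module_location(name) or "not found")
```

Run it with the same `python` executable that was used for `setup.py install`, so that the result reflects the environment in which `tadbit model` runs.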