3DGenomes / TADbit

TADbit is a complete Python library to deal with all steps to analyze, model and explore 3C-based data. With TADbit the user can map FASTQ files to obtain raw interaction binned matrices (Hi-C like matrices), normalize and correct interaction matrices, identify and compare the so-called Topologically Associating Domains (TADs), build 3D models from the interaction matrices, and finally, extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models
GNU General Public License v3.0
101 stars 61 forks source link

valid read pairs is too few compared with other tools #340

Open shiyi-pan opened 3 years ago

shiyi-pan commented 3 years ago

Hi, I want to use TADbit to detect TADs, the TADbit detect 25,521,413 valid read pairs only ,but other tools such as HiCExplorer could detect 58949201 read pairs using the same data, I think maybe my code has some problem, could you give me some advise? I ues GEM as mapper and here is my code, thank you very much:

from pytadbit.mapping.full_mapper import full_mapping species = 'soybean'

for the first side of the reads

full_mapping(mapper_index_path='NN1138-2.v1.0.genome.gem', out_map_dir='results/fragment/{0}/01mapping/mapped{0}_r1/'.format(species), fastq_path='soybean_R1.fastq.gz', r_enz='MboI', frag_map=True, clean=True, nthreads=8, temp_dir='results/fragment/{0}/01_mapping/mapped_r1_tmp/'.format(species))

david-castillo commented 3 years ago

Hi,

The number of reads that you get with any pipeline depends on several factors but mainly on two: the mismatch you allow in the mapping of the reads and the filters you applied to the mapped reads. By default we use GEM, but even different versions of GEM have different default mismatches. Version 2 is about 4% or 6% if I remember correctly, other mappers allow higher mismatches like 15%. We also support bowtie2 and hisat2 and those will also produce a different number of reads depending on the parameters you use. By default we use the --very-sensitive option. Then you have the filters to discard what you consider non-informative or errors. Each pipeline has his own (although conceptually similar) filters.

Regards

David

shiyi-pan commented 3 years ago

Thank you for your reply, David. I will try other mappers like bowtie2 and hisat2 . I don't know clearly how to use bowtie2 into the TADbit pipeline . Is it just mapping Hi-C reads with bowtie2 and parse the bam file with parse_map function or not ?

david-castillo commented 3 years ago

You can use the same code that you already have but specify another mapper in the full_mapper function (like mapper='bowtie2'). Be sure you input the right mapper_index_path because for bowtie2 is not a single file, you have to specify the path and the prefix to the index.

http://3dgenomes.github.io/TADbit/reference/reference_mapping.html#mapping

You can also customize bowtie2 parameters with the mapper_params parameter if you need.

After that the parse is exactly the same.

Regards

David

shiyi-pan commented 3 years ago

Thanks a lot , David.