3DGenomes / TADbit

TADbit is a complete Python library that covers all the steps needed to analyze, model and explore 3C-based data. With TADbit the user can map FASTQ files to obtain binned raw interaction matrices (Hi-C-like matrices), normalize and correct interaction matrices, identify and compare the so-called Topologically Associating Domains (TADs), build 3D models from the interaction matrices, and finally extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models.
GNU General Public License v3.0
100 stars, 61 forks

Mapping problem tutorial #373

Closed. manuelfmerino closed this issue 2 years ago.

manuelfmerino commented 2 years ago

Hello,

I am running the tutorial, but I got stuck at the mapping step. When I run it, the following error is raised:

  File "G_1_map_iterative_rep_1.py", line 18, in <module>
    tempdir='results/iterativ/{0}{1}/01mapping/mapped{0}_{1}_r1_tmp/'.format(cell, rep))
  File "/users/manfernandez/.conda/envs/py37/lib/python3.7/site-packages/pytadbit/mapping/full_mapper.py", line 628, in full_mapping
    light_storage=light_storage)
  File "/users/manfernandez/.conda/envs/py37/lib/python3.7/site-packages/pytadbit/mapping/full_mapper.py", line 225, in transform_fastq
    header, seq, qal = get_seq(header)
  File "/users/manfernandez/.conda/envs/py37/lib/python3.7/site-packages/pytadbit/mapping/full_mapper.py", line 49, in _get_fastq_read_heavy
    seq = next(fhandler)
StopIteration
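The traceback ends at `seq = next(fhandler)` inside `_get_fastq_read_heavy`, which suggests the reader ran out of lines partway through a four-line FASTQ record. The sketch below reproduces that failure mode under one assumption: a reader that pulls each line of a record with `next()`. The function `read_one_record` is hypothetical and is not TADbit's actual code.

```python
import io


def read_one_record(fhandler):
    """Read one four-line FASTQ record from a line iterator.

    Hypothetical reader: each line is fetched with next(), so if the file
    ends in the middle of a record, next() raises StopIteration.
    """
    header = next(fhandler)
    seq = next(fhandler)
    next(fhandler)  # '+' separator line, ignored here
    qual = next(fhandler)
    return header, seq, qual


def demo():
    # The second record is cut off right after its sequence line.
    fh = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n")
    print(read_one_record(fh))  # ('@r1\n', 'ACGT\n', 'IIII\n')
    try:
        read_one_record(fh)  # raises StopIteration on the truncated record
    except StopIteration:
        print("StopIteration: FASTQ record truncated")


demo()
```

A bare `StopIteration` like this gives no hint about which file or record is broken, which is why checking the input FASTQ itself is a sensible next step.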

I ran the indexing using GEM 2 and installed TADbit 1.0.1 using Conda on Python 3.7.

Thanks a lot, Manuel F. Merino

EDIT: I found a mismatch between the number of lines in my FASTQ files as originally downloaded with fastq-dump (400000000 lines, i.e. 100M reads) and after dsrc compression (400840880 lines, counted with the command dsrc d -s FASTQs/mouse_B_rep1_1.fastq.dsrc | wc -l). Moreover, when I checked the last lines of the compressed file, I noticed that it does not end with read #100000000; instead, the last line corresponds to read #97829587. I now suspect that the dsrc compression is corrupting my files. Could that be the case? I installed and used the program as described in the TADbit installation guide and tutorial.
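The checks described above (total line count and the identity of the last read) can be extended into a quick structural validation of a FASTQ file. A minimal sketch, assuming standard four-line records and a plain or gzip-compressed input; `open_fastq` and `check_fastq` are hypothetical helpers, not part of TADbit or dsrc. A `.dsrc` archive would first need to be decompressed, e.g. with the `dsrc d` command mentioned above.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(lines):
    """Validate four-line FASTQ records from an iterable of lines.

    Returns (n_records, last_header). Raises ValueError at the first
    malformed record, or if the input ends with an incomplete record.
    """
    n_records = 0
    last_header = None
    record = []
    for lineno, line in enumerate(lines, start=1):
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        if not header.startswith("@"):
            raise ValueError(f"line {lineno - 3}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"line {lineno - 1}: separator does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"line {lineno}: sequence and quality lengths differ")
        n_records += 1
        last_header = header
        record = []
    if record:
        raise ValueError(f"truncated final record ({len(record)} of 4 lines)")
    return n_records, last_header
```

For example, `with open_fastq("path/to/reads.fastq") as fh: print(check_fastq(fh))` (hypothetical path) reports the record count and the last read header, which can be compared against the expected 100M reads.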

david-castillo commented 2 years ago

Hi Manuel,

We've never had that kind of problem with dsrc. Try using gzip-compressed FASTQ files, or the uncompressed FASTQ files directly, and check whether the tutorial works.
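Switching to gzip, as suggested, can be combined with a round-trip check so that any corruption introduced during compression is caught before mapping. A minimal sketch using only Python's standard `gzip`, `hashlib` and `shutil` modules; the helper names are hypothetical and not part of TADbit.

```python
import gzip
import hashlib
import shutil


def gzip_file(src, dst):
    """Compress the file at src into a gzip file at dst."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def sha256_of(fh, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the remaining content of a binary file object."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def gzip_round_trip_ok(src, dst):
    """Compress src to dst, then check that decompressing dst reproduces src byte for byte."""
    gzip_file(src, dst)
    with open(src, "rb") as original, gzip.open(dst, "rb") as restored:
        return sha256_of(original) == sha256_of(restored)
```

Hashing in chunks keeps memory use constant, which matters for FASTQ files with hundreds of millions of lines.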

Regards

David

manuelfmerino commented 2 years ago

Hi David,

I tried gzip compression and everything worked fine. I actually suspect that fastq-dump may have corrupted the files during the download, but I didn't investigate the problem further. I guess I'm fine not using dsrc for now.

Cheers and thanks a lot, Manuel