fiberseq / fibertools-rs

Tools for fiberseq data written in rust.
https://fiberseq.github.io/fibertools/fibertools.html

Bug in coordinates from ft center #14

Closed rlegendre closed 1 year ago

rlegendre commented 1 year ago

First, thank you for providing your tool for analyzing fiberseq data. Second, I use ft center to center my fibers around CTCF, and I see some wrong positions in the output file. Here is my input read (from my BAM file): ccs1.txt

Here is the wrong output:

1       4332678 +       N       4326954 4338731 m64155e_221011_134541/148899740/ccs     -4332678        -4320868        11810   -4332679,-4332679,-4332679,-4332679,-4332679,-5715,-5713,-5709,-5702,-5700,-5698,-5696,-5695,-5694,-5691,-5689,-5688,-5685,-5684,-5676,-5669,-5638,-5624,-5594,-5555,-5547 ...

As you can see, the first few centered m6A positions correspond to start+1 of my region of interest (with a negative sign), and the positions after them are correct.
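As a rough sanity check on a row like the one above, one can test whether each centered position falls inside the fiber's own span once that span is shifted by the centering position. This is only a sketch: it assumes that column 2 is the centering position and that columns 5 and 6 are the fiber's reference start and end, which the thread does not confirm. The helper name `out_of_window` is hypothetical and is not part of fibertools.

```python
def out_of_window(center, fiber_start, fiber_end, centered_positions):
    """Return the centered positions that lie outside the fiber's centered span.

    Assumption: a correctly centered position p satisfies
    fiber_start - center <= p <= fiber_end - center.
    """
    lo = fiber_start - center
    hi = fiber_end - center
    return [p for p in centered_positions if not lo <= p <= hi]


# Values copied from the reported row; the column meanings are assumed.
center, fiber_start, fiber_end = 4332678, 4326954, 4338731
positions = [-4332679, -4332679, -4332679, -4332679, -4332679, -5715, -5709, -5702]

suspicious = out_of_window(center, fiber_start, fiber_end, positions)
print(suspicious)
```

Under these assumptions the expected span is -5724 to 6053, so the five leading values are flagged and the rest pass. Note that -4332679 equals -1 - 4332678, which would be consistent with a placeholder of -1 being shifted by the centering position, but that is a guess and not something established in this thread.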

Here is my command line: ft center fiberseq-smk/sample_.fiberseq.bam CTCF.bed -t 48 -w --reference > sample_center_ctcf.txt (fibertools was run on an HPC server, on Red Hat Enterprise Linux 8.6)

Thanks for your help in correcting this bug. Best, Rachel

mrvollger commented 1 year ago

I meant to remove the wide output option and switch exclusively to the long format, so I haven't been checking it while updating and fixing other things.

I will look into it. But can you also check to see if the bug exists in the long (default) form of the output?

mrvollger commented 1 year ago

Without a valid bam file (not just the read but a file with a real header), and the CTCF.bed file I am unable to reproduce or debug your issue. Feel free to reopen if you can upload these files.

rlegendre commented 1 year ago

Thanks for your answer. I tested without the wide option (which, in my own opinion, is a very useful option), and the results are still incorrect.

I have prepared an archive with part of my BAM file and some CTCF sites, with which I am able to reproduce the error with both options. The data are available here: https://dl.pasteur.fr/fop/25g5TQgh/Test_data.tar

Thank you for your help.

mrvollger commented 1 year ago

Hi @rlegendre,

I can help, but I need a smaller dataset. This one is going to take > 2 hours to download, and it will be hard for me to identify the reads with issues.

Ideally, the subsetted BAM and BED would have only 1-2 CTCF sites and fewer than 5 reads per site.

Sorry for being picky about test cases, but to make solving these issues productive I need to build unit tests that require small files I can add to the repo and test with every change. I hope you understand.
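One way to start building such a small test case is to turn the first couple of BED sites into region strings that can be passed to a tool such as samtools view on an indexed BAM. The sketch below only does the coordinate conversion (BED is 0-based and half-open, samtools regions are 1-based and inclusive). The function name `bed_to_regions` and the example coordinates are made up for illustration, and limiting the number of reads per site would still need a separate step.

```python
def bed_to_regions(bed_lines, max_sites=2):
    """Convert the first `max_sites` BED records into 'chrom:start-end' regions.

    BED starts are 0-based, so 1 is added to get a 1-based inclusive start.
    """
    regions = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append(f"{chrom}:{int(start) + 1}-{int(end)}")
        if len(regions) == max_sites:
            break
    return regions


# Hypothetical BED records, not taken from the reporter's CTCF.bed.
example_bed = [
    "# example sites",
    "1\t4332668\t4332688\tCTCF",
    "2\t100\t200\tCTCF",
    "3\t5\t10\tCTCF",
]
regions = bed_to_regions(example_bed)
print(" ".join(regions))
```

The printed regions could then be appended to a command of the form samtools view -b -h input.bam REGION... to extract only the reads overlapping those sites, where input.bam stands in for the real file.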

Thanks, Mitchell

P.S. good to hear you want/like the wide format, I will try to keep it.

mrvollger commented 1 year ago

The download eventually finished but I am unable to reproduce your results. Here is an example of my results (top) vs yours (bottom):

[image: ft center output, mrvollger's results (top) vs. rlegendre's results (bottom)]

I think your version of fibertools might be out of date.

rlegendre commented 1 year ago

Indeed, I installed fibertools via conda. I will try cloning the latest version of the repository instead. Thanks.

mrvollger commented 1 year ago

Please reopen with the ft version if this doesn't fix it for you. Cheers!