gem-pasteur / Integron_Finder

Bioinformatics tool to find integrons in bacterial genomes
GNU General Public License v3.0

[BUG] Wrong coordinates in results #114

Closed fmalmeida closed 4 months ago

fmalmeida commented 8 months ago

Describe the bug

Hi, first, thanks for the nice work on this tool. I have been using it in a pipeline of mine and it has been working awesomely.

Recently I tried using it with some Vibrio genomes, and it has been showing problems with the annotation of integrons that occur at the very start of the sequences.

It often fails because it sometimes generates results at the very first base and writes the start as 0 (0-indexed), for example. In other cases, it generates wrong negative start positions, as below:

13      Integron_Finder integron        69515   74987   .       +       1       ID=integron_01;integron_type=complete
24      Integron_Finder integron        25      12675   .       +       1       ID=integron_01;integron_type=CALIN
25      Integron_Finder integron        19      9958    .       +       1       ID=integron_01;integron_type=CALIN
27      Integron_Finder integron        6936    9536    .       +       1       ID=integron_01;integron_type=complete
31      Integron_Finder integron        478     4564    .       +       1       ID=integron_01;integron_type=CALIN
32      Integron_Finder integron        66      4604    .       +       1       ID=integron_01;integron_type=CALIN
33      Integron_Finder integron        117     4047    .       +       1       ID=integron_01;integron_type=CALIN
37      Integron_Finder integron        -2      3108    .       +       1       ID=integron_01;integron_type=CALIN
38      Integron_Finder integron        2       2804    .       +       1       ID=integron_01;integron_type=CALIN
44      Integron_Finder integron        70      1709    .       +       1       ID=integron_01;integron_type=CALIN
46      Integron_Finder integron        -17     1603    .       +       1       ID=integron_01;integron_type=CALIN

I am therefore sharing the gbk files that integron_finder itself generated during the analysis, so that you can see the results and also have the contig sequences for reproducing them.

gbk_37_and_46.zip

To Reproduce

integron_finder --local-max --func-annot --pdf --gbk --cpu 4 vibrio31.fna

Expected behavior

The minimum allowed starting base should be 1, not 0 nor negative.
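
The expectation above can be checked on an existing result file with a few lines of Python. This is only a minimal sketch: the function name `find_bad_starts` is hypothetical and not part of Integron_Finder, and it assumes GFF-style rows like the ones pasted above, with the start coordinate in the fourth column.

```python
def find_bad_starts(gff_lines):
    """Return (seqid, start, end) for every feature row whose start is below 1."""
    bad = []
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        # Real GFF is tab-separated; splitting on any whitespace also handles
        # the space-aligned rows pasted in this issue (no spaces in attributes).
        fields = line.split()
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        if start < 1:
            bad.append((seqid, start, end))
    return bad


sample = """\
37      Integron_Finder integron        -2      3108    .       +       1       ID=integron_01;integron_type=CALIN
38      Integron_Finder integron        2       2804    .       +       1       ID=integron_01;integron_type=CALIN
46      Integron_Finder integron        -17     1603    .       +       1       ID=integron_01;integron_type=CALIN
"""

print(find_bad_starts(sample.splitlines()))
# → [('37', -2, 3108), ('46', -17, 1603)]
```

Run against the rows above, it flags contigs 37 and 46, the two contigs shared in the zip file.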

OS:

Integron_Finder Version:

version 2.0.1

jeanrjc commented 8 months ago

Hello,

could you share vibrio31.fna ?

Thanks

fmalmeida commented 8 months ago

Hello hello, are the two problematic contigs shared in the two GenBank files (output of integron_finder) in the zip file not sufficient? I am not sure I can share the whole genome (I can ask if they are not sufficient). Cheers.

fmalmeida commented 8 months ago

Here is an fna file with a subset of the genome, containing the two contigs (37 and 46): vibrio31_subset.fna.gz

jeanrjc commented 8 months ago

Ah OK, I found the bug: there is a hit at the very first position, but the attC model is truncated. When a model is truncated, we correct the position so that the real start of the attC site lies a bit earlier, which here places it before the first base of the contig.

The bug is around L95 in infernal.py I think.

I don't have much time to fix that now, feel free to propose a PR if you can. Otherwise, me or @bneron might try to fix that when we can.

Best

bneron commented 8 months ago

I'm going to work on it

bneron commented 8 months ago

If I understand the problem correctly, the position should be 0 in this case, shouldn't it?

fmalmeida commented 8 months ago

Actually, I believe it should be 1.

I believe GenBank and GFF files use 1-based coordinates.
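
To illustrate the difference between the two conventions: Python slices are 0-based and half-open, whereas GFF coordinates are 1-based and inclusive. The helper below is a hypothetical sketch of the conversion (the name `slice_to_gff` is an assumption, not something from Integron_Finder).

```python
def slice_to_gff(start0, end0):
    """Convert a 0-based, half-open interval [start0, end0) to 1-based inclusive."""
    if start0 < 0 or end0 <= start0:
        raise ValueError(f"invalid 0-based interval: [{start0}, {end0})")
    # Only the start shifts by one; the exclusive 0-based end equals
    # the inclusive 1-based end.
    return start0 + 1, end0


# The first ten bases of a contig, i.e. seq[0:10] in Python:
print(slice_to_gff(0, 10))
# → (1, 10)
```

So a feature that begins at the first base of a contig has start 1 in a GFF file, never 0.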

jeanrjc commented 8 months ago

yes, and we should also check for the same case where the attC model is truncated at the end of a contig (not only at the start as in this issue).
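
A rough sketch of what handling both ends could look like is given below. It is not the actual code in infernal.py: the function name, parameter names, and the plus-strand, 1-based inclusive convention are all assumptions made for illustration. The idea is to extend a truncated hit by the missing model positions and then clamp the result to the contig boundaries.

```python
def extend_truncated_hit(seq_from, seq_to, mdl_from, mdl_to, mdl_len, seq_len):
    """Extend a plus-strand hit to the full model length, clamped to [1, seq_len].

    All coordinates are assumed to be 1-based and inclusive (hypothetical sketch).
    """
    # Shift the start back by the model positions missing on the left.
    start = seq_from - (mdl_from - 1)
    # Shift the end forward by the model positions missing on the right.
    end = seq_to + (mdl_len - mdl_to)
    # Never report a position outside the contig.
    return max(1, start), min(seq_len, end)


# Hit at the very start of a contig, model truncated on the left:
print(extend_truncated_hit(1, 30, 18, 47, 47, 1000))
# → (1, 30)

# Hit at the very end of a contig, model truncated on the right:
print(extend_truncated_hit(980, 1000, 1, 21, 47, 1000))
# → (980, 1000)
```

Without the clamp, the first example would give a start of -16 and the second an end of 1026, both outside the contig.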

bneron commented 8 months ago

@jeanrjc could you check the fix I just made https://github.com/gem-pasteur/Integron_Finder/blob/73a5801badc5ef29bcea0ed9c7cab4e7166013a7/integron_finder/infernal.py#L96

jeanrjc commented 8 months ago

It works for me ! Thanks

jeanrjc commented 6 months ago

is this merged @bneron ?

Ales-ibt commented 5 months ago

Hello, I am getting the same error with IntegronFinder v2.0.2. I can see the bug was fixed, but the fix is not yet in the current release. Could you please add this fix to main?

bneron commented 4 months ago

Fixed in integron_finder version 2.0.5.