gem-pasteur / Integron_Finder

Bioinformatics tool to find integrons in bacterial genomes
GNU General Public License v3.0

IF gives different results depending on the installation #91

Closed bneron closed 3 years ago

bneron commented 3 years ago

Version of Integron_Finder:

2.0rc7

OS

Expected behavior

IF should give the same results for the same command line and the same data, whatever the host.

Actual behavior

IF does not give the same results with a local installation (Gentoo) vs. an installation within Docker based on ubuntu:hirsute.

Steps to reproduce behavior

Relevant logs and/or screenshots

with local installation

INFO     : ############ Processing replicon NZ_CP016323.1 (1/1) ############

INFO     : Starting Default search ... :
INFO     : Default search done... : 
INFO     : In replicon NZ_CP016323.1, there are:
INFO     : - 0 complete integron(s) found with a total 0 attC site(s)
INFO     : - 2 CALIN element(s) found with a total of 44 attC site(s)
INFO     : - 0 In0 element(s) found with a total of 0 attC site

with docker version

INFO     : ############ Processing replicon NZ_CP016323.1 (1/1) ############

INFO     :  Starting Default search ... :
INFO     :  Default search done... : 
INFO     :  In replicon NZ_CP016323.1, there are:
INFO     :  - 0 complete integron(s) found with a total 0 attC site(s)
INFO     :  - 1 CALIN element(s) found with a total of 42 attC site(s)
INFO     :  - 0 In0 element(s) found with a total of 0 attC site

The docker version lacks two attC sites and one CALIN element.

NZ_CP016323.fna.txt

bneron commented 3 years ago

The problem comes from infernal.py/read_infernal: https://github.com/gem-pasteur/Integron_Finder/blob/cc6016fd0c9a2685ad8825aadb861708c9a714d4/integron_finder/infernal.py#L74

Depending on the installation, the cmsearch output may differ. cmsearch finds exactly the same results, but the footer can be slightly different. Below is the end of the output of the same cmsearch command line:

cmsearch --cpu 1 -A NZ_CP016323.1_attc.res --tblout NZ_CP016323.1_attc_table.res -E 10 --incE 10 ../data/Models/attc_4.cm NZ_CP016323.1.fst

with cmsearch compiled from src on gentoo

NZ_CP016323.1        -         attC_4               -          cm        4       44    28862    28982      +    no    1 0.50   0.0   22.0    0.0029 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
NZ_CP016323.1        -         attC_4               -          cm        1       47    22286    22410      +    no    1 0.46   0.0   19.4     0.014 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
NZ_CP016323.1        -         attC_4               -          cm        1       47    14852    14975      +    no    1 0.48   0.0   18.5     0.023 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
NZ_CP016323.1        -         attC_4               -          cm        1       47    19308    19266      -    no    1 0.35   0.0   17.3     0.046 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
#
# Program:         cmsearch
# Version:         1.1.4 (Dec 2020)
# Pipeline mode:   SEARCH
# Query file:      /home/bneron/Projects/GEM/Integron_Finder/src/Integron_Finder/data/Models/attc_4.cm
# Target file:     NZ_CP016323.1.fst
# Option settings: /home/bneron/Projects/GEM/Integron_Finder/src/infernal/infernal-1.1.4/src/cmsearch -A NZ_CP016323.1_attc.res --tblout NZ_CP016323.1_attc_table.res -E 10 --incE 10 --cpu 1 /home/bneron/Projects/GEM/Integron_Finder/src/Integron_Finder/data/Models/attc_4.cm NZ_CP016323.1.fst 
# Current dir:     /home/bneron/Projects/GEM/Integron_Finder/src/Integron_Finder/test_local
# Date:            Mon Jul 26 20:56:01 2021
# [ok]

with the cmsearch package on ubuntu hirsute

NZ_CP016323.1        -         attC_4               -          cm        4       44    28862    28982      +    no    1 0.50   0.0   22.0    0.0029 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
NZ_CP016323.1        -         attC_4               -          cm        1       47    22286    22410      +    no    1 0.46   0.0   19.4     0.014 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
NZ_CP016323.1        -         attC_4               -          cm        1       47    14852    14975      +    no    1 0.48   0.0   18.5     0.023 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
NZ_CP016323.1        -         attC_4               -          cm        1       47    19308    19266      -    no    1 0.35   0.0   17.3     0.046 !   Vibrio vulnificus strain FORC_037 plasmid unnamed, complete sequence
#
# Program:         cmsearch
# Version:         1.1.4 (Dec 2020)
# Pipeline mode:   SEARCH
# Query file:      /usr/local/share/integron_finder/data/Models/attc_4.cm
# Target file:     NZ_CP016323.1.fst
# Option settings: /usr/lib/infernal/cmsearch -A NZ_CP016323.1_attc.res --tblout NZ_CP016323.1_attc_table.res -E 10 --incE 10 --cpu 1 /usr/local/share/integron_finder/data/Models/attc_4.cm NZ_CP016323.1.fst 
# [ok]

On ubuntu hirsute, the footer lacks 2 lines:

# Current dir:     /home/bneron/Projects/GEM/Integron_Finder/src/Integron_Finder/test_local
# Date:            Mon Jul 26 20:56:01 2021

So the parsing of this file discarded the last 2 attC sites.
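To illustrate the failure mode, here is a minimal sketch, not the actual read_infernal code. It assumes the parser dropped a fixed number of trailing lines (10, the footer length seen on Gentoo); the issue does not quote the parsing code, so that is a guess. The sample rows are copied from the output above with whitespace collapsed and descriptions shortened, the footers are abbreviated, and the function names `parse_tblout` and `drop_fixed_footer` are hypothetical. Filtering on the `#` prefix does not depend on how many footer lines a given cmsearch build writes.

```python
# Hit rows from the --tblout output above (whitespace collapsed, description shortened).
HITS = (
    "NZ_CP016323.1 - attC_4 - cm 4 44 28862 28982 + no 1 0.50 0.0 22.0 0.0029 ! Vibrio vulnificus\n"
    "NZ_CP016323.1 - attC_4 - cm 1 47 22286 22410 + no 1 0.46 0.0 19.4 0.014 ! Vibrio vulnificus\n"
    "NZ_CP016323.1 - attC_4 - cm 1 47 14852 14975 + no 1 0.48 0.0 18.5 0.023 ! Vibrio vulnificus\n"
    "NZ_CP016323.1 - attC_4 - cm 1 47 19308 19266 - no 1 0.35 0.0 17.3 0.046 ! Vibrio vulnificus\n"
)

# Abbreviated footers: 8 lines (ubuntu package) and 10 lines (built from source).
FOOTER_8 = "\n".join([
    "#", "# Program:         cmsearch", "# Version:         1.1.4 (Dec 2020)",
    "# Pipeline mode:   SEARCH", "# Query file:      attc_4.cm",
    "# Target file:     NZ_CP016323.1.fst", "# Option settings: cmsearch ...", "# [ok]",
]) + "\n"
FOOTER_10 = FOOTER_8.replace(
    "# [ok]",
    "# Current dir:     test_local\n# Date:            Mon Jul 26 20:56:01 2021\n# [ok]",
)


def drop_fixed_footer(text, footer_len=10):
    """Assumed fragile approach: cut a fixed number of trailing lines."""
    body = text.splitlines()[:-footer_len]
    return [line for line in body if line.strip() and not line.startswith("#")]


def parse_tblout(text):
    """Footer-independent approach: skip every blank line and '#' comment line."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 17 whitespace-separated columns, then a free-text description.
        fields = line.split(None, 17)
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "seq_from": int(fields[7]),
            "seq_to": int(fields[8]),
            "strand": fields[9],
            "score": float(fields[14]),
            "evalue": float(fields[15]),
        })
    return hits


if __name__ == "__main__":
    for name, footer in (("10-line footer", FOOTER_10), ("8-line footer", FOOTER_8)):
        out = HITS + footer
        print(name, len(drop_fixed_footer(out)), len(parse_tblout(out)))
```

With the 8-line footer, the fixed cut removes the last two hit rows, which matches the two missing attC sites reported above, while the comment-based filter returns all four hits for both footers.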

bneron commented 3 years ago

Commit 2635b8b solves this issue.