gem-pasteur / Integron_Finder

Bioinformatics tool to find integrons in bacterial genomes
GNU General Public License v3.0

[HELP] Different results in hits between multi-gbk and merged gbk on same sample? #109

Closed Svnipni closed 1 year ago

Svnipni commented 1 year ago

Perhaps I'm missing something, but I seem to be encountering the very problem you discussed in the paper. I've been testing IF on my single-isolate genome assemblies and am getting different results depending on whether I run it on the full genome sequence or on a multi-GenBank (gbk) file with the contigs still separate. With the exact same parameters, a few samples give more positive hits on the multi-gbk file than on the full genome sequence. For some of them I get up to 6 hits (3 integron/protein, 3 attC) from the multi-gbk file, whereas the single full-genome sequence file for the same sample gave no hits.

I'm running this on the Galaxy server with

#cmd: integron_finder /pasteur/zeus/projets/p00/galaxy-prod/galaxy-dist/database/files/008/712/dataset_8712770.dat --cpu 4 --keep-tmp --local-max --promoter-attI -dt 4000 --calin-threshold 1 --max-attc-size 200 --min-attc-size 40 --keep-palindromes --func-annot --gbk

I'm assuming that, as mentioned in the paper, this happens because the multi-gbk file presents the sequences as a draft genome, which causes IF to overestimate CALIN and In0 hits? What would you advise here?
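
Before comparing hits, it may help to confirm how the two inputs actually differ, that is, how many records each GenBank file contains and how long they are. The following is only a minimal sketch: it assumes the standard GenBank flat-file layout in which each record starts with a `LOCUS` line giving the name and length. The function name `list_genbank_records` and the sample text are hypothetical and are not part of Integron_Finder.

```python
import io


def list_genbank_records(handle):
    """Return a (name, length) tuple for every LOCUS line in a GenBank file.

    Assumes the usual layout: LOCUS <name> <length> bp ...
    The length is None when it cannot be read as an integer.
    """
    records = []
    for line in handle:
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else None
            has_length = len(fields) > 2 and fields[2].isdigit()
            length = int(fields[2]) if has_length else None
            records.append((name, length))
    return records


# Hypothetical two-contig file, used only to illustrate the call.
sample = (
    "LOCUS       contig_1                1200 bp    DNA     linear   BCT 01-JAN-2000\n"
    "//\n"
    "LOCUS       contig_2                 800 bp    DNA     linear   BCT 01-JAN-2000\n"
    "//\n"
)

records = list_genbank_records(io.StringIO(sample))
print(len(records), "record(s):", records)
```

Running it on both the multi-gbk and the merged file (by passing `open(path)` instead of the `io.StringIO` sample) would show whether the merged file really is a single record and how many short contigs the multi-gbk file holds.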

jeanrjc commented 1 year ago

Hello,

Could you share your input files so I can try to reproduce the problem?

jeanrjc commented 1 year ago

closing for inactivity