Perhaps I'm missing something, but I seem to be running into the very problem you described in the paper. I've been testing IF on my single-isolate genome assemblies, and the results differ depending on whether I run IF on the full genome sequence or on a multi-GenBank (multi-gbk) file in which the contigs are still separate sequences. With exactly the same parameters, a few samples give more positive hits from the multi-gbk file than from the full genome sequence: up to 6 hits (3 integron/protein, 3 attC), whereas the corresponding single full-genome file gave no hits at all.
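To confirm how many separate contig records a multi-gbk file actually holds, and that their lengths add up to the full genome sequence, a minimal stdlib Python sketch could read the `LOCUS` lines. It assumes standard whitespace-separated `LOCUS` lines (name, then length in bp); `genbank_records` is a hypothetical helper, and Biopython's `SeqIO.parse(path, "genbank")` would be the more robust choice for real files.

```python
from typing import List, Tuple


def genbank_records(text: str) -> List[Tuple[str, int]]:
    """Return (locus name, declared length in bp) for each record in GenBank text.

    Assumption: every record starts with a whitespace-separated line like
    "LOCUS       <name>   <length> bp   DNA ..." (long names can break this).
    """
    records = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            # fields[1] is the locus name, fields[2] the sequence length
            records.append((fields[1], int(fields[2])))
    return records


example = """LOCUS       contig_1   5000 bp    DNA     linear   BCT 01-JAN-2020
ORIGIN
//
LOCUS       contig_2   3200 bp    DNA     linear   BCT 01-JAN-2020
ORIGIN
//
"""
recs = genbank_records(example)
print(len(recs), sum(n for _, n in recs))  # → 2 8200
```

More than one record means IF sees the assembly as several replicons rather than one sequence.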
I'm running IntegronFinder on the Galaxy server with the following command:
#cmd: integron_finder /pasteur/zeus/projets/p00/galaxy-prod/galaxy-dist/database/files/008/712/dataset_8712770.dat --cpu 4 --keep-tmp --local-max --promoter-attI -dt 4000 --calin-threshold 1 --max-attc-size 200 --min-attc-size 40 --keep-palindromes --func-annot --gbk
I'm assuming, as mentioned in the paper, that this happens because the multi-gbk file presents the sequences as a draft genome, which causes IF to overestimate CALIN and In0 hits. Is that right, and what would you advise here?
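To compare the full-genome and multi-gbk runs on the same basis, one could tally the integron types (complete, In0, CALIN) reported in each run's results table. This is a minimal sketch, assuming the tab-separated `.integrons` output skips `#` comment lines and has columns named `ID_integron`, `ID_replicon` and `type`; check these names against the header of your own files. `count_integron_types` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Dict


def count_integron_types(tsv_text: str) -> Dict[str, int]:
    """Count distinct integrons per type in IntegronFinder-style .integrons TSV text.

    Assumed columns: 'ID_integron', 'ID_replicon', 'type'. Lines starting
    with '#' are treated as comments and skipped.
    """
    rows = [line for line in tsv_text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    seen = {}
    for row in reader:
        # One integron spans several rows (one per element), and integron IDs
        # may repeat across replicons, so key on (replicon, integron ID).
        seen[(row["ID_replicon"], row["ID_integron"])] = row["type"]
    return dict(Counter(seen.values()))


example = "\n".join([
    "# comment line",
    "ID_integron\tID_replicon\telement\ttype_elt\ttype",
    "integron_01\tcontig_1\tintI_1\tprotein\tcomplete",
    "integron_01\tcontig_1\tattc_001\tattC\tcomplete",
    "integron_02\tcontig_2\tattc_002\tattC\tCALIN",
])
print(count_integron_types(example))  # → {'complete': 1, 'CALIN': 1}
```

Running this on both result files gives a side-by-side count of how many extra CALIN or In0 calls come from the contig-level run.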