gem-pasteur / Integron_Finder

Bioinformatics tool to find integrons in bacterial genomes
GNU General Public License v3.0

Invalid characters #56

Closed pbravakos closed 5 years ago

pbravakos commented 6 years ago

Version of integron_finder:

integron_finder version 2-2018-10-05 Using:

OS

Expected behavior

Run analysis

Actual behavior

Runs only for specific scaffolds, not all of them

Steps to reproduce behavior

This is the command: integron_finder --local-max --func-annot --cpu 20 -vv --pdf --gbk scaffolds.fasta

Relevant logs and/or screenshots

WARNING  : utils: L 119 : sequence Scaffold_2_length_1196781_pilon contains invalid characters, the sequence is skipped.
WARNING  : finder: L 584 : ############ Skipping replicon 2/6 ############
WARNING  : utils: L 119 : sequence Scaffold_1_length_1041693_pilon contains invalid characters, the sequence is skipped.
WARNING  : finder: L 584 : ############ Skipping replicon 3/6 ############
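
To see at a glance which sequences were skipped, the warnings above can be scanned with a short script. This is only a minimal sketch: it assumes the log lines look exactly like the ones shown here, and the helper name `skipped_sequences` is hypothetical, not part of integron_finder.

```python
import re

# Matches the warning for a skipped sequence, as it appears in the log above.
SKIP_RE = re.compile(r"sequence (\S+) contains invalid characters")


def skipped_sequences(log_text):
    """Return the IDs of sequences reported as skipped, in order of appearance."""
    return SKIP_RE.findall(log_text)


# Two lines copied from the log above, used here as sample input.
log = (
    "WARNING  : utils: L 119 : sequence Scaffold_2_length_1196781_pilon "
    "contains invalid characters, the sequence is skipped.\n"
    "WARNING  : finder: L 584 : ############ Skipping replicon 2/6 ############\n"
)

for seq_id in skipped_sequences(log):
    print(seq_id)
```
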
jeanrjc commented 6 years ago

Hello,

Thanks for your report.

In IntegronFinder v2, the input sequence must contain only unambiguous DNA letters (i.e., only ATGC). I guess your scaffolds have Ns in them. It's up to you to decide what you want to do with such sequences; IntegronFinder will not consider them. However, it should work for your other scaffolds, right?

Anyway, I just noticed that this isn't in the documentation; we will update it accordingly.
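
A quick way to check which scaffolds would be affected is to count the characters outside A, T, G and C in each FASTA record before running the analysis. The following is a rough sketch using only the Python standard library, not IntegronFinder's own validation code: the function names `read_fasta` and `non_acgt_counts` are hypothetical, and treating lowercase letters as valid is an assumption.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def non_acgt_counts(lines):
    """Map each sequence ID to a Counter of its characters outside A, C, G, T."""
    report = {}
    for seq_id, seq in read_fasta(lines):
        # Assumption: case is ignored, so lowercase acgt counts as valid.
        bad = Counter(ch for ch in seq.upper() if ch not in "ACGT")
        if bad:
            report[seq_id] = bad
    return report


# Small in-memory example; an open file handle works the same way.
example = [">clean", "ATGCATGC", ">with_gap", "ATGNNNNC", "GGRT"]
for seq_id, bad in non_acgt_counts(example).items():
    print(seq_id, dict(bad))
```

To run it on a real file, pass an open file handle instead of the list, for example `non_acgt_counts(open("scaffolds.fasta"))`.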

Best, Jean

pbravakos commented 6 years ago

Yes, it works with the scaffolds without Ns, but these scaffolds are very few. Thanks for clarifying this. Panos

jeanrjc commented 5 years ago

Hello @pbravakos!

We revisited this issue, and you can now use ambiguous DNA sequences as input. The change is on the dev branch if you want to give it a try.

I'll close this issue.

Best, Jean

pbravakos commented 5 years ago

That's great. I'll try it for sure. Thanks, Panos