brinkmanlab / islandpath

IslandPath standalone software
GNU General Public License v3.0
31 stars 4 forks source link

genomicislands cal_dinuc bug #8

Open JFsanchezherrero opened 5 years ago

JFsanchezherrero commented 5 years ago

Dear developers,

I have been checking the code as I wonder if there would be a possibility to generate the same analysis if provided a draft genome or assembly into multiple contigs, in genbank format too.

I am working on this new implementation and I would let you know when I have a working version if you might be interested in pulling the request.

Meanwhile, I found a bug in your code. It is in the genomicisland module, cal_dinuc function at the end of the function. Number line: 349-378.

https://github.com/brinkmanlab/islandpath/blob/34aad6cea0315a8efc4434e499726891ef47e582/lib/Dimob/genomicislands.pm#L349-L380

It affects so many lines because is in both sides of the if clause. Basically, the bug is related to the $count. If you add to $count before the first iteration you will never get the value for $ORF_ids[0], it would start in $ORF_ids[1].

To my point of view, this is not an intented behaviour and it should be addressed by adding to $count after the iteration.

Thank you very much

innovate-invent commented 5 years ago

Hi @JFsanchezherrero. Thanks for your interest in islandpath-dimob. IslandViewer currently enables draft support by stitching the draft into a single contig. Is that something you would be interested in?

@cbertell @fionabrinkman Who should we assign to this bug?

JFsanchezherrero commented 5 years ago

Dear @innovate-invent,

Thank you very much for addressing this bug.

About the other question, regarding the possibility of a draft assembly analysis you mentioned IslandViewer.

IslandViewer currently enables draft support by stitching the draft into a single contig. Is that something you would be interested in?

I previously checked the results via IslandViewer, via web and even via the api. I have to say results and aesthetics look amazing. But I am interested in implementing this genomic island identification in bacterial assemblies. I find kind of artificial to merge contigs according to a reference genome. Sometimes the reference genome could not be available or it could not be that close.

What I was thinking is just identifiying genomic islands in large and continous contigs, islands that are not broken. If any islands appears to be splitted because the contig finishes, it would be reported as partial or putative.

What do you think about that? I am working on your code trying to add a column according to the sequence id and computing genomic islands within each contig.

innovate-invent commented 5 years ago

Any contributions to our projects is appreciated. The concern would be if the contributions lowered the quality of the predictions. We were recently talking about adding a score to the predictions as well. Generally any island under 8kbp is discarded.

I want to make sure you are aware of the gff branch of islandpath

JFsanchezherrero commented 5 years ago

Hi there,

Any contributions to our projects is appreciated. The concern would be if the contributions lowered the quality of the predictions.

I am sure it is a concern, to your team and all developers, and it should be taken into account adequately. Several examples and benchmarking sets should be employed the same as multiple debugging messages.

So far, I have reproduced your example results with both versions, the original master version I cloned and the modified multi-contig version I have developed.

We were recently talking about adding a score to the predictions as well.

It would be great! It would provide the final user much more information.

I have to admit your output results for the command-line version of Dimob.pl are scarce. I have also added some annotation and additional information for each genomic island identified such as which are the genes involved and the dinucleotide bias. I find it could be useful to check this information for each genomic island identified.

Generally any island under 8kbp is discarded.

I could not see how you do it in you code. I can see you declare a MIN_GI_SIZE when you initialize the object. But I am not able to find any other reference to this minimum length. Also, it was stablished here as 2000 bp.

https://github.com/brinkmanlab/islandpath/blob/34aad6cea0315a8efc4434e499726891ef47e582/Dimob.pl#L76

# Create a dimob object
my $dimob_obj = Dimob->new( 
   {cfg_file => $cfname,
   bindir => $RealBin,
   workdir => $cwd,
   MIN_GI_SIZE => 2000,
   extended_ids => 1
});

I will try to pull a request tomorrow letting you know the changes I have made and several issues I have dealt with.

Regards

JFsanchezherrero commented 5 years ago

Dear @innovate-invent,

I want to make sure you are aware of the gff branch of islandpath

Ok, I was not aware of it. I will be dealing with it and let you know my implementation.