bioperl / bioperl-live-redmine

Legacy tickets migrated from the OBF Redmine issue tracker: http://redmine.open-bio.org

Bio::Assembly::IO can not correctly parse all ace contig ids #146

Open cjfields opened 8 years ago

cjfields commented 8 years ago

Author Name: Danny Katzel (Danny Katzel)
Original Redmine Issue: 3425, https://redmine.open-bio.org/issues/3425
Original Date: 2013-03-11
Original Assignee: Bioperl Guts


While trying out your ace parser using new Bio::Assembly::IO(-file => "/path/to/ace", -format => "ace"); I got an exception:

Can't call method "get_consensus_sequence" on an undefined value at /usr/local/packages/perl-5.10.1/lib/5.10.1/Bio/Assembly/IO/ace.pm line 195, line 53.

Looking at the ace.pm file, I see that you define the contig header line as /^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms

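The ace file that I was trying to read has a hyphen in the contig id. \w matches only letters, digits, and underscores, so \w+ stops at the hyphen, the contig header line fails to match, and no $contigObj is created. The later call to get_consensus_sequence is then made on an undefined value.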

A solution is to change the first group in the CO regex from \w+ to \S+, so that the contig id may contain any non-whitespace character. The whole regex then becomes /^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms
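
To see the difference without a full assembly file, here is a minimal standalone sketch that runs both patterns against two CO header lines. The header lines and the helper contig_id are made up for illustration and are not taken from ace.pm or from my actual file. Only the two regexes come from this report.

```perl
use strict;
use warnings;

# Pattern quoted from ace.pm in this report, and the proposed replacement.
my $old_re = qr/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms;
my $new_re = qr/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms;

# Hypothetical CO header lines (assumed for illustration only):
# one plain contig id and one containing a hyphen.
my @headers = (
    'CO Contig1 1200 25 40 U',
    'CO contig-1 1200 25 40 U',
);

# Hypothetical helper: returns the captured contig id,
# or undef when the header line does not match the pattern.
sub contig_id {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line (@headers) {
    my $old = contig_id($old_re, $line);
    my $new = contig_id($new_re, $line);
    printf "%-26s old: %-12s new: %s\n",
        $line, $old // '(no match)', $new // '(no match)';
}
```

With these inputs the old pattern should fail only on the hyphenated id, while the \S+ version captures both ids. Note that \S+ accepts any non-whitespace character, which is more permissive than just adding the hyphen to the character class.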