bioperl / bioperl-live-redmine

Legacy tickets migrated from the OBF Redmine issue tracker: http://redmine.open-bio.org

Bio::Search::HSP::FastaHSP -> get_aln -> Bio::Locatable end is float #62

Open cjfields opened 8 years ago

cjfields commented 8 years ago

Author Name: Alexie Papanicolaou
Original Redmine Issue: 2913, https://redmine.open-bio.org/issues/2913
Original Date: 2009-09-11
Original Assignee: Bioperl Guts


This bug is present in the BioPerl version that was in CVS on 2009-04-29.

I get the following warning when parsing a fasty34 HSP using Bio::Search and then trying to get the alignment using get_aln:

MSG: In sequence CONTIG residue count gives end value 565.333333333333.
Overriding value [565] with value 565.333333333333 for Bio::LocatableSeq::end().
MAEMFKIGDLVWAKMKGFSPWPGLVSNPTKDLKRPTSKKSAQQ/CVFFLGTNNYAWIEEANIKPYFEYRDRLVKSNKSGAFKDALDAIEEYIKNNGAKFDDPDAEFNRLRESLAEKKESKPKQRKEKRPAHDDNSAKSPKKVRTNSVEADKESVRADSPILSNHSPRKGPASTLLERPTTIVRPLDDSQD
STACK Bio::LocatableSeq::end /usr/local/share/perl/5.8.8/Bio/LocatableSeq.pm:196
STACK Bio::LocatableSeq::new /usr/local/share/perl/5.8.8/Bio/LocatableSeq.pm:140
STACK Bio::Search::HSP::FastaHSP::get_aln /usr/local/share/perl/5.8.8/Bio/Search/HSP/FastaHSP.pm:174

The frameshifts (/ and \) are causing the length to be recalculated as a float, which is a bit weird but not fatal for my program. Is this intentional?

Example files emailed to Mark Jensen.

cheers alexie
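Editorial note: the fractional end in the warning (565.333...) is consistent with a single nucleotide of frameshift being converted to residue units by dividing by 3. The thread does not confirm that this is what the BioPerl code does, so the following Python sketch only illustrates that assumption. The helper `residue_end`, the start coordinate of 1, and the nucleotide counts are hypothetical and chosen to reproduce the numbers in the warning.

```python
from fractions import Fraction

CODON = 3  # nucleotides per amino acid


def residue_end(start: int, nt_span: int) -> Fraction:
    """End coordinate in residue units for a span of nt_span nucleotides.

    Hypothetical helper for illustration only; this is not BioPerl code.
    """
    return start - 1 + Fraction(nt_span, CODON)


# Whole codons only: the end is an integer.
whole = residue_end(1, 565 * CODON)

# One extra nucleotide (assumed here to come from a "/" frameshift):
# the end picks up a fractional third of a residue.
shifted = residue_end(1, 565 * CODON + 1)

print(whole, float(shifted))
```

Keeping the count as an exact fraction makes it easy to see that the extra 1/3 comes from the one unpaired nucleotide rather than from rounding error.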

cjfields commented 8 years ago

Original Redmine Comment Author Name: Alexie Papanicolaou Original Date: 2009-09-11T09:30:43Z


Created an attachment (id=1364) test data to replicate issue

cjfields commented 8 years ago

Original Redmine Comment Author Name: Chris Fields Original Date: 2009-09-11T10:37:34Z


Mark, Alexie, I’ve confirmed this on main trunk. I wrote up the code that calculates the frameshift, so I’ll take a look.

cjfields commented 8 years ago

Original Redmine Comment Author Name: Mark A. Jensen Original Date: 2009-09-11T10:43:55Z


Chris— I think I fixed the frameshift code in my LocatableSeq mods that are still in limbo. I remember this “float” issue distinctly. How would you like me to proceed—I could try to use that code to patch the current version, or you may want to have a look— cheers MAJ

cjfields commented 8 years ago

Original Redmine Comment Author Name: Chris Fields Original Date: 2009-09-11T11:37:19Z


Try running the SearchIO/fasta.t tests with the patch and let’s see how it behaves. We should have a few tests for frameshifts there (this would definitely be a good additional one), so the patch should pass those in addition to calculating the end correctly here.

cjfields commented 8 years ago

Original Redmine Comment Author Name: Mark A. Jensen Original Date: 2009-09-11T11:40:19Z


aye-aye

cjfields commented 8 years ago

Original Redmine Comment Author Name: Mark A. Jensen Original Date: 2009-09-11T18:21:19Z


Created an attachment (id=1365) patch to fix length calculations in the presence of frameshifts

Here is a patch to LocatableSeq.pm only that passes all t/SearchIO/fasta.t tests for me, and eliminates Alexie’s bork. This makes use of a method _encode_frameshifts that was part of my LocatableSeq rewrite of last year (still in limbo).

Though it passes the tests, there is an issue: my implementation of storing frameshifts and Jason’s appear to be mirror images of each other. Where he encodes “/” as -1, I encode “/” as +1, etc. I was revamping subseq() so it could handle frameshifts and mappings correctly, and I found it most natural (in calculating lengths) to say that, e.g.,

QQ/QQ

was 4 aa’s long, but 3+3+1+3+3 = 13 nts long, which must reflect the underlying untranslated sequence. I fixed a slight logical error in _ungapped_length, which calculates the offset based on the frameshift hash, and in examining it, it looked like cjf also thought of the offsets in the same way. So, please give it a try and some thought too. Alexie, you can apply the patch to your local copy of LocatableSeq.pm and see if it does what you expect (especially regarding the frameshift hash, and the lengths that are output).

cheers MAJ
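Editorial note: the length convention Mark describes for QQ/QQ, and the mirror-image sign convention he attributes to Jason, can be sketched as follows. This is a Python illustration, not the patched LocatableSeq.pm. The function names, the `forward` parameter, and the choice to key the frameshift table by the number of residues preceding the marker are all assumptions made for the example.

```python
def encode_frameshifts(aligned: str, forward: int = +1):
    """Split an aligned protein string into residues and a frameshift table.

    forward is the offset assigned to "/"; "\\" gets the opposite sign.
    Keys are the number of residues seen before the marker (an assumed
    convention for this sketch). Gap characters ("-") are skipped.
    """
    residues = []
    shifts = {}
    for ch in aligned:
        if ch == "/":
            pos = len(residues)
            shifts[pos] = shifts.get(pos, 0) + forward
        elif ch == "\\":
            pos = len(residues)
            shifts[pos] = shifts.get(pos, 0) - forward
        elif ch != "-":
            residues.append(ch)
    return "".join(residues), shifts


def nt_length(aligned: str, forward: int = +1) -> int:
    """Nucleotide length: 3 per residue plus the summed frameshift offsets."""
    residues, shifts = encode_frameshifts(aligned, forward)
    return 3 * len(residues) + sum(shifts.values())


residues, shifts = encode_frameshifts("QQ/QQ")  # "/" stored as +1
print(len(residues), shifts, nt_length("QQ/QQ"))
print(encode_frameshifts("QQ/QQ", forward=-1)[1])  # mirror-image encoding
```

With "/" stored as +1 the example gives 4 residues and 3+3+1+3+3 = 13 nucleotides, matching the figures in the comment. Flipping the sign changes only the stored table and any length derived from it, which is why code written against one convention miscounts under the other.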

cjfields commented 8 years ago

Original Redmine Comment Author Name: Alexie Papanicolaou Original Date: 2009-09-18T07:56:00Z


Created an attachment (id=1366) a test case with questions

Hello

The processing of the output seems more accurate, but I don’t understand some things. I assume you use \ for an insertion and / for a deletion (as FASTY does). Attached is a test case. a) Shouldn’t the end be 375 (360 as the last landmark on FASTY + 5 amino acids * 3)? c) Shouldn’t the frameshift be a bit higher, like bp 34?