eftsung / pygr

Automatically exported from code.google.com/p/pygr

add NLMSA convenience method for loading aligned intervals from any source #40

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Pass it an iterator that generates (ival1, ival2) pairs, and they will be
automatically loaded into your NLMSA. This is just a convenience (it is
not hard to save aligned intervals into an NLMSA through the existing graph
interface), but it may make it easier for people to add other formats.
Maybe we could add a CLUSTAL reader as an example.
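
To illustrate the kind of iterator described here, the following is a minimal sketch of a generator that turns two gapped alignment rows (as a CLUSTAL-style reader might produce) into pairs of ungapped aligned intervals. The function name, the (id, start, stop) tuple layout, and the zero-based half-open coordinates are assumptions made for this example; they are not taken from pygr.

```python
def aligned_interval_pairs(src_id, src_row, dest_id, dest_row, gap="-"):
    """Yield ((src_id, start, stop), (dest_id, start, stop)) for each ungapped block.

    Hypothetical helper: coordinates are zero-based, half-open offsets into
    the ungapped sequences.
    """
    if len(src_row) != len(dest_row):
        raise ValueError("aligned rows must have equal length")
    src_pos = dest_pos = 0
    block = None  # (src_start, dest_start) of the currently open block
    for s, d in zip(src_row, dest_row):
        if s != gap and d != gap:
            if block is None:
                block = (src_pos, dest_pos)
        elif block is not None:
            # A gap in either row closes the open block.
            yield (src_id, block[0], src_pos), (dest_id, block[1], dest_pos)
            block = None
        if s != gap:
            src_pos += 1
        if d != gap:
            dest_pos += 1
    if block is not None:
        yield (src_id, block[0], src_pos), (dest_id, block[1], dest_pos)


pairs = list(aligned_interval_pairs("s", "ACG-TT", "d", "A-GCTT"))
```

Because the function is a generator, a caller could hand it directly to whatever loading method consumes (ival1, ival2) pairs without building the full list first.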

Original issue reported on code.google.com by cjlee...@gmail.com on 11 Sep 2008 at 5:02

GoogleCodeExporter commented 8 years ago
NLMSA.add_aligned_intervals(alignedIvals, alignedIvalsSrc, alignedIvalsDest,
alignedIvalsAttrs)

alignedIvals must be iterable and should yield one or more objects or tuples
that represent aligned intervals as ID/coordinate sets.
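
One way to picture how a single loader could accept either objects or tuples is an attribute map that says where each coordinate lives in the incoming record. The sketch below is only an illustration of that idea: the field names, the `normalize` function, and the rule "a string is an attribute name, an integer is a tuple index" are assumptions for this example and are not claimed to match pygr's actual handling of alignedIvalsAttrs.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical canonical field names used only in this sketch.
FIELDS = ("src_id", "src_start", "src_stop",
          "dest_id", "dest_start", "dest_stop")

# Hypothetical record type, standing in for a parser's own hit object.
Hit = namedtuple("Hit", "query q_start q_end subject s_start s_end")


def normalize(record, attrs=None):
    """Return the record as a dict keyed by FIELDS.

    attrs maps a field name to an attribute name (str) or a tuple index (int).
    Fields missing from attrs fall back to an attribute of the same name.
    """
    attrs = {} if attrs is None else attrs
    out = {}
    for field in FIELDS:
        key = attrs.get(field, field)
        out[field] = record[key] if isinstance(key, int) else getattr(record, key)
    return out


# Plain tuples: every field is addressed by position.
tuple_attrs = {name: i for i, name in enumerate(FIELDS)}
from_tuple = normalize(("chr1", 10, 20, "chr2", 110, 120), tuple_attrs)

# Objects with their own attribute names: map each field to that name.
hit_attrs = dict(src_id="query", src_start="q_start", src_stop="q_end",
                 dest_id="subject", dest_start="s_start", dest_stop="s_end")
from_hit = normalize(Hit("q1", 0, 5, "s1", 3, 8), hit_attrs)

# Objects that already use the canonical names need no map at all.
plain = SimpleNamespace(src_id="a", src_start=0, src_stop=4,
                        dest_id="b", dest_start=2, dest_stop=6)
from_plain = normalize(plain)
```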

Original comment by cjlee...@gmail.com on 10 Dec 2008 at 11:44

GoogleCodeExporter commented 8 years ago

Original comment by mare...@gmail.com on 21 Feb 2009 at 2:05

GoogleCodeExporter commented 8 years ago
Hi Titus,
could you verify this enhancement? It provides a simple way for developers
to write their own parsers for reading their desired alignment formats into
NLMSA. I use it for the BlastMapping readers, for example.

-- Chris

Original comment by cjlee...@gmail.com on 4 Mar 2009 at 11:53

GoogleCodeExporter commented 8 years ago

Original comment by mare...@gmail.com on 13 Mar 2009 at 1:00

GoogleCodeExporter commented 8 years ago
Initial review: created a new test class, ran code.

Issues:

Fixed the 'al.seqDict' bug.

Also, the ori attribute was not optional for tuples without some mangling.
See the tests and the classutil fix.

In the docs, the names of the passed-in seqdbs are wrong:
alignedIvalsSrc => srcDB, alignedIvalsDest => destDB.

I think for read_aligned_coords, alignedIvalsAttrs should default to {}. See
the weird pathological behavior in nlmsa_test, method
test_single_ival_attr(), for a reason why.

I don't understand the need for or use of CoordsGroupStart/End.  ??

See branch http://github.com/ctb/pygr/commits/review_issue40.

Original comment by the.good...@gmail.com on 22 Mar 2009 at 6:12

GoogleCodeExporter commented 8 years ago
(The status should be reset to FixedNeedsReview after bug fixes are
committed; the review isn't complete.)

Original comment by the.good...@gmail.com on 26 Apr 2009 at 8:50

GoogleCodeExporter commented 8 years ago
In answer to Titus' question about CoordsGroupStart / End:
BLAST returns alignments one at a time. A given alignment might contain any
number of aligned interval pairs. For some alignment results (tblastn,
blastx, tblastx), the grouping of the interval pairs into a single alignment
is crucial information. CoordsGroupStart indicates the beginning of a BLAST
alignment; CoordsGroupEnd indicates the end of an alignment. Search on
CoordsGroupStart in blast.py to see examples of how this is used, if you're
interested.
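
The start/end marker idea can be sketched independently of BLAST. In the example below, the two marker classes are stand-ins that merely reuse the names from this thread; they are not pygr's classes, and `emit_grouped` and `regroup` are hypothetical helpers written for illustration. The producer brackets each alignment's interval pairs with markers, and the consumer uses them to recover the grouping from a flat stream.

```python
class CoordsGroupStart:
    """Stand-in marker: a new alignment group begins here."""


class CoordsGroupEnd:
    """Stand-in marker: the current alignment group ends here."""


def emit_grouped(alignments):
    """Flatten alignments into one stream, bracketing each with markers."""
    for pairs in alignments:
        yield CoordsGroupStart()
        for pair in pairs:
            yield pair
        yield CoordsGroupEnd()


def regroup(stream):
    """Rebuild per-alignment lists of interval pairs from a marked stream."""
    group = None
    for item in stream:
        if isinstance(item, CoordsGroupStart):
            group = []
        elif isinstance(item, CoordsGroupEnd):
            yield group
            group = None
        elif group is None:
            # A pair outside any markers is treated as its own group.
            yield [item]
        else:
            group.append(item)


p1 = (("q", 0, 5), ("s", 10, 15))
p2 = (("q", 7, 9), ("s", 20, 22))
p3 = (("q", 0, 3), ("t", 4, 7))
groups = list(regroup(emit_grouped([[p1, p2], [p3]])))
```

Keeping the markers in the same stream as the pairs means a reader that does not care about grouping can simply skip them, while one that does can rebuild each alignment.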

Original comment by cjlee...@gmail.com on 28 Apr 2009 at 1:59

GoogleCodeExporter commented 8 years ago
Hi Titus,
I don't understand why you moved the "if srcDB is None..." default-setting
code from read_aligned_coords() to add_aligned_coords() in your branch
review_issue40. First of all, srcDB/destDB are *never* used in
add_aligned_coords() other than for passing them into read_aligned_coords(),
so I don't see how this can fix any problem. Second, read_aligned_coords()
needs to guarantee that srcDB / destDB are not None; due to your change that
is no longer guaranteed if it is called from anywhere other than
add_aligned_coords().

The test code that seemed possibly relevant to this (test_no_seqDict_args)
appears not to be used (it is not even part of a test case class). What bug
was this change intended to fix?

-- Chris

Original comment by cjlee...@gmail.com on 28 Apr 2009 at 2:15

GoogleCodeExporter commented 8 years ago
Huh, odd thing with the test code -- I must have detabbed it somehow!?

The srcDB=al.seqDict line breaks in read_aligned_coords because al is not
available there; it's only passed into add_aligned_intervals. Try running
the (re-tabbed ;) nlmsa_test code without the associated nlmsa and classutil
fixes; I've put these in branch review_issue40_x in the github.com/ctb/pygr
repo.
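
The scoping point can be shown with a small self-contained sketch. It is not the code from either review branch, and every name in it (`Alignment`, `read_coords`, `add_coords`, `seq_dict`) is hypothetical. It shows one arrangement that addresses both concerns raised in this exchange: the caller, which is the only place that can see the alignment object, fills in the defaults, while the reader still refuses to run with a missing database.

```python
class Alignment:
    """Stand-in for an alignment object that owns a sequence dictionary."""

    def __init__(self, seq_dict):
        self.seq_dict = seq_dict


def read_coords(pairs, src_db, dest_db):
    """Resolve (src_id, dest_id) pairs; both databases must be supplied.

    This function never sees the alignment object, so it cannot compute a
    default itself; it can only check that one was provided.
    """
    if src_db is None or dest_db is None:
        raise ValueError("src_db and dest_db are required")
    for src_id, dest_id in pairs:
        yield src_db[src_id], dest_db[dest_id]


def add_coords(al, pairs, src_db=None, dest_db=None):
    """Caller-side defaulting: fall back to the alignment's own dictionary."""
    if src_db is None:
        src_db = al.seq_dict
    if dest_db is None:
        dest_db = al.seq_dict
    return list(read_coords(pairs, src_db, dest_db))


al = Alignment({"a": "SEQ_A", "b": "SEQ_B"})
resolved = add_coords(al, [("a", "b")])
```

An alternative with the same effect would be to pass the default dictionary into the reader as an explicit argument, so that the reader can apply it without needing the alignment object.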

Original comment by the.good...@gmail.com on 28 Apr 2009 at 3:00

GoogleCodeExporter commented 8 years ago
The code has been committed to master, closing the issue.

Original comment by mare...@gmail.com on 13 May 2009 at 2:03

GoogleCodeExporter commented 8 years ago
See comment 6.

Original comment by the.good...@gmail.com on 13 May 2009 at 2:05

GoogleCodeExporter commented 8 years ago
This code was substantially refactored by Chris, and I'm happy with the current
results.  We will probably revisit it for 0.9...

Original comment by the.good...@gmail.com on 2 Sep 2009 at 2:00