Open GoogleCodeExporter opened 9 years ago
This nearly works.
It fails when two sequences score close to each other but land in separate bins.
This could be resolved by clustering instead of binning.
Take the head sequence and align it to each of the others. Put the resulting
scores on the number line 0-100. Starting at 100, take all sequences until one
sequence is more than 10 percentage points below the one above it. If this
group started somewhere above 90, then it's the same group as the head
sequence. Otherwise process the rest recursively: find the next 'cluster' on
the number line with gaps of no more than 10 percentage points, take the head
of that group, and align it to the others in the group. Process recursively,
returning a list of groups.
Original comment by chad.a.davis@gmail.com
on 22 Jun 2010 at 1:11
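The gap-based grouping described above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the names `split_runs`, `gap_cluster`, and `toy_score`, and the `score` callback (standing in for whatever alignment produces the 0-100 value), are hypothetical. The head is placed at 100 on the number line, so the first run always belongs to the head's group.

```python
from typing import Callable, List, Tuple


def split_runs(scored: List[Tuple[float, str]], max_gap: float) -> List[List[Tuple[float, str]]]:
    """Split a descending list of (score, item) wherever the drop exceeds max_gap."""
    runs: List[List[Tuple[float, str]]] = []
    for score, item in scored:
        if runs and runs[-1][-1][0] - score <= max_gap:
            runs[-1].append((score, item))
        else:
            runs.append([(score, item)])
    return runs


def gap_cluster(seqs: List[str], score: Callable[[str, str], float],
                max_gap: float = 10.0) -> List[List[str]]:
    """Group sequences by gaps on the 0-100 number line, recursing on later runs."""
    if not seqs:
        return []
    head, rest = seqs[0], seqs[1:]
    # Score every other sequence against the head, highest first.
    scored = sorted(((score(head, s), s) for s in rest),
                    key=lambda pair: pair[0], reverse=True)
    # The head sits at 100, so the first run is the head's own group.
    runs = split_runs([(100.0, head)] + scored, max_gap)
    clusters = [[s for _, s in runs[0]]]
    # Each later run gets its own head and is re-scored within the run.
    for run in runs[1:]:
        clusters.extend(gap_cluster([s for _, s in run], score, max_gap))
    return clusters


def toy_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a real alignment: percent of matching positions."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC", "CCCCCCCCCG"]
    for group in gap_cluster(seqs, toy_score):
        print(group)
```

With the toy data, the sequence scoring 80 stays with the head because it is within 10 points of the one scoring 90, which is the case that fixed bins would split.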
Probably should just be doing standard linkage clustering here (to avoid all-vs-all).
See Matt's code.
Original comment by chad.a.davis@gmail.com
on 27 May 2011 at 7:28
Original comment by chad.a.davis@gmail.com
on 27 May 2011 at 7:33
Use Algorithm::Cluster::Thresh with SBG::Stamp::superposition()->scores->{Sc}
as a distance metric (via Algorithm::DistanceMatrix).
Original comment by chad.a.davis@gmail.com
on 19 Oct 2011 at 3:40
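For illustration only, threshold-based linkage over pairwise distances can be sketched as below. This is plain Python, not the Perl modules named above, and it makes no claim about their APIs. The function name `threshold_clusters` and the `distance` callback are hypothetical, and how the Sc score would be converted into a distance is not specified in this thread, so the example uses toy numbers instead. The sketch uses single linkage: two items share a cluster if a chain of pairs, each within the threshold, connects them.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def threshold_clusters(items: Sequence[T], distance: Callable[[T, T], float],
                       threshold: float) -> List[List[T]]:
    """Single-linkage clustering cut at a distance threshold (union-find)."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair whose distance is within the threshold.
    # Note: this evaluates all pairs, like a full distance matrix would.
    for i, j in combinations(range(len(items)), 2):
        if distance(items[i], items[j]) <= threshold:
            parent[find(j)] = find(i)

    # Collect items by the root of their set, in first-seen order.
    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


if __name__ == "__main__":
    # Toy 1-D points; absolute difference stands in for a real distance.
    points = [0.0, 1.0, 2.0, 10.0, 11.0]
    print(threshold_clusters(points, lambda a, b: abs(a - b), 1.5))
```

In the toy run, 0.0 and 2.0 are farther apart than the threshold but still end up together through 1.0, which is the chaining behaviour that fixed bins lack.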
Original comment by chad.a.davis@gmail.com
on 19 Oct 2011 at 3:42
Original issue reported on code.google.com by
chad.a.davis@gmail.com
on 17 Jun 2010 at 3:55