Open cjfields opened 9 years ago
Original Redmine Comment Author Name: Jason Stajich Original Date: 2010-03-21T20:37:59Z
So you are disagreeing with Aaron’s response on the mailing list - I’m confused about what you want to do. If they aren’t from an alignment why are you reading them with AlignIO?
A basic assumption of the AlignIO objects is that they are parsing or writing alignment data.
If you want to read in sequences use Bio::SeqIO? What part of all of this do you find strange?
Original Redmine Comment Author Name: Mark A. Jensen Original Date: 2010-03-21T21:33:41Z
FWIW, I occasionally like to use AlignIO for unaligned sequences in order to use its random access (by_id, by_pos) methods. MAJ (In reply to comment #1)
Original Redmine Comment Author Name: Bernd empty Original Date: 2010-03-22T05:32:58Z
Hi Jason,
I find it strange that AlignIO::fasta (in contrast to clustal, stockholm etc.) assumes the input is aligned and, if it is not, makes it “aligned” even though it is not. One practical problem occurs with user input (not my own ;-): when a user is supposed to supply an alignment but something is wrong with it, it is not possible to check is_flush, as it is always true. I agree with you and Aaron that one should use SeqIO to read in a set of FASTA seqs and AlignIO for alignments. The (my) problem is that AlignIO::fasta changes unaligned FASTA input into something that looks like an alignment but is not. Thus, I disagree with Aaron and AlignIO::fasta on this:
Aaron: “AlignIO::fasta makes the assumption that all of your sequences are aligned,”
This should not be assumed, either they are, or are not. If they are not this (in my case) is due to accidentally faulty input.
Aaron: “and pads the ends of shorter sequences with gap characters (essentially, enforcing a rather silly, yet valid alignment).”
It’s a silly alignment, so why enforce such a thing?
The fact that is_flush() then returns 1 is secondary. I’d like to be able to check that is_flush is OK, not that it was enforced. Such a check is possible with the (several) other AlignIO modules (Clustal, Stockholm, MSF) and can be used as an input sanity check.
Regards, Bernd
Original Redmine Comment Author Name: Chris Fields Original Date: 2010-03-22T09:09:26Z
(In reply to comment #3)
Okay, I see where you’re going (exception on user error). I tend to agree with both sides; the user should know something’s wrong, but it should still work when needed. Maybe automatically making the sequences flush should be an option for this format? The parser could throw/warn otherwise.
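One way to picture the option suggested here, as a sketch only: the function name `flush_or_fail` and its `pad` flag are hypothetical and not part of BioPerl, and sequences are modelled as plain [id, string] pairs rather than Bio::LocatableSeq objects. With `pad` set, shorter sequences are right-padded with "-" in the same way as the AlignIO::fasta snippet quoted in this report; without it, ragged input raises an exception.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl API: takes an array ref of [id, sequence]
# pairs and an optional pad flag. Returns a new array ref of pairs.
sub flush_or_fail {
    my ($records, %opt) = @_;

    # The alignment length is the length of the longest sequence.
    my $alnlen = 0;
    for my $rec (@$records) {
        my $len = length $rec->[1];
        $alnlen = $len if $len > $alnlen;
    }

    my @out;
    for my $rec (@$records) {
        my ($id, $seq) = @$rec;
        my $diff = $alnlen - length $seq;
        if ($diff > 0) {
            # Without the pad option, ragged input is treated as an error.
            die sprintf("sequence %s is %d column(s) shorter than alignment length %d\n",
                        $id, $diff, $alnlen)
                unless $opt{pad};
            $seq .= '-' x $diff;
        }
        push @out, [$id, $seq];
    }
    return \@out;
}

my $ragged = [ [ seq1 => 'ACGT-A' ], [ seq2 => 'ACG' ] ];

# Lenient mode: pad and carry on.
my $padded = flush_or_fail($ragged, pad => 1);
print "$_->[0]\t$_->[1]\n" for @$padded;

# Strict mode: the caller learns that the input was not flush.
my $ok = eval { flush_or_fail($ragged); 1 };
print "strict mode rejected the input: $@" unless $ok;
```

Whether padding or throwing should be the default is exactly the open question in this thread; the sketch defaults to throwing only to keep the example short.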
Author Name: Bernd empty (Bernd empty) Original Redmine Issue: 3030, https://redmine.open-bio.org/issues/3030 Original Date: 2010-03-19 Original Assignee: Bioperl Guts
Hi,
AlignIO::fasta assumes that the FASTA input is (or should be) an alignment. We have mailed about this before. However, I actually find it strange that the sequences are padded automatically with "-". If that is needed, the input is not actually an alignment, and therefore $aln->is_flush should return false, as it does with the MSF, Stockholm and Clustal formats. I am not sure whether changing this code will break things; possibly an ‘alignment check’ could be enforced optionally. Below is the code snippet from AlignIO::fasta::next_aln that I mean:
    my $alnlen = $aln->length;
    foreach my $seq ( $aln->each_seq ) {
        if ( $seq->length < $alnlen ) {
            my ($diff) = ($alnlen - $seq->length);
            $seq->seq( $seq->seq() . "-" x $diff);
        }
    }
The issue is that, especially with user input, a FASTA alignment might not be flush, and it should not be changed into a corrected alignment automatically. It would be strange to first have to read all sequences with SeqIO::fasta and check their lengths, and then read them all into an alignment object with SimpleAlign.
I would regard it as unwanted behaviour of AlignIO::fasta to turn sequences into an alignment.
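The length pre-check described above can be approximated without building any sequence objects. The following is a rough sketch in plain Perl rather than Bio::SeqIO; `fasta_lengths` is a hypothetical helper, and the parsing is deliberately minimal (header lines start with ">", the ID is the first word after it, and whitespace inside sequence lines is ignored).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: returns a list of [id, length]
# pairs for the records in a FASTA-formatted string.
sub fasta_lengths {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>\s*(\S*)/) {
            # New record: remember the ID and start counting at zero.
            push @records, [$1, 0];
        }
        elsif (@records) {
            # Sequence line: count everything except whitespace,
            # so gap characters are included in the length.
            (my $residues = $line) =~ s/\s+//g;
            $records[-1][1] += length $residues;
        }
    }
    return @records;
}

my $fasta   = ">seq1\nACGT-ACGT\n>seq2\nACGT\nAC\n";
my @lengths = fasta_lengths($fasta);

# More than one distinct length means the input is not a flush alignment.
my %seen = map { $_->[1] => 1 } @lengths;
if (scalar(keys %seen) > 1) {
    print "input is not flush:\n";
    printf "  %s\t%d\n", @$_ for @lengths;
}
```

A check like this only reports the problem; it leaves the decision to reject or pad the input to the caller.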
From the mailing list:
On Wed, Dec 5, 2007 at 3:56 PM, aaron.j.mackey@gsk.com wrote: