EddyRivasLab / easel

Sequence analysis library used by Eddy/Rivas lab code

Pfam uses #=GC seq_cons, Easel expects #=GC RF #18

Open cryptogenomicon opened 7 years ago

cryptogenomicon commented 7 years ago

Consensus column annotation in Pfam is on a #=GC seq_cons line, whereas Easel expects reference annotation on a #=GC RF line. Easel handles Pfam's seq_cons lines as "unparsed annotation", which is usually fine. One case where it isn't fine is esl_msa_SequenceSubset(), which propagates only recognized #=GC annotations, not unparsed ones, to avoid carrying over annotation that might not be valid for the subsetted alignment. As a result, the seq_cons line is lost when a Pfam alignment is subsetted. Elena and Grey ran across this.
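The propagation rule described above can be modeled in a few lines of C. This is a minimal sketch under stated assumptions: toy_msa, toy_sequence_subset() and toy_msa_destroy() are hypothetical simplifications, not Easel's ESL_MSA structure or the real esl_msa_SequenceSubset() code. The only behavior taken from the issue is the policy itself: recognized #=GC annotation (here just RF) is copied into the subset, while unparsed #=GC lines such as seq_cons are dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified alignment model (NOT Easel's ESL_MSA). */
typedef struct {
  int    nseq;    /* number of aligned sequences                    */
  char **aseq;    /* aligned sequences, all the same length          */
  char  *rf;      /* recognized #=GC RF line, or NULL               */
  int    ngc;     /* number of unparsed #=GC lines (e.g. seq_cons)  */
  char **gc_tag;  /* tags of the unparsed #=GC lines                */
  char **gc;      /* per-column text of the unparsed #=GC lines     */
} toy_msa;

/* Portable string copy (strdup is not in C99); NULL in, NULL out. */
static char *copy_str(const char *s)
{
  char *t;
  if (s == NULL) return NULL;
  t = malloc(strlen(s) + 1);
  if (t != NULL) strcpy(t, s);
  return t;
}

/* Frees a heap-allocated toy_msa created by toy_sequence_subset(). */
static void toy_msa_destroy(toy_msa *msa)
{
  int i;
  if (msa == NULL) return;
  for (i = 0; i < msa->nseq; i++) free(msa->aseq[i]);
  free(msa->aseq);
  free(msa->rf);
  for (i = 0; i < msa->ngc; i++) { free(msa->gc_tag[i]); free(msa->gc[i]); }
  free(msa->gc_tag);
  free(msa->gc);
  free(msa);
}

/* Hypothetical subset: keep sequence i when useme[i] is nonzero.
 * Every column is kept, so recognized RF annotation is copied, but
 * unparsed #=GC lines are dropped because their meaning is unknown
 * (a consensus line computed from all sequences may no longer hold). */
static toy_msa *toy_sequence_subset(const toy_msa *msa, const int *useme)
{
  toy_msa *sub;
  int      i, n = 0;

  assert(msa != NULL && useme != NULL);
  for (i = 0; i < msa->nseq; i++) if (useme[i]) n++;

  if ((sub = calloc(1, sizeof *sub)) == NULL) return NULL;
  if (n > 0) {
    if ((sub->aseq = calloc((size_t) n, sizeof(char *))) == NULL) { free(sub); return NULL; }
    sub->nseq = n;
  }

  for (i = 0, n = 0; i < msa->nseq; i++) {
    if (!useme[i]) continue;
    if ((sub->aseq[n++] = copy_str(msa->aseq[i])) == NULL) goto ERROR;
  }

  /* Recognized column annotation is propagated. */
  if (msa->rf != NULL && (sub->rf = copy_str(msa->rf)) == NULL) goto ERROR;

  /* Unparsed #=GC annotation (e.g. Pfam seq_cons) is not propagated;
   * calloc already left ngc = 0 and gc_tag = gc = NULL. */
  return sub;

 ERROR:
  toy_msa_destroy(sub);
  return NULL;
}
```

For example, subsetting an alignment that carries both an RF line and a seq_cons line with this sketch returns a new alignment whose rf is kept and whose ngc is 0: the Pfam consensus annotation silently disappears, which is the situation the issue describes.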