Zsailer / phylopandas

Pandas DataFrames for phylogenetics
BSD 3-Clause "New" or "Revised" License

Added a clustal test file, an example notebook, and alignment parsing… #1

Open biophyser opened 7 years ago

biophyser commented 7 years ago

… in the _read function. Also made two hidden functions to iterate over Biopython sequence and alignment objects.

Did a bit of hacking to get the DataFrames to have the same form as before. Multiple sequence alignments are returned as a multi-indexed DataFrame.

Zsailer commented 7 years ago

Awesome! This looks good. I'm testing it now and will leave comments if I see things that need fixing.

Zsailer commented 7 years ago

One general comment about coding style:

I typically use a "lowercase and underscore" convention when naming instance variables, functions, and methods (except in the unusual scenario where a function is a factory for a class).

This convention was set by PEP 8. I'd like to follow PEP 8 as closely as we can.

I'll leave comments in the PR where this convention is broken.

Zsailer commented 7 years ago

> It might be nice to be able to name the alignments instead of having each alignment be multiindexed by a number but maybe that's not a huge issue.

I think I agree with this statement... give each item a value in the name column (like alignment_1 and alignment_2) rather than using a multi-index.
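Roughly what I have in mind, as a sketch only. The column names (`id`, `sequence`, `name`), the index level names, and the `flatten_alignments` helper are assumptions made up for this example, not what `_read` currently returns.

```python
import pandas as pd

# Hypothetical data: two alignments with two sequences each.
records = [
    {"id": "mouse", "sequence": "ACGT-A"},
    {"id": "opossum", "sequence": "ACGTTA"},
    {"id": "mouse", "sequence": "GG-CTA"},
    {"id": "opossum", "sequence": "GGACTA"},
]

# Current shape (assumed): outer level numbers the alignment,
# inner level numbers the row within that alignment.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    names=["alignment", "row"],
)
multi_df = pd.DataFrame(records, index=index)


def flatten_alignments(df):
    """Replace the outer index level with a 'name' column."""
    # Move the alignment number out of the index and into a column.
    flat = df.reset_index(level="alignment")
    # Build labels such as alignment_1, alignment_2.
    flat["name"] = "alignment_" + (flat["alignment"] + 1).astype(str)
    # Drop the numeric column and fall back to a plain integer index.
    return flat.drop(columns="alignment").reset_index(drop=True)


flat_df = flatten_alignments(multi_df)
```

With this layout, one alignment can be selected with an ordinary boolean filter, for example `flat_df[flat_df["name"] == "alignment_2"]`.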

Could you explain the output in the example clustal? Is each mouse/opossum line pair just a segment of one longer sequence? Or is each pair a different sequence?