We should think about what our goals are, and make sure that we are providing something that isn't already available through projects such as biopython or scikit-bio.

Here are my specific needs:

File I/O

- The ability to iterate through sequence and alignment files with Python generators. (Currently available in biopython and scikit-bio)
- The ability to get a subset of sequences or alignments from a file given query headers. (Not sure if these are available elsewhere)
- The ability to query a sequence file with exact sequence matches or subsequence matches.
- The ability to read a whole sequence or alignment file into memory. (Available in biopython and scikit-bio)
- In all the above cases, I personally like having the option to handle sequences or alignments either as built-in strings/lists or as objects with added utilities. (Still figuring out whether this is possible with [scikit-bio and their "into" keyword argument](http://scikit-bio.org/docs/0.2.3/io.html))
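To make the File I/O wishes above a bit more concrete, here is a rough sketch in plain Python, assuming FASTA input and records represented as built-in `(header, sequence)` string tuples. The names `iter_fasta`, `select_by_header`, and `select_by_subsequence` are hypothetical, made up for illustration; they are not from biopython or scikit-bio, and a real implementation would need to handle more formats and edge cases.

```python
import io
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header, sequence) as plain built-in strings


def iter_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Lazily yield (header, sequence) pairs from an open FASTA handle.

    Assumption: the full text after '>' is treated as the header.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_header(records: Iterable[Record], wanted: Iterable[str]) -> Iterator[Record]:
    """Keep only the records whose header is in `wanted`."""
    wanted = set(wanted)
    return (rec for rec in records if rec[0] in wanted)


def select_by_subsequence(records: Iterable[Record], query: str, exact: bool = False) -> Iterator[Record]:
    """Keep records whose sequence equals `query` (exact=True) or contains it."""
    if exact:
        return (rec for rec in records if rec[1] == query)
    return (rec for rec in records if query in rec[1])


# Usage with an in-memory file; a real file handle from open() works the same way.
fasta_text = ">seq1\nACGT\nACGT\n>seq2\nTTTT\n>seq3\nGGACGG\n"
for header, seq in select_by_subsequence(iter_fasta(io.StringIO(fasta_text)), "ACG"):
    print(header, seq)
```

Because everything is a generator, nothing is held in memory until the caller asks for it; wrapping any of these calls in `list(...)` gives the "read the whole file into memory" behaviour.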

Genome Assembly Utilities

I have not been able to find these utilities in biopython or scikit-bio

- Genome assembly stats for a collection of sequences
- Various coverage calculations
- Objects for specific sequencing chemistries:
    - e.g. HiC and Mate Pair sequences and alignments
- Reference genome objects
    - Utilities for:
        - gaps
        - ambiguous sequences
        - assembly stats
        - lift over tools
        - subsequence searches
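As a starting point for discussion, a few of the utilities listed above could look roughly like the sketch below. The function names (`assembly_stats`, `mean_coverage`, `find_gaps`) are hypothetical, and the sketch assumes that gaps are represented as runs of `N` and that coverage means total sequenced bases divided by genome length; other definitions are certainly possible.

```python
import re
from typing import Dict, Iterable, List, Tuple


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Basic stats for a collection of contig/scaffold lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_length": total,
        "longest": ordered[0] if ordered else 0,
        "n50": n50,
    }


def mean_coverage(read_lengths: Iterable[int], genome_length: int) -> float:
    """Average depth of coverage: total sequenced bases / genome length."""
    return sum(read_lengths) / genome_length


def find_gaps(sequence: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals for each run of N/n."""
    return [m.span() for m in re.finditer(r"[Nn]+", sequence)]


# Usage on toy data.
print(assembly_stats([80, 70, 50, 40, 30, 20]))
print(mean_coverage([100] * 30, 1000))
print(find_gaps("ACGTNNNNACGNNA"))
```

These work on plain ints and strings, so they could sit on top of whatever record objects we end up choosing.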

These are just the things that come to mind for me. Please let me know if there are Python packages out there that already address them well.

