VDBWRAIR / biotest

Testing framework to make Python unit testing easier
GNU General Public License v2.0

Fasta + VCF hypothesis #3

Closed by necrolyte2 8 years ago

necrolyte2 commented 8 years ago

Generate fasta records as well as VCF records

I think we can refactor the factory function that generates fastq records so that it generates either fasta or fastq (possibly any format?).

Related vdbwrair/bioframework#19

necrolyte2 commented 8 years ago

So the only difference between a fasta record and any other record is that it has no quality information associated with the letters, that is, no .letter_annotations['phred_quality']. I think you could just use the same function to get a SeqRecord and then ignore the quality information.

Unless we are looking for text representations such as

>id
ATGC
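A minimal sketch of that point, using only the standard library: the same record can be rendered as fasta text (quality ignored) or fastq text (quality required). The `Record` class and the `to_fasta` / `to_fastq` helpers are hypothetical stand-ins for illustration, not Biopython's `SeqRecord` or biotest's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for a SeqRecord-like object (not Biopython's class)."""
    id: str
    seq: str
    letter_annotations: Dict[str, List[int]] = field(default_factory=dict)


def to_fasta(rec: Record) -> str:
    # FASTA only needs the id and the sequence; any quality scores are ignored.
    return ">{}\n{}\n".format(rec.id, rec.seq)


def to_fastq(rec: Record) -> str:
    # FASTQ additionally needs one Phred score per base.
    # Assumption: Sanger-style encoding, i.e. ASCII offset 33.
    quals = rec.letter_annotations["phred_quality"]
    qual_str = "".join(chr(q + 33) for q in quals)
    return "@{}\n{}\n+\n{}\n".format(rec.id, rec.seq, qual_str)


rec = Record("id", "ATGC", {"phred_quality": [40, 40, 30, 20]})
fasta_text = to_fasta(rec)
fastq_text = to_fastq(rec)
```

Here `to_fasta` never touches `letter_annotations`, so it also works on a record that was built without any quality scores.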

averagehat commented 8 years ago

Yeah, I would factor out the make_seqrecord here and make it a parameter, so you can pass a version like

make_seqrec = lambda id, seq, quals:    \
       SeqRecord(Seq(seq, IUPAC.ambiguous_dna), id=str(id), description='')

that ignores the quality and returns a fasta-style record. That way you would avoid possibly hiding a dependency on quality scores.
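The idea of passing the record constructor in as a parameter could look roughly like the sketch below. It is standard-library only: `random` stands in for whatever generator the real factory uses, plain dictionaries stand in for `SeqRecord`, and the names `random_records`, `make_fasta_record` and `make_fastq_record` are hypothetical.

```python
import random


def make_fastq_record(id, seq, quals):
    # Keeps the quality scores (fastq-style record).
    return {"id": str(id), "seq": seq, "quals": list(quals)}


def make_fasta_record(id, seq, quals):
    # Accepts quals so the signature matches, but deliberately drops them.
    return {"id": str(id), "seq": seq}


def random_records(n, make_record, rng, max_len=20):
    """Generate n records; make_record decides what kind of record is built."""
    records = []
    for i in range(n):
        length = rng.randint(1, max_len)
        seq = "".join(rng.choice("ATGCN") for _ in range(length))
        quals = [rng.randint(0, 40) for _ in range(length)]
        records.append(make_record(i, seq, quals))
    return records


# Same seed for both calls, so both lists contain the same sequences.
fasta_records = random_records(3, make_fasta_record, random.Random(0))
fastq_records = random_records(3, make_fastq_record, random.Random(0))
```

Because the fasta-style records carry no quality field at all, any test that accidentally relies on quality scores fails loudly instead of silently passing.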

averagehat commented 8 years ago

Usually when I use pyvcf I flatten the VCFRecord object into a simple dictionary like this, because I prefer that interface. So if instantiating a VCFRecord object (or whatever it's called) is too difficult, we can always do that.

I will take a look at the freebayes VCF header and see what we can generate from that.
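The flattening mentioned above might look something like this sketch. It assumes a record object exposing `CHROM`, `POS`, `REF`, `ALT` and an `INFO` mapping, as pyvcf records do; the `flatten_vcf` name is hypothetical, and a `SimpleNamespace` is used as a fake record so the example runs without pyvcf installed.

```python
from types import SimpleNamespace


def flatten_vcf(record):
    """Flatten a pyvcf-like record into a plain dictionary (hypothetical helper)."""
    flat = {
        "CHROM": record.CHROM,
        "POS": record.POS,
        "REF": record.REF,
        # ALT entries may be objects rather than strings, so stringify them.
        "ALT": [str(alt) for alt in record.ALT],
    }
    # Merge INFO fields into the top level. Caveat: an INFO key named
    # like one of the fixed fields above would overwrite that field.
    flat.update(record.INFO)
    return flat


# Fake record standing in for a parsed VCF line; the values are made up.
fake = SimpleNamespace(CHROM="chr1", POS=42, REF="A", ALT=["G"],
                       INFO={"DP": 100, "AO": [7]})
row = flatten_vcf(fake)
```

Generating such dictionaries directly would sidestep having to construct real record objects in tests.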

necrolyte2 commented 8 years ago

Closed in #5