necrolyte2 closed this 8 years ago
So for a FASTA record, the only difference from any other record is that it has no quality information associated with its bases, that is, no .letter_annotations['phred_quality'].
So I think you could use the same function to build a SeqRecord and simply ignore the quality information.
Unless we are looking for text representations such as
>id
ATGC
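If a text representation is what we want, the FASTA entry is just the FASTQ entry with the "+" and quality lines dropped. A minimal sketch using plain string formatting (to_fasta and to_fastq are hypothetical helpers, not Biopython functions; qualities are assumed to be integer Phred scores written as Phred+33 ASCII):

```python
def to_fasta(rec_id: str, seq: str) -> str:
    """Return a two-line FASTA entry; no quality information is written."""
    return f">{rec_id}\n{seq}\n"


def to_fastq(rec_id: str, seq: str, quals: list) -> str:
    """Return a four-line FASTQ entry, encoding Phred scores as Phred+33 ASCII."""
    if len(quals) != len(seq):
        raise ValueError("need exactly one quality score per base")
    qual_line = "".join(chr(q + 33) for q in quals)
    return f"@{rec_id}\n{seq}\n+\n{qual_line}\n"


print(to_fasta("id", "ATGC"), end="")
print(to_fastq("id", "ATGC", [40, 40, 30, 20]), end="")
```

The FASTA call prints exactly the ">id / ATGC" pair shown above.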
Yeah, I would factor out the make_seqrecord here and make it a parameter, so you can pass in a version like
# Note: Bio.Alphabet (and IUPAC) was removed in Biopython 1.78; on newer versions use Seq(seq).
make_seqrec = lambda id, seq, quals: \
    SeqRecord(Seq(seq, IUPAC.ambiguous_dna), id=str(id), description='')
that ignores the qualities and returns a FASTA-style record. That way you avoid hiding a possible dependency on quality scores.
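The same idea, sketched without Biopython so it runs standalone: the record constructor is passed in rather than hard-coded. Record is a hypothetical stand-in for SeqRecord, and make_records, make_fastq_record and make_fasta_record are illustrative names, not functions from this project.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a SeqRecord: id, sequence, optional qualities."""
    id: str
    seq: str
    quals: Optional[List[int]] = None


def make_fastq_record(rec_id, seq, quals) -> Record:
    # Keeps the per-base qualities, like a FASTQ-backed SeqRecord.
    return Record(str(rec_id), seq, list(quals))


def make_fasta_record(rec_id, seq, quals) -> Record:
    # Deliberately drops the qualities, like the lambda above.
    return Record(str(rec_id), seq)


def make_records(rows: Iterable[Tuple[object, str, List[int]]],
                 make_record: Callable[..., Record]) -> Iterator[Record]:
    """Build one record per row using the injected constructor."""
    for rec_id, seq, quals in rows:
        yield make_record(rec_id, seq, quals)


rows = [(1, "ATGC", [40, 40, 30, 20])]
print(list(make_records(rows, make_fastq_record)))
print(list(make_records(rows, make_fasta_record)))
```

Because FASTA-style records carry quals=None, any downstream code that silently relies on qualities fails visibly instead of hiding the dependency.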
Usually when I use PyVCF I flatten the record object into a simple dictionary, because I prefer that interface. So if instantiating a VCFRecord object (or whatever it's called) is too difficult, we can always do that instead.
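A sketch of what the flattened-dictionary interface could look like. This does not use PyVCF's record class; it is a swapped-in plain-string parse of the eight fixed columns of one VCF data line, and flatten_vcf_line is a hypothetical name. INFO values are left as strings, since their types are declared in the header.

```python
def flatten_vcf_line(line: str) -> dict:
    """Flatten one VCF data line (first eight fixed columns) into a plain dict."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for entry in info.split(";"):
        if entry == ".":
            continue
        key, sep, value = entry.partition("=")
        # Flag-type INFO entries have no "=value"; record them as True.
        info_dict[key] = value if sep else True
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": None if vid == "." else vid,
        "REF": ref,
        "ALT": [] if alt == "." else alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        # Treat PASS and missing as "no failed filters".
        "FILTER": [] if filt in (".", "PASS") else filt.split(";"),
        "INFO": info_dict,
    }


print(flatten_vcf_line("chr1\t100\t.\tA\tG,T\t50.5\tPASS\tDP=12;DB"))
```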
I will take a look at the freebayes VCF header and see what we can generate from that.
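One possible starting point, hedged: read the INFO definitions out of a header and emit a placeholder value per field. This assumes the standard ##INFO=<ID=...,Number=...,Type=...,Description="..."> layout from the VCF specification; the header lines below are illustrative and not taken from FreeBayes output, and parse_info_header, fake_info and PLACEHOLDER are hypothetical names.

```python
import re

# Matches INFO meta-information lines such as
# ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
INFO_RE = re.compile(
    r'^##INFO=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),'
    r'Type=(?P<type>[^,]+),Description="(?P<desc>[^"]*)"'
)

# Hypothetical placeholder value for each VCF Type (Flag is handled separately).
PLACEHOLDER = {"Integer": "1", "Float": "0.5", "String": "x", "Character": "x"}


def parse_info_header(lines):
    """Return {INFO ID: (Number, Type, Description)} from VCF header lines."""
    fields = {}
    for line in lines:
        m = INFO_RE.match(line)
        if m:
            fields[m["id"]] = (m["number"], m["type"], m["desc"])
    return fields


def fake_info(fields):
    """Build an INFO column with one placeholder value per declared field.

    Simplification: Number is ignored, so Number=A/R/G fields get a single value.
    """
    parts = []
    for key, (_number, vtype, _desc) in fields.items():
        parts.append(key if vtype == "Flag" else f"{key}={PLACEHOLDER[vtype]}")
    return ";".join(parts) or "."


header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
]
print(fake_info(parse_info_header(header)))
```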
Closed in #5
Generate fasta records as well as VCF records
I think we can refactor the factory function that generates FASTQ records so that it can generate either FASTA or FASTQ (possibly any format?).
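A minimal sketch of the "any format" direction, assuming the factory produces random test records: the output format is looked up in a registry of writers, so supporting a new format means adding one entry. WRITERS and generate_records are hypothetical names, not the existing bioframework factory.

```python
import random
from typing import Callable, Dict, Iterator, List

# Hypothetical format registry: each writer takes (id, seq, quals) and
# returns text. The FASTA writer simply ignores the qualities.
WRITERS: Dict[str, Callable[[str, str, List[int]], str]] = {
    "fasta": lambda rid, seq, quals: f">{rid}\n{seq}\n",
    "fastq": lambda rid, seq, quals: (
        f"@{rid}\n{seq}\n+\n" + "".join(chr(q + 33) for q in quals) + "\n"
    ),
}


def generate_records(n: int, length: int, fmt: str = "fastq",
                     seed: int = 0) -> Iterator[str]:
    """Yield n random records of the given length in the requested format."""
    writer = WRITERS[fmt]  # raises KeyError for an unregistered format
    rng = random.Random(seed)
    for i in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        # Qualities are always drawn, so a given seed yields the same
        # sequences whichever format is requested.
        quals = [rng.randint(0, 40) for _ in range(length)]
        yield writer(f"read{i}", seq, quals)


for text in generate_records(2, 8, fmt="fasta"):
    print(text, end="")
```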
Related: vdbwrair/bioframework#19