[ ] FASTA files were all prepared ad hoc, for human and mouse only, then copied to the server, with permissions set by hand. Ideally, this wrangling should be integrated into the AssemblyService (cf. FileService) and the CLI, so that it happens when a new assembly is created for the current version. One obstacle, though, is that this requires bgzip for compression and samtools for indexing.
[ ] modification_api.get_genomic_sequence_context needs refactoring, and is currently not really "testable".
[ ] The code currently assumes that pybedtools writes each sequence on a single line, however long it is, i.e. that the requested sequence always sits entirely on the second line of the FASTA output. It would be more robust to read this file with a proper FASTA parser, e.g. Biopython's SeqIO:
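from Bio import SeqIO

# Option 1: index the file (records are parsed on access)
record_dict = SeqIO.index("example.fasta", "fasta")
print(record_dict["gi:12345678"])  # use any record ID
# Option 2: load all records into an in-memory dict
record_dict = SeqIO.to_dict(SeqIO.parse("example.fasta", "fasta"))
print(record_dict["gi:12345678"])  # use any record ID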
[ ] Should we allow users to query a different context length? I don't believe this is a must. We should, however, think about cDNA/transcript context. This requires careful consideration (does it affect the data model? should it be integrated into data annotation? etc.) and will be handled in a separate issue in due time.
[ ] Change modification color to primary green (and related docs).
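
For the pybedtools item above, here is a minimal stdlib-only sketch of reading a FASTA file without assuming that each sequence fits on one line. The function name read_fasta and the choice to key records by the first token of the header are assumptions made for illustration, not existing project code; Biopython's SeqIO is probably the better choice if it is already a dependency.

```python
import os
import tempfile


def read_fasta(path):
    """Return {record_id: sequence}, joining sequences wrapped over several lines.

    Hypothetical helper: the record ID is taken to be the first
    whitespace-delimited token after '>' in the header line.
    """
    records = {}
    current_id = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                tokens = line[1:].split()
                current_id = tokens[0] if tokens else ""
                chunks = []
            elif current_id is not None:
                # Sequence line: may be one of many for the same record.
                chunks.append(line)
    # Flush the last record.
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Small demo with a sequence wrapped over three lines.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as fh:
        fh.write(">chr1:100-121\nACGTACGTAC\nGTACGTACGT\nA\n")
        demo_path = fh.name
    try:
        print(read_fasta(demo_path))
    finally:
        os.remove(demo_path)
```

Keying by record ID rather than by line position means the caller no longer depends on how the sequence happens to be wrapped, which should also make get_genomic_sequence_context easier to test with small fixture files.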