Amyris / GslCore

Core library and basic plug-ins for the Amyris Genotype Specification Language (GSL) compiler.
Apache License 2.0

FEA: representation for flanking DNA during construction #27

Open daz10000 opened 5 years ago

daz10000 commented 5 years ago

Flanking DNA

GSL typically models only the designed DNA (e.g. locus, promoter gene terminator constructs). During construction, however, it is very common to package the design into a larger piece of DNA, typically a plasmid construct, at which point there are flanking DNA sequences upstream and downstream of the design. We wish to provide some system for describing this richer representation without completely breaking the separation between design and implementation.

Proposal

As an example implementation, the user could create a GSL function that takes a design as input and returns a more elaborate design with the flanking sequences. We would need a mechanism for flagging to the compiler that this function is to be used for packaging the design. It is not uncommon to have multiple packaging systems, so we would also need a mechanism at the design level for selecting the packaging function. Finally, output generators (those that care) would need to decide whether to generate the abstract form of the DNA, the packaged form of the DNA, or both. This might be a compiler-level decision, in which case we could abstract the problem away from the individual generators by making two passes through output generation.
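To make the moving parts of the proposal concrete, here is a minimal Python sketch of the idea, not GslCore code. It models a design as a list of part strings and keeps a registry of named packaging functions. Its `emit` function makes up to two passes to produce the abstract form, the packaged form, or both. The names `PACKAGERS`, `register_packager`, `my_packager` and `emit` are all hypothetical and exist only for illustration; the flanking sequences are copied from the example in this issue.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: a design is an ordered list of part names
# or inline sequence literals. This is not how GslCore stores assemblies.
Design = List[str]
Packager = Callable[[Design], Design]

# Hypothetical registry so a design can select one of several packagers by name.
PACKAGERS: Dict[str, Packager] = {}


def register_packager(name: str, fn: Packager) -> None:
    """Make a packaging function selectable by name."""
    PACKAGERS[name] = fn


def my_packager(design: Design) -> Design:
    """Wrap a design in the flanking sequences used in this issue's example."""
    upstream = "/ATGATGCTAGTCGTACGTAGTCAGT/"
    downstream = "/TGATCGTACGTAGTCGTACGTACGTA/"
    return [upstream] + list(design) + [downstream]


def emit(design: Design,
         packager_name: Optional[str] = None,
         mode: str = "both") -> Dict[str, Design]:
    """Produce the abstract form, the packaged form, or both.

    Each requested form corresponds to one pass through output generation.
    """
    if mode not in ("abstract", "packaged", "both"):
        raise ValueError(f"unknown mode: {mode}")
    outputs: Dict[str, Design] = {}
    if mode in ("abstract", "both"):
        outputs["abstract"] = list(design)
    if mode in ("packaged", "both"):
        if packager_name is None:
            raise ValueError("packaged output requested but no packager selected")
        if packager_name not in PACKAGERS:
            raise KeyError(f"unknown packager: {packager_name}")
        outputs["packaged"] = PACKAGERS[packager_name](design)
    return outputs


register_packager("myPackager", my_packager)
result = emit(["uFOO5", "oERG10", "dFOO5"], packager_name="myPackager")
```

In this sketch the choice between abstract and packaged output is made once in `emit`, so individual output generators would not need to know about packaging at all.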

Example

let myPackager(assembly) =
    /ATGATGCTAGTCGTACGTAGTCAGT/ ; &assembly ; /TGATCGTACGTAGTCGTACGTACGTA/
end

#packager myPackager

uFOO5 ; oERG10 ; dFOO5

daz10000 commented 5 years ago

Update: I have been thinking about this since writing the issue and am fairly convinced this is more of a build issue than a design issue. It would still be nice to have a convention for marking the preferred packaging and checking that it is a valid request. We could reserve the pragma #package or #vector, allow the compiler to validate it, and simply include the hint in any build output that goes into a pipeline, so the output can represent both the design and the preferred package. The build pipeline could honor or override the request and construct both the insert (design) and the full vector at the same time, or represent it according to local build pipeline conventions. This enables all sorts of downstream processes, like NGSQC, to show the full vector with details etc.
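The revised approach (validate a reserved pragma, pass it along as a hint, let the build pipeline honor or override it) could look roughly like the Python sketch below. It is an illustration under several assumptions, not the GslCore pragma implementation: the line-based parsing, the record layout, the rule that the last pragma wins, and the names `extract_package_hint`, `build_record`, `resolve_package` and the vector name `pVec1` are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Set

# Assumed pragma shape: "#package NAME" or "#vector NAME" on its own line.
PRAGMA_RE = re.compile(r"^#(package|vector)\s+(\S+)\s*$")


def extract_package_hint(source: str, known: Set[str]) -> Optional[str]:
    """Return the requested package name, validating it against known names.

    If several pragmas appear, the last one wins (an assumption of this sketch).
    """
    hint: Optional[str] = None
    for line in source.splitlines():
        match = PRAGMA_RE.match(line.strip())
        if match:
            name = match.group(2)
            if name not in known:
                raise ValueError(f"unknown package requested: {name}")
            hint = name
    return hint


def build_record(source: str, known: Set[str]) -> Dict[str, object]:
    """Emit the design unchanged, plus the packaging hint for the pipeline."""
    design: List[str] = [
        stripped
        for stripped in (line.strip() for line in source.splitlines())
        if stripped and not stripped.startswith("#")
    ]
    return {"design": design, "package_hint": extract_package_hint(source, known)}


def resolve_package(record: Dict[str, object],
                    override: Optional[str] = None) -> Optional[str]:
    """The build pipeline may honor the hint or override it."""
    return override if override is not None else record["package_hint"]


source = "#vector pVec1\nuFOO5 ; oERG10 ; dFOO5\n"
record = build_record(source, known={"pVec1"})
```

The design itself stays untouched in the record, which preserves the separation between design and implementation; only the pipeline decides what to do with `package_hint`.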