demetrixbio / GslCore

Core library and basic plug-ins for the Amyris Genotype Specification Language (GSL) compiler.
Apache License 2.0

BUG: duplication of parts in assembly #21

Closed kcurran1 closed 1 year ago

kcurran1 commented 1 year ago

I cloned https://github.com/demetrixbio/Gslc and ran into the following bug when compiling the examples in https://github.com/demetrixbio/Gslc/tree/master/examples :

All parts except for the final part are getting duplicated in the output assembly. For example, I ran the following command and got the below snapgene output:

bin/Gslc/net7.0/Gslc.exe --snapgene . foo examples/simple_megastitch.gsl

image

Note that the primers placed at the junctions of the repeated parts are not actually primers to fuse those parts together; they are just repeats of the correct primers.

image

I got similar results with a #platform stitch design:

bin/Gslc/net7.0/Gslc.exe --snapgene . foo examples/simple_promoter_gene_locus.gsl

image

I also tried outputting in .ape or flat file formats to see if the issue was isolated to the snapgene output, but I got the same result in those other outputs.

kcurran1 commented 1 year ago

Just a quick note, I did a bit more playing around and realized that if I add --seamless false to the command, that fixes the issue. For example,

bin/Gslc/net7.0/Gslc.exe --snapgene . foo examples/simple_megastitch.gsl --seamless false

kcurran1 commented 1 year ago

Another update! The --seamless false pragma does remove the issue with the duplications. However, I am now running into an issue where some designs won't actually compile due to a primer validation issue. For example, the command I listed in my last comment using the included example file fails:

image

I am going to play around with linkers and see if that changes the behavior next.

@daz10000

daz10000 commented 1 year ago

I will try to reproduce. Are you just using the simple_megastitch.gsl doc from the example file?

Just realized you mentioned that :)

Can confirm that --seamless true duplicates all those parts and --seamless false generates a primer validation error. That's a pretty lame example doc :( Will investigate.

daz10000 commented 1 year ago

For the --seamless true case, it's the procAssembly routine that's messing up.

The input, as shown by --verbose, is:

procAssembly: ===========================================================================================

procAssembly: TOP,  prev( 0)=[]
                    n( 9)=[Linker_MT;uHO;::;pTDH3;::;mERG10;::;dHO;Linker_MT]
                    l( 0)=[]
            slice out( 0)=[]

procAssembly: ===========================================================================================

By the end of the routine we have two copies of everything!

procAssembly: finalOutput(slices n=9): Linker_MT      ; uHO ; uHO   ; pTDH3 ; pTDH3 ; mERG10 ; mERG10 ; dHO ; Linker_MT     

procAssembly: finalOutput(primer n=9): DPP(Linker_MT) ; GAP ; DPP() ; GAP   ; DPP() ; GAP    ; DPP()  ; GAP ; DPP(Linker_MT)
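A quick way to spot this kind of duplication from the --verbose output is to parse the `finalOutput(slices ...)` line and look for adjacent repeats. The following is only a diagnostic sketch in Python, not part of GslCore; the helper names `parse_slices` and `adjacent_duplicates` are made up for illustration, and the log line is copied from the output above.

```python
def parse_slices(log_line: str) -> list[str]:
    """Split a 'finalOutput(slices ...)' log line into part names."""
    # Everything after the first '):' is the ';'-separated list of slices.
    _, _, payload = log_line.partition("):")
    return [name.strip() for name in payload.split(";")]


def adjacent_duplicates(parts: list[str]) -> list[str]:
    """Return every part name that is immediately followed by itself."""
    return [a for a, b in zip(parts, parts[1:]) if a == b]


line = (
    "procAssembly: finalOutput(slices n=9): Linker_MT      ; uHO ; uHO   ; "
    "pTDH3 ; pTDH3 ; mERG10 ; mERG10 ; dHO ; Linker_MT     "
)

parts = parse_slices(line)
dups = adjacent_duplicates(parts)
print(dups)
```

On the line above this flags uHO, pTDH3 and mERG10 but not dHO, which matches the original report that every part except the final one is duplicated.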

daz10000 commented 1 year ago

Notes: the --seamless false case leaves FUSE slices (which are really placeholders directing linker placement) in during the primer generation phase, which messes it up. The simple fix for this is to strip them out going into the procAssembly routine. That does uncover the second bug, #22 (primer validation error).
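The idea of stripping placeholders before primer generation can be sketched as a simple filter. This is a Python illustration only: the `Slice` type, its `kind` field, and the assumption that the `::` entries in the verbose output are the FUSE slices are all hypothetical, and the real GslCore data model is F# and will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    kind: str  # hypothetical tag: "REGULAR", "LINKER" or "FUSE"


def strip_fuse(slices: list[Slice]) -> list[Slice]:
    """Drop FUSE placeholders so primer generation only sees real DNA parts."""
    return [s for s in slices if s.kind != "FUSE"]


# Assumed layout mirroring the n(9) list in the verbose output.
design = [
    Slice("Linker_MT", "LINKER"),
    Slice("uHO", "REGULAR"),
    Slice("::", "FUSE"),
    Slice("pTDH3", "REGULAR"),
    Slice("::", "FUSE"),
    Slice("mERG10", "REGULAR"),
    Slice("::", "FUSE"),
    Slice("dHO", "REGULAR"),
    Slice("Linker_MT", "LINKER"),
]

cleaned = strip_fuse(design)
print([s.name for s in cleaned])
```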

daz10000 commented 1 year ago

Including the error message here for good measure. The specific problem appears to be that the promoter pTDH3 has a flexible end (left-hand side), and primer generation moves the left-hand end in a little (truncating the piece), which is fine. The primer spans the uHO ; pTDH3 junction, but the full-length pTDH3 part is emitted in the assembly (not truncated), so the primer is no longer colinear with the assembly, triggering the validation error. (FWIW the mol bio would all work, but the reference genome would be slightly wrong for what was built.)

fwd primer validation failure.  Primer TCAATTCTATCTATACTTTAAATGTCTGGGTGAACAGTTTATTC
tail=TCAATTCTATCTATACTTTAAA
head=TGTCTGGGTGAACAGTTTATTC
 does not occur in assembly uHO_pTDH3_mERG10_dHO
TCGCAAGTCCTGTTTCTATGCCTTTCTCTTAGTAATTCACGAAATAAACCTATGGTTTAC
GAAATGATCCACGAAAATCATGTTATTATTTACATCAACATATCGCGAAAATTCATGTCA
TGTCCACATTAACATCATTGCAGAGCAACAATTCATTTTCATAGAGAAATTTGCTACTAT
CACCCACTAGTACTACCATTGGTACCTACTACTTTGAATTGTACTACCGCTGGGCGTTAT
TAGGTGTGAAACCACGAAAAGTTCACCATAACTTCGAATAAAGTCGCGGAAAAAAGTAAA
CAGCTATTGCTACTCAAATGAGGTTTGCAGAAGCTTGTTGAAGCATGATGAAGCGTTCTA
AACGCACTATTCATCATTAAATATTTAAAGCTCATAAAATTGTATTCAATTCCTATTCTA
AATGGCTTTTATTTCTATTACAACTATTAGCTCTAAATCCATATCCTCATAAGCAGCAAT
CAATTCTATCTATACTTTAAAAATAGGGGGCGGGTTACACAGAATATATAACATCGTAGG
TGTCTGGGTGAACAGTTTATTCCTGGCATCCACTAAATATAATGGAGCCCGCTTTTTAAG
CTGGCATCCAGAAAAAAAAAGAATCCCAGCACCAAAATATTGTTTTCTTCACCAACCATC
AGTTCATAGGTCCATTCTCTTAGCGCAACTACAGAGAACAGGGGCACAAACAGGCAAAAA
ACGGGCACAACCTCAATGGAGTGATGCAACCTGCCTGGAGTAAATGATGACACAAGGCAA
TTGACCCACGCATGTATCTATCTCATTTTCTTACACCTTCTATTACCTTCTGCTCTCTCT
GATTTGGAAAAAGCTGAAAAAAAAGGTTGAAACCAGTTCCCTGAAATTATTCCCCTACTT
GACTAATAAGTATATAAAGACGGTAGGTATTGATTGTAATTCTGTAAATCTATTTCTTAA
ACTTCTTAAATTCTACTTTTATAGTTAGTCTTTTTTTTAGTTTTAAAACACCAAGAACTT
AGTTTCGAATAAACACACATAAACAAACAAAATGTCTCAGAACGTTTACATTGTATCGAC
TGCCAGAACCCCAATTGGTTCATTCCAGGGTTCTCTATCCTCCAAGACAGCAGTGGAATT
GGGTGCTGTTGCTTTAAAAGGCGCCTTGGCTAAGGTTCCAGAATTGGATGCATCCAAGGA
TTTTGACGAAATTATTTTTGGTAACGTTCTTTCTGCCAATTTGGGCCAAGCTCCGGCCAG
ACAAGTTGCTTTGGCTGCCGGTTTGAGTAATCATATCGTTGCAAGCACAGTTAACAAGGT
CTGTGCATCCGCTATGAAGGCAATCATTTTGGGTGCTCAATCCATCAAATGTGGTAATGC
TGATGTTGTCGTAGCTGGTGGTTGTGAATCTATGACTAACGCACCATACTACATGCCAGC
AGCCCGTGCGGGTGCCAAATTTGGCCAAACTGTTCTTGTTGATGGTGTCGAAAGAGATGG
GTTGAACGATGCGTACGATGGTCTAGCCATGGGTGTACACGCAGAAAAGTGTGCCCGTGA
TTGGGATATTACTAGAGAACAACAAGACAATTTTGCCATCGAATCCTACCAAAAATCTCA
AAAATCTCAAAAGGAAGGTAAATTCGACAATGAAATTGTACCTGTTACCATTAAGGGATT
TAGAGGTAAGCCTGATACTCAAGTCACGAAGGACGAGGAACCTGCTAGATTACACGTTGA
AAAATTGAGATCTGCAAGGACTGTTTTCCAAAAAGAAAACGGTACTGTTACTGCCGCTAA
CGCTTCTCCAATCAACGATGGTGCTGCAGCCGTCATCTTGGTTTCCGAAAAAGTTTTGAA
GGAAAAGAATTTGAAGCCTTTGGCTATTATCAAAGGTTGGGGTGAGGCCGCTCATCAACC
AGCTGATTTTACATGGGCTCCATCTCTTGCAGTTCCAAAGGCTTTGAAACATGCTGGCAT
CGAAGACATCAATTCTGTTGATTACTTTGAATTCAATGAAGCCTTTTCGGTTGTCGGTTT
GGTGAACACTAAGATTTTGAAGCTAGACCCATCTAAGGTTAATGTATATGGTGGTGCTGT
TGCTCTAGGTCACCCATTGGGTTGTTCTGGTGCTAGAGTGGTTGTTACACTGCTATCCAT
CTTACAGCAAGAAGGAGGTAAGATCGGTGTTGCCGCCATTTGTAATGGTGGTGGTGGTGC
TTCCTCTATTGTCATTGAAAAGATATGATTACGTTCTGCGATTTTCTCATGATCTTTTTC
ATAAAATACATAAATATATAAATGGCTTTATGTATAACAGGCATAATTTAAAGTTTTATT
TGCGATTCATCGTTTTTCAGGTACTCAAACGCTGAGGTGTGCCTTTTGACTTACTTTTCC
GCCTTGGCAAGCTGGCCGGGTGATACTTGCACAAGTTCCACTAATGTGTATATTAGTTTA
AAAAGTTGTATGTAATAAAAGTAAAATTTAATATTTTGGATGAAAAAAACCATTTTTAGA
CTTTTTCTTAACTAGAATGCTGGAGTAGAAATACGCCATCTCAAGATACAAAAAGCGTTA
CCGGCACTGATTTGTTTCAACCAGTATATAGATTATTATTGGGTCTTGATCAACTTTCCT
CAGACATATCAGTAACAGTTATCAAGCTAAATATTTACGCGAAAGAAAAACAAATATTTT
AATTGTGATACTTGTGAATTTTATTTTATTAAGGATACAAAGTTAAGAGAAAACAAAATT
TATATACAATATAAGTAATATTCATATATATGTGATGAATGCAGTCTTAACGAGAAGACA
TGGCCTTGGTGACAACTCTCTTCAAACCAACTTCAGCCTTTCTCAATTCATCAGCAGATG
GGTCTTCGATTTGCAAAGCAGCCAAAGCATCGGACAAAGCAGCTTCAATCTTGGACTTGG
AACCT
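The mismatch can be checked directly against the message above. The sketch below is Python, not the GslCore validator, and the helper `junction_gap` is a made-up name. It takes the tail and head from the error, plus a short excerpt of the printed assembly around where they occur, and measures how many bases sit between them. A gap of zero would mean the primer is contiguous with the assembly; anything larger means extra bases were emitted between the two halves.

```python
TAIL = "TCAATTCTATCTATACTTTAAA"
HEAD = "TGTCTGGGTGAACAGTTTATTC"
PRIMER = TAIL + HEAD

# Excerpt copied from the assembly in the error message: the last base of one
# printed line followed by the next two printed lines.
EXCERPT = (
    "T"
    "CAATTCTATCTATACTTTAAAAATAGGGGGCGGGTTACACAGAATATATAACATCGTAGG"
    "TGTCTGGGTGAACAGTTTATTCCTGGCATCCACTAAATATAATGGAGCCCGCTTTTTAAG"
)


def junction_gap(assembly: str, tail: str, head: str):
    """Bases between the end of tail and the start of head, or None if absent."""
    t = assembly.find(tail)
    if t < 0:
        return None
    h = assembly.find(head, t + len(tail))
    if h < 0:
        return None
    return h - (t + len(tail))


found = PRIMER in EXCERPT
gap = junction_gap(EXCERPT, TAIL, HEAD)
print(found, gap)
```

Both halves of the primer are present in the excerpt, but not back to back, which is consistent with the explanation that the emitted part was not truncated to match the primer.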
daz10000 commented 1 year ago

image

daz10000 commented 1 year ago

This logic is in procAssembly case 7 (the general case). F# makes the likely mistake bloody obvious. The offsetF variable marks how far the part was moved forward, and it's not used (highly suspicious in functional code).

image

And further down, this gem of a comment:

image
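To illustrate why an unused forward offset would produce exactly this validation failure, here is a toy sketch in Python. It is not the actual F# change: `emit_part`, `offset_f` and the sequences are invented for illustration, and it assumes the fix amounts to trimming the emitted part by the same offset the primer designer applied.

```python
def emit_part(seq: str, offset_f: int) -> str:
    """Emit the part with its left end moved in by offset_f bases."""
    return seq[offset_f:]


# Toy sequences, not real GSL parts.
upstream = "AAACCCGGG"
promoter = "TTTTACGTACGT"
offset_f = 4  # primer design moved the promoter's left end in by 4 bases

# Primer spanning the junction: end of upstream + start of the trimmed promoter.
primer = upstream[-5:] + promoter[offset_f:offset_f + 5]

buggy_assembly = upstream + promoter                       # offset ignored
fixed_assembly = upstream + emit_part(promoter, offset_f)  # offset applied

print(primer in buggy_assembly, primer in fixed_assembly)
```

With the offset ignored, the primer no longer occurs in the assembly; once the part is emitted with the same truncation, the two agree again.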

daz10000 commented 1 year ago

Fixed in GslCore 0.5.1.