BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.
Other
164 stars 42 forks source link

Annotation lost in pcr with circular template #262

Open manulera opened 1 week ago

manulera commented 1 week ago

Related to #136, can be reproduced with the same files found in there (reproduce.zip).

When using the pcr module in this case, all the annotations of the vector are lost, unless the vector is shifted so that no feature spans the origin. In the example below, I print the number of features resulting from doing the pcr on the shifted sequence (gives 27), and not shifted (gives 1)

from pydna.design import assembly_fragments, primer_design
from pydna.parsers import parse
from pydna.dseqrecord import Dseqrecord
from pydna.amplicon import Amplicon
from pydna.amplify import pcr

# Read the plasmid
vector: Dseqrecord = parse('pREP42-MCS+.gb')[0]

# Read the gene we want to clone
genomic_dna: Dseqrecord = parse('ase1.gb')[0]

# Select the ase1 CDS from the feature
ase1_feature = next(
    f for f in genomic_dna.features if (f.type == 'CDS' and 'gene' in f.qualifiers and 'ase1' in f.qualifiers['gene'])
)

# Select the entire plasmid, shifted so that the end of the promoter is at the origin
promoter_feature = next(
    f
    for f in vector.features
    if f.type == 'promoter' and 'label' in f.qualifiers and 'nmt1 P41 promoter' in f.qualifiers['label']
)
vector_shifted = vector.shifted(promoter_feature.location.end)

# Design primers (without overhangs)
vector_amplicon_pre = primer_design(vector_shifted)
# here we use indexing instead of feature extract to keep the features
ase1_amplicon_pre = primer_design(genomic_dna[ase1_feature.location.start : ase1_feature.location.end])

# Design assembly primers (same as previous, but with overhangs for Gibson assembly)
# We include the vector twice to create a circular recombination
asm: list[Amplicon] = assembly_fragments([vector_amplicon_pre, ase1_amplicon_pre, vector_amplicon_pre])

# This keeps the features
vector_amplicon = pcr(asm[2].primers()[0], asm[0].primers()[1], vector_shifted)
print(len(vector_amplicon.features))

# This should give the same result but it doesn't, probably a bug
# in PCR
vector_amplicon = pcr(asm[2].primers()[0], asm[0].primers()[1], vector)
print(len(vector_amplicon.features))
manulera commented 1 week ago

@BjornFJohansson this error is likely caused by something similar to #136, so it highlights the value of having a single implementation for PCR, assemblies, etc. that properly handles this tricky cases of annotation transmission in circular molecules.