Pydna documentation Gibson Assembly Literature Example

JeffXiePL commented 2 months ago

Hello everyone,

I'm a student of Manu's and I'm writing some documentation for pydna. I'm now working on the Gibson assembly page, and it was recommended that I ask for useful examples from the literature that I could include. Do you have any recommendations?

Peilun

hiyama341 commented 2 months ago

Hi Peilun,

In case you were interested in the in vivo assembly method, which is gaining a lot of traction, especially in the fungal genetic engineering world, you can check out this paper.

Another cool paper that relies on this in vivo assembly method is this one-pot assembly in yeast, which I think is super nice. It might be harder to re-create because they make more than 7000 strains, but you could make a small example if you like the paper.

As we discussed in the meeting, it is not Gibson per se, but it is similar.

Let me know if you have any questions!

Lucas

manulera commented 2 months ago

Hi @JeffXiePL, I had a look this morning since I am making some changes to SYC, and was trying to find an example from the literature. I had a look at several and, unsurprisingly, I could not reproduce any of the cloning!

Anyway, I made something up that I think can be a good example: cloning a gene from pombe into an expression vector. If you are still looking for a full example, this can be a good one. If you already found something else, you can use that.

You can find the files to reproduce it and the code in this comment

However, there is an error in the annotation of ase1 due to #262 and #136. The sequence is correct though, so keep a warning in the comments.

JeffXiePL commented 2 months ago

Hello Manu, sorry that it took a while for me to reply. In my latest push I also tried to add an example to my notebook, taken from the original Gibson Assembly paper, but I could not get it to work.

Two of the primers that the original authors gave in the supplementary information section here would not anneal to the plasmid. I tried searching manually in the plasmid for a matching sequence that would anneal to the primers given by the authors, but I couldn't find it. It would be great if you could help me take a look at my notebook here for the error. I'll reproduce the pombe example if this doesn't work.

manulera commented 2 months ago

Hey @JeffXiePL, first of all good job navigating the sparse documentation in the paper and finding the right sequences, I had already had a look and given up as I mentioned in the issue. You were almost there!

The code did not work because the limit you set for the PCR on pcr_product_BAC was too high. You used 69, which could never work, since the primers are only 68 nt long.

In any case, it would not have worked with 68 either, because these primers do not anneal along their whole length: they have overhangs. For instance, the primer AACGATCCTGGTACACCTTGTTTGCAGGACTTGAAGCTGCgcggccgcgatcctctagagtcgacctg has three "parts":

AACGATCCTGGTACACCTTGTTTGCAGGACTTGAAGCTGC: homology arm for the assembly
gcggccgc: NotI cut site
gatcctctagagtcgacctg: the part that actually binds to the sequence for amplification.
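
To make the three parts easy to check, here is a small sketch in plain Python (no pydna needed) that rebuilds the primer from them and prints the length of each part. The variable names are only illustrative; the sequences are the ones listed above.

```python
# Rebuild the primer from its three parts and report their lengths.
homology_arm = "AACGATCCTGGTACACCTTGTTTGCAGGACTTGAAGCTGC"  # homology arm for the assembly
noti_site = "gcggccgc"                                     # NotI cut site
annealing_part = "gatcctctagagtcgacctg"                    # binds the template in the PCR

primer = homology_arm + noti_site + annealing_part

parts = {
    "homology arm": homology_arm,
    "NotI site": noti_site,
    "annealing part": annealing_part,
}
for name, seq in parts.items():
    print(f"{name:>14}: {len(seq):2d} nt")
print(f"{'full primer':>14}: {len(primer):2d} nt")
```

Only the last 20 nt of the 68 nt primer are expected to bind the template; the other 48 nt are overhang.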

With limit=20, it should work. In general, it's not a good idea to use a limit over 20, since you might miss unexpected products that would be generated in the actual experiment. In your previous example I told you to use a higher limit because otherwise it would not run, but the reason was that the primers were poorly chosen and would generate multiple PCR products.
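
As a rough illustration of why 20 works and 68 or 69 do not, here is a toy model in plain Python. It is not pydna's implementation: it assumes that limit means "the minimum number of bases at the 3' end of the primer that must match the template", and it only looks at the top strand. The template is made up for the example, and `three_prime_footprint` and `anneals` are hypothetical helper names.

```python
# Toy model of a minimum-annealing-length check (NOT pydna's actual code).
primer = "AACGATCCTGGTACACCTTGTTTGCAGGACTTGAAGCTGCgcggccgcgatcctctagagtcgacctg"

# Made-up template that contains only the 20 nt annealing part of the primer.
toy_template = "ATATATATAT" + "GATCCTCTAGAGTCGACCTG" + "TTGACCATGA"


def three_prime_footprint(primer: str, template: str) -> int:
    """Length of the longest 3' suffix of `primer` found in `template`."""
    primer, template = primer.upper(), template.upper()
    footprint = 0
    for n in range(1, len(primer) + 1):
        if primer[-n:] in template:
            footprint = n
        else:
            break  # a longer suffix cannot match if a shorter one does not
    return footprint


def anneals(primer: str, template: str, limit: int) -> bool:
    """True if at least `limit` bases at the 3' end match the template."""
    return three_prime_footprint(primer, template) >= limit


for limit in (20, 68, 69):
    print(limit, anneals(primer, toy_template, limit))
```

In this model the primer passes with limit=20 and fails with 68 or 69, because only its last 20 nt match the template.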

Let me know if you manage to get it to work like that! Some extra improvements that you can make:

Use an annotated sequence for the plasmid: You can visit the addgene record of this plasmid and download its sequence with annotations as a gb file: https://www.addgene.org/62862/ Go to "View all sequences" and then select "GenBank".
Use an annotated genome: This can be a bit trickier to find, but I would do it like this:
- Go to the NCBI page: https://www.ncbi.nlm.nih.gov/
- In all databases, select Assembly and search for the organism
- You will end up here: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=1521
- Select the reference assembly (in this case there is only one, but other organisms have multiple).
- You should be in the assembly page: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000022065.1/
- Click on "Download" and select:
  - "All"
  - "Sequence and annotation"
- You will get a zip file. Inside the folder data you will find two folders, GCA_000022065.1 and GCF_000022065.1. They are the same genome, but with annotations from GenBank (GCA) and RefSeq (GCF). You can use either one. In principle they should be the same, but I have encountered cases where they are different.
- phew, that was a long one! 😅
Since the meaning of the limit parameter might be ambiguous for newcomers, it would be good to explain its behaviour in the documentation that you wrote.
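
Going back to the genome download steps: once the zip file is extracted, a small helper like the one below can list what each accession folder contains, so you can pick the GCA or GCF copy. This is only a sketch. `list_annotation_sets` is a hypothetical name, and the demo builds a throwaway directory with a placeholder file name (`genomic.gbff`), so check the actual file names in your own download.

```python
import tempfile
from pathlib import Path


def list_annotation_sets(data_dir):
    """Map each GCA_*/GCF_* folder in `data_dir` to the names of its files."""
    sets = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if folder.is_dir() and folder.name.startswith(("GCA_", "GCF_")):
            sets[folder.name] = sorted(p.name for p in folder.iterdir() if p.is_file())
    return sets


# Demo on a temporary directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    for accession in ("GCA_000022065.1", "GCF_000022065.1"):
        folder = Path(tmp) / "data" / accession
        folder.mkdir(parents=True)
        (folder / "genomic.gbff").write_text("")  # placeholder file name
    demo = list_annotation_sets(Path(tmp) / "data")

print(demo)
```

With a real download, pass the path of the extracted data folder instead of the temporary one.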

Again, good work going through the paper's documentation, and I hope this increases your awareness of how much tools like pydna and SYC are needed. Even in methods papers about cloning with hundreds of citations, it can be hard to find your way!

JeffXiePL commented 2 months ago

Again, thank you so much for the detailed explanation! I also think it's important to explain the 'limit' parameter, since it confused me a lot. I'll update the notebook during the Hackathon!

manulera commented 1 month ago

Hi @JeffXiePL I have fixed the notebook in one of the last commits: https://github.com/BjornFJohansson/pydna/blob/3a96929d2572d57ce10e0ec8ed59d2eacce358dc/docs/notebooks/Example_Gibson.ipynb

You can see that in addition, I am getting the genome directly from genbank instead of having to include the fasta. That's quite handy!

Pydna documentation Gibson Assembly Literature Example #255