BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Pydna documentation #244

Closed manulera closed 1 month ago

manulera commented 3 months ago

cc @BjornFJohansson @hiyama341 @dgruano.

@JeffXiePL is going to work on the pydna documentation over the next few weeks, and I made a list of what I think should be covered. The idea is to write it in the style of a cookbook (how to achieve a task) rather than library documentation (what every class method does, etc.). I know there is a bit of that in the cookbook folder, but we would like to cover a bit more.

Below is the link to the guidelines for the documentation; feel free to edit or add things, within reason, for @JeffXiePL to cover.

https://docs.google.com/document/d/19sRRAMIHqn0rg-oHSdqIR6DxTIHYo2uj15nRdjq8D5Q/edit?usp=drive_link

JeffXiePL commented 2 months ago

Hi all,

I have a quick question on the pydna Dseqrecord page: is there no built-in method to remove a feature from, say, a .gb file? Is the best way of going about it to use a list comprehension?

Thanks!

manulera commented 2 months ago

Hi there @JeffXiePL, a list comprehension with an if condition is probably the best way to filter a list. There are similar ways, but they are not better.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, SimpleLocation

# We create a SeqRecord with two features
# (the sequence is wrapped in Seq, which is what SeqRecord expects)
f1 = SeqFeature(SimpleLocation(1, 5), type='CDS', id='f1')
f2 = SeqFeature(SimpleLocation(8, 15), type='misc_feature', id='f2')
seqr = SeqRecord(Seq('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'), features=[f1, f2])

# We filter out a feature by keeping everything that is not a misc_feature
seqr.features = [f for f in seqr.features if f.type != 'misc_feature']

print(seqr.features)
```
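
For comparison, here is a minimal sketch of the "similar ways" next to the list comprehension. To keep it runnable without any dependencies it uses a stand-in `Feature` dataclass, which is hypothetical and only mimics the `type` and `id` attributes of Biopython's `SeqFeature`; the variable names are also just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical stand-in for Bio.SeqFeature.SeqFeature (assumption:
    # only the `type` and `id` attributes matter for filtering).
    type: str
    id: str


features = [
    Feature("CDS", "f1"),
    Feature("misc_feature", "f2"),
    Feature("misc_feature", "f3"),
]

# 1. List comprehension: builds a new list with the features we keep.
kept_comprehension = [f for f in features if f.type != "misc_feature"]

# 2. filter() with a lambda: same result, arguably harder to read.
kept_filter = list(filter(lambda f: f.type != "misc_feature", features))

# 3. Removing in place: we must loop over a copy, because removing items
#    from a list while iterating over that same list skips elements.
kept_in_place = list(features)
for f in list(kept_in_place):
    if f.type == "misc_feature":
        kept_in_place.remove(f)

print([f.id for f in kept_comprehension])
```

All three approaches leave only `f1`; the comprehension is the shortest and does not have the iterate-while-removing pitfall of the third one.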
JeffXiePL commented 2 months ago

Hello all,

I wanted to ask why there are two Contig objects after using assembly_circ on an Assembly object. I couldn't find much detail in the documentation.

Peilun Xie

manulera commented 2 months ago

Hi @JeffXiePL, I think you probably mean assemble_circular.

A Contig is a subclass of Dseqrecord with some extra methods that allow you to see how it was assembled. When you call assemble_circular, you will in principle get, as Contigs, all the possible circular assemblies that can be produced given the algorithm that you passed. A set of fragments may be assembled in different ways. If you share an example you don't understand, I can explain a bit better.

Note however that the current implementation sometimes gives unexpected results, given how the possible assemblies are computed. This will be fixed once I merge the new implementation.

In the example below, the homology region of a Gibson assembly, ACGTAATG, appears in several fragments (here at both ends of each one), and assemble_circular returns 4 contigs, each representing a single fragment circularised, in forward and reverse orientation. All this to say that if you are getting results that you think don't make sense, it may be because of that. In any case, feel free to share an example.

```python
from pydna.assembly import Assembly
from pydna.dseqrecord import Dseqrecord

a = Dseqrecord("ACGTAATGaccACGTAATG")
b = Dseqrecord("ACGTAATGcgcACGTAATG")

assembly = Assembly((a, b), limit=8)

for out in assembly.assemble_circular():
    print(out.seq)
```

More info on what causes this behaviour (no need to go into it, but I am putting it here for documentation purposes):

https://github.com/BjornFJohansson/pydna/issues/166
https://github.com/BjornFJohansson/pydna/issues/200
https://github.com/BjornFJohansson/pydna/issues/192