Which assembly products are returned depends on the input order

manulera commented 9 months ago

Hi @BjornFJohansson, this is what we discussed the other day and that I could not explain clearly. When calling Assembly.assemble_linear, the only assemblies returned are those that start with the first fragment and finish with the last fragment, each in either orientation. See the minimal example below, where the same inputs are provided but their order is changed:

# Current pydna assembly implementation
from pydna import assembly
from pydna.dseqrecord import Dseqrecord

fragments = [
    Dseqrecord('aaaTCGATGGGaaa', id='f_1'),
    Dseqrecord('ccTCGATGGGcccCTCTCATAcc', id='f_2'),
    Dseqrecord('ggCTCTCATAggg', id='f_3'),
]

print('Old implementation, original order')
asm = assembly.Assembly(fragments, limit=8)
for output in asm.assemble_linear():
    print(output.seq)
print()

print('Old implementation, change order')
# Change the order, now fragment f_1 is last
asm = assembly.Assembly(fragments[1:] + fragments[:1], limit=8)
for output in asm.assemble_linear():
    print(output.seq)
print()

This prints

Old implementation, original order
aaaTCGATGGGcccCTCTCATAggg

Old implementation, change order
ggTATGAGAGgggCCCATCGAttt < This is (f_2 inverted + f_1 inverted)
ccTCGATGGGaaa < This is f_2 + f_1

As you can see, it only returns assemblies that start from the first fragment and finish with the last fragment, each in either orientation. This holds even when a result, such as the first one after reordering, is just a subassembly of f_1 + f_2 + f_3.
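
To make the annotations on the reordered output easier to check, here is a minimal sketch in plain Python. It uses only string manipulation, not pydna's actual assembly algorithm, and `reverse_complement` and `join_on_overlap` are hypothetical helpers introduced for illustration. It rebuilds both products from f_1 and f_2 alone, which shows that f_3 plays no part in either of them.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


def join_on_overlap(left: str, right: str, overlap: str) -> str:
    """Join two sequences on a shared overlap (hypothetical helper, not pydna API)."""
    # Keep `left` up to the end of the overlap, then append whatever
    # follows the overlap in `right`.
    left_end = left.index(overlap) + len(overlap)
    right_start = right.index(overlap) + len(overlap)
    return left[:left_end] + right[right_start:]


f_1 = "aaaTCGATGGGaaa"
f_2 = "ccTCGATGGGcccCTCTCATAcc"

# Both fragments share the 8 bp overlap TCGATGGG.
f1_f2 = join_on_overlap(f_1, f_2, "TCGATGGG")
f2_f1 = join_on_overlap(f_2, f_1, "TCGATGGG")

print(f2_f1)                      # ccTCGATGGGaaa
print(reverse_complement(f1_f2))  # ggTATGAGAGgggCCCATCGAttt
```

The second line is the reverse complement of f_1 + f_2, which is the same molecule as f_2 inverted + f_1 inverted.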

Instead, the new implementation ignores the order of inputs for linear assemblies and always returns the same output. See how all possibilities are returned.

To reproduce the old behaviour and pass most old tests, I introduced the parameter use_fragment_order. If you agree, I think this can be removed after the merge (I will fix the tests).

import assembly2

print('New implementation, original order')
# New implementation
asm = assembly2.Assembly(fragments, limit=8, use_fragment_order=False)
for output in asm.assemble_linear():
    print(output.seq)

print()
print('New implementation, change order')
asm = assembly2.Assembly(fragments[1:] + fragments[:1], limit=8, use_fragment_order=False)
for output in asm.assemble_linear():
    print(output.seq)

print()

print('New implementation, original order, start from first')
# To reproduce the old behavior, just set use_fragment_order=True
asm = assembly2.Assembly(fragments, limit=8, use_fragment_order=True)
for output in asm.assemble_linear():
    print(output.seq)
print()

print('New implementation, change order, start from first')
asm = assembly2.Assembly(fragments[1:] + fragments[:1], limit=8, use_fragment_order=True)
for output in asm.assemble_linear():
    print(output.seq)

This prints

New implementation, original order
aaaTCGATGGGcccCTCTCATAggg < f_1 + f_2 + f_3
ccTCGATGGGaaa < f_2 + f_1
ggCTCTCATAcc < f_3 + f_2

New implementation, change order
aaaTCGATGGGcccCTCTCATAggg
ccTCGATGGGaaa
ggCTCTCATAcc

New implementation, original order, start from first
aaaTCGATGGGcccCTCTCATAggg

New implementation, change order, start from first
ccTCGATGGGaaa
ggTATGAGAGgggCCCATCGAttt
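
The printed results can also be compared programmatically. The sketch below is only an illustration: it hard-codes the sequences printed in this issue rather than calling pydna or assembly2, and `canonical` and `product_set` are hypothetical helpers. It treats a sequence and its reverse complement as the same product, so that the order and orientation of the results do not matter.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


def canonical(seq: str) -> str:
    """Pick one representative for a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


def product_set(seqs):
    """Order- and orientation-independent view of a list of products."""
    return {canonical(s) for s in seqs}


# Sequences copied from the outputs printed in this issue.
old_original = ["aaaTCGATGGGcccCTCTCATAggg"]
old_changed = ["ggTATGAGAGgggCCCATCGAttt", "ccTCGATGGGaaa"]
new_original = ["aaaTCGATGGGcccCTCTCATAggg", "ccTCGATGGGaaa", "ggCTCTCATAcc"]
new_changed = ["aaaTCGATGGGcccCTCTCATAggg", "ccTCGATGGGaaa", "ggCTCTCATAcc"]
new_changed_ordered = ["ccTCGATGGGaaa", "ggTATGAGAGgggCCCATCGAttt"]

# Old implementation: the products depend on the input order.
print(product_set(old_original) == product_set(old_changed))         # False
# New implementation: the products are the same for both input orders.
print(product_set(new_original) == product_set(new_changed))         # True
# use_fragment_order=True gives the same products as the old implementation.
print(product_set(new_changed_ordered) == product_set(old_changed))  # True
```

A comparison along these lines could be one way to adapt the old tests, since it does not depend on the order in which products are returned.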

cc @hiyama341 @JamesBagley since they might be interested

BjornFJohansson commented 8 months ago

Yes! I see this was done!

manulera commented 8 months ago

To reproduce the old behaviour and pass most old tests, I introduced the parameter use_fragment_order. If you agree, I think this can be removed after the merge (I will fix the tests).

Should I then remove this behaviour after the merge?

BjornFJohansson commented 8 months ago

Yes, I think that would be better.

manulera commented 8 months ago

OK, then I will re-open the issue and close it once I have implemented this.

BjornFJohansson / pydna

Which assembly products are returned depends on the input order #192