Edinburgh-Genome-Foundry / DnaWeaver

A route planner for DNA assembly
https://dnaweaver.genomefoundry.org
MIT License

Q: Any specific reason why PcrExtractionStation is not using the PartsLibrary? #7

Open andrewshvv opened 3 years ago

andrewshvv commented 3 years ago

Hey, I am using the PcrExtractionStation at the moment and have noticed that it uses a BLAST database. I wonder: are there any architectural reasons why it is not using the PartsLibrary instead?

As a consequence of this design, the PCR matrix (the template sequence) does not appear in the assembly_parts field of the report.

This is just feedback and maybe someone will disagree, but from my POV the ideal behavior would be the following:

Basically, the PartsLibrary would internally create a BLAST database and use it for the search; if a record is found, that SeqRecord is used as the output instead. In this case, the annotation will be preserved along the way. The cost function would also be split: PCR extraction cost will include the cost of actual extraction, and PartsLibrary might include the cost of delivery if some external provider is used.
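To make the proposal concrete, here is a minimal sketch of the suggested behavior, not DnaWeaver code. PartRecord, SketchPartsLibrary and pcr_quote are hypothetical names, a plain substring search stands in for the BLAST search, and the returned quote keeps the whole annotated record while splitting the price into an extraction part and a delivery part.

```python
from dataclasses import dataclass, field


@dataclass
class PartRecord:
    """Hypothetical stand-in for an annotated SeqRecord."""
    name: str
    sequence: str
    features: list = field(default_factory=list)  # e.g. [("gfp", 4, 16)]


class SketchPartsLibrary:
    """Hypothetical parts library that searches its own records.

    Exact substring search stands in for the BLAST search described
    in the proposal; a real version would build a BLAST database.
    """

    def __init__(self, records, delivery_cost=0.0):
        self.records = records
        self.delivery_cost = delivery_cost  # e.g. shipping from an external provider

    def find(self, fragment):
        """Return the first record containing `fragment`, or None."""
        for record in self.records:
            if fragment.upper() in record.sequence.upper():
                return record
        return None


def pcr_quote(fragment, library, extraction_cost):
    """Quote a PCR extraction whose template comes from `library`.

    The quote keeps the full record (so its annotations survive) and
    splits the price into extraction and delivery components.
    """
    template = library.find(fragment)
    if template is None:
        return None
    return {
        "template": template,
        "extraction_cost": extraction_cost,
        "delivery_cost": library.delivery_cost,
        "total_cost": extraction_cost + library.delivery_cost,
    }


library = SketchPartsLibrary(
    [PartRecord("part_A", "ATGCGTACCGGTTAGCATGC", [("gfp", 4, 16)])],
    delivery_cost=12.0,
)
quote = pcr_quote("GTACCGGTTAG", library, extraction_cost=5.0)
print(quote["template"].name, quote["total_cost"])  # part_A 17.0
```

Keeping the record in the quote is what preserves the annotations; the trade-off is that every quote now carries a full record instead of just a sequence.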


On a side note: I am forced to use the BLAST database because, when sequences are provided directly to the PcrExtractionStation, it produces invalid PCR extraction operations. I might upload such a case soon.

Zulko commented 3 years ago

I just noticed that the docstring for this class isn't complete, sorry about that. I'll try to complete it this week.

it uses a BLAST database. I wonder: are there any architectural reasons why it is not using the PartsLibrary instead?

It uses BLAST so that it can suggest PCR extractions from whole genomes (typically, extracting an element from E. coli, yeast, etc.) and from sub-regions of a part (not just the whole part) in a collection of sequences.
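To illustrate why a sequence search is needed rather than a lookup of whole parts, here is a simplified sketch that finds every place a target fragment occurs inside a collection of larger sequences (chromosomes or parts), on both strands. find_templates is a hypothetical helper and exact matching is only a stand-in for BLAST, which also tolerates mismatches and scales to whole genomes.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_templates(target, sequences):
    """List every (name, start, end, strand) where `target` occurs.

    `sequences` maps names (parts, chromosomes...) to sequences. The
    fragment may sit anywhere inside a longer sequence, which is the
    case a whole-part lookup cannot handle. A palindromic target is
    reported once per strand.
    """
    target = target.upper()
    hits = []
    for name, seq in sequences.items():
        seq = seq.upper()
        for strand, query in ((1, target), (-1, reverse_complement(target))):
            start = seq.find(query)
            while start != -1:
                hits.append((name, start, start + len(query), strand))
                start = seq.find(query, start + 1)
    return hits


hits = find_templates(
    "ATGAAACCC",
    {"genome_chr1": "TTATGAAACCCTTGGGTTTCATAA", "part_B": "CCATGAAACCCGG"},
)
print(hits)
```

Here the fragment is found twice in the toy chromosome (once per strand) and once as a sub-region of part_B.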

Specify the same parts library for PCRExtractionStation as well as GoldenGate DNAAssemblyStation

This should be possible by providing the part sequences via the sequences parameter (which can be used instead of blast_database). If this didn't work for you, or if you are suggesting a more convenient way to reuse a PartsLibrary as the source of sequences, that makes sense to me; don't hesitate to suggest an MR.
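A minimal sketch of keeping one collection as the single source of part sequences. parts_to_sequences is a hypothetical helper, and the assumption that both the PCR station's sequences parameter and the assembly station's parts library accept a name-to-sequence mapping is not verified here; check the DnaWeaver docstrings before relying on it.

```python
def parts_to_sequences(parts):
    """Build one {name: sequence} mapping from (name, sequence) pairs.

    Hypothetical helper: the same mapping could then be handed to both
    the PCR extraction station and the assembly station, ASSUMING both
    accept name-to-sequence mappings. Duplicate names are rejected so
    the two stations cannot silently disagree about a part.
    """
    sequences = {}
    for name, sequence in parts:
        if name in sequences:
            raise ValueError(f"duplicate part name: {name}")
        sequences[name] = sequence.upper()
    return sequences


shared_parts = parts_to_sequences([("p1", "atgc"), ("p2", "GGCC")])
print(shared_parts)  # {'p1': 'ATGC', 'p2': 'GGCC'}
```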

the PCR matrix does not appear in the assembly_parts field of the report.

That sounds like a good idea to me. DnaWeaver is aware of the PCR matrix used (the part name, or the chromosome name for extraction from a genome), so it should be possible to make that information available in the report. I can't work on it right now, but an MR would make sense 👍
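One possible shape for that change, as a sketch only: each row of the report's parts table gains a column naming the PCR template it came from. The row format (dicts with an "id" key) and the add_templates_to_parts helper are hypothetical; DnaWeaver's actual report structures may differ.

```python
def add_templates_to_parts(assembly_parts, pcr_templates):
    """Return copies of report rows with a "template" column filled in.

    `assembly_parts` is a list of dicts with at least an "id" key, and
    `pcr_templates` maps a PCR product id to the template it was
    extracted from (a part name or a chromosome name). Both structures
    are hypothetical.
    """
    rows = []
    for part in assembly_parts:
        row = dict(part)  # copy so the original report rows are untouched
        row["template"] = pcr_templates.get(part["id"])  # None if not from PCR
        rows.append(row)
    return rows


rows = add_templates_to_parts(
    [{"id": "frag_1"}, {"id": "frag_2"}],
    {"frag_1": "E. coli chromosome"},
)
print(rows)
```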

the assembly plan contains a matrix seq record. In this case, the annotation will be preserved along the way.

Where would the annotation appear in the end? I think it's important to remember the part that the PCR is using, but including a SeqRecord in the PCR station's quote could cost more RAM and more CPU. The GenBank annotations could instead be added a posteriori, for instance using Geneblocks.
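Adding annotations a posteriori only requires the template's features and the product's coordinates on that template. The sketch below is a plain-Python illustration of that coordinate mapping, not Geneblocks' API; transfer_features is a hypothetical helper, and it handles forward-strand features only.

```python
def transfer_features(template_features, product_start, product_end):
    """Map features from a PCR template onto the PCR product.

    Features are (label, start, end) tuples in template coordinates.
    Only features lying entirely within [product_start, product_end)
    are kept, shifted so that the product starts at position 0.
    Forward strand only; partial overlaps are dropped.
    """
    return [
        (label, start - product_start, end - product_start)
        for label, start, end in template_features
        if product_start <= start and end <= product_end
    ]


template_features = [("promoter", 10, 50), ("cds", 60, 300), ("terminator", 290, 340)]
print(transfer_features(template_features, 5, 310))
```

Because this runs once on the final plan, the quotes computed during route planning can stay lightweight, which is the RAM/CPU concern raised above.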

PCR extraction cost will include the cost of actual extraction, and PartsLibrary might include the cost of delivery if some external provider is used.

There might be ways to model that using two PCR extraction stations: one using available parts, and one using ordered parts (which would have a higher fixed cost). The two extractors would then be compared via a source comparator, so that together they act as a single PCR station.
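The comparator idea can be sketched in a few lines: two stations that differ only in which templates they can use and in their fixed cost, and a comparator that asks both and keeps the cheapest offer. Quote, make_pcr_station and cheapest_quote are hypothetical names for illustration, not DnaWeaver classes, and each quote is reduced to a fixed price.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    source: str
    price: float


def make_pcr_station(name, templates, fixed_cost):
    """Hypothetical PCR station that can only quote fragments found in `templates`."""
    def get_quote(fragment):
        if any(fragment in template for template in templates):
            return Quote(name, fixed_cost)
        return None  # no usable template, so no offer
    return get_quote


def cheapest_quote(stations, fragment):
    """Source-comparator sketch: ask every station, keep the cheapest offer."""
    quotes = [q for q in (station(fragment) for station in stations) if q is not None]
    return min(quotes, key=lambda q: q.price, default=None)


in_house = make_pcr_station("PCR (available parts)", ["ATGCCGTA"], fixed_cost=5.0)
ordered = make_pcr_station("PCR (ordered parts)", ["ATGCCGTA", "TTGACCAA"], fixed_cost=25.0)

print(cheapest_quote([in_house, ordered], "GCCG"))  # in-house template wins
print(cheapest_quote([in_house, ordered], "GACC"))  # only the ordered part has it
```

The higher fixed cost of the ordered-parts station means it is only chosen when no in-house template contains the fragment.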

On a side note: I am forced to use the BLAST database because, when sequences are provided directly to the PcrExtractionStation, it produces invalid PCR extraction operations. I might upload such a case soon.

Sorry about that, and yes, please provide some examples!