fulcrumgenomics / prymer

Python Primer Design Library
https://prymer.readthedocs.io/en/latest/
MIT License

`OffTargetDetector` should relax the constraint that each primer pair have at least one hit in the reference #43

Closed msto closed 2 weeks ago

msto commented 2 weeks ago

Adapted from our Slack discussion:

Context

I am running into an issue with `OffTargetDetector`, and I am not certain whether it is a bug or an oddity of my use case.

As I've mentioned, I am designing primer pairs to amplify the junctions of a donor integration into a host genome.

Upstream of the primer design, I have assembled the expected integration sequence, consisting of the linearized donor flanked by 500 bp of host genome on either side of the expected insertion site.

           [========= 500 bp host sequence =======][~~~~~~~~~~ linearized donor ~~~~~~~~~~][========= 500 bp host sequence =======]
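For readers following along, the layout above can be sketched in a few lines of Python. This is only an illustration of the assembly step, not code from prymer or from my pipeline: the function name, its parameters, and the 0-based insertion-site convention are all assumptions made for the example.

```python
def assemble_integration(host: str, donor: str, site: int, flank: int = 500) -> str:
    """Hypothetical helper: build flank + donor + flank around an insertion site.

    `site` is assumed to be a 0-based position in `host`; the donor is placed
    between host[site - 1] and host[site].
    """
    if not 0 <= site <= len(host):
        raise ValueError(f"insertion site {site} is outside the host sequence")
    # Clamp the upstream flank at the start of the host sequence.
    upstream = host[max(0, site - flank):site]
    # Slicing past the end of a string is safe, so no clamp is needed here.
    downstream = host[site:site + flank]
    return upstream + donor + downstream


# Toy example: 600 bp of "A", insertion site, then 600 bp of "C".
toy_host = "A" * 600 + "C" * 600
assembled = assemble_integration(toy_host, donor="GGGG", site=600)
```

The assembled string is what would be written to the FASTA used for design; the reference genome searched for off-target hits never contains the donor portion.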

primer3 uses a FASTA containing this assembled sequence to design primers, and `build_primer_pairs()` uses it to construct the amplicon sequence.

However, when searching for off-target hits, I am searching the complete reference genome, which does not include the donor. This is fine when searching for off-target hits of individual primers, but I am encountering an issue when evaluating primer pairs.

Specifically, `OffTargetDetector` requires that there be at least one amplicon among the pair's hits, i.e. that each primer have at least one hit in the reference. This (ideally) does not happen when one of the primers is designed against a sequence that is not present in the reference.

https://github.com/fulcrumgenomics/prymer/blob/70f684086feb5f204e1f8fd659e7b7443cd4e845/prymer/offtarget/offtarget_detector.py#L273

I think it would be appropriate to relax the constraint to `0 <= len(amplicons)`, i.e. to allow a primer pair to pass with zero amplicons in the reference. It makes sense to me that the off-target detection could be done against an arbitrary reference.
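To make the proposal concrete, here is a minimal, self-contained sketch of the relaxed check. It is not prymer's implementation: the `Hit` class, `find_amplicons`, `pair_passes`, and the `max_amplicons` and `require_on_target` parameters are all hypothetical names invented for this example, and the amplicon logic is deliberately simplified.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment hit (not prymer's own hit class)."""
    refname: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    negative: bool  # True if the hit is on the reverse strand


def find_amplicons(left: List[Hit], right: List[Hit], max_size: int) -> List[Tuple[str, int, int]]:
    """Pair up hits on the same reference and opposite strands that face each other."""
    spans = []
    for a in left:
        for b in right:
            if a.refname != b.refname or a.negative == b.negative:
                continue
            fwd, rev = (b, a) if a.negative else (a, b)
            size = rev.end - fwd.start + 1
            # The forward-strand hit must sit upstream of the reverse-strand hit.
            if fwd.start <= rev.start and 0 < size <= max_size:
                spans.append((fwd.refname, fwd.start, rev.end))
    return spans


def pair_passes(left: List[Hit], right: List[Hit], max_size: int,
                max_amplicons: int = 1, require_on_target: bool = False) -> bool:
    """Relaxed check: zero amplicons is acceptable unless require_on_target is set."""
    n = len(find_amplicons(left, right, max_size))
    min_amplicons = 1 if require_on_target else 0
    return min_amplicons <= n <= max_amplicons


# One primer hits the host genome; the other targets the donor and has no hits.
host_primer_hits = [Hit("chr1", 100, 120, negative=False)]
donor_primer_hits: List[Hit] = []
relaxed = pair_passes(host_primer_hits, donor_primer_hits, max_size=1000)
strict = pair_passes(host_primer_hits, donor_primer_hits, max_size=1000, require_on_target=True)
```

Under the strict setting the junction-spanning pair is rejected because no amplicon exists in the reference; under the relaxed setting it passes, while pairs with too many amplicons are still rejected.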