Unwanted case-sensitivity when indexing fragment against source record

bwvogler commented 1 year ago

In StickyEndFragment.list_from_record_digestion (line 142), add an upper() conversion, changing the existing index = record.seq.find(fragment) to the fixed index = record.seq.upper().find(fragment).
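A minimal sketch of the current behaviour and the proposed one-line change. It uses plain Python str as a stand-in for Biopython's Seq, which also provides .find() and .upper(). The helper names and sequences are hypothetical, and the fragment is assumed to be uppercase, as the proposed fix implies. Note that find() returns -1 rather than raising, so a case mismatch fails silently.

```python
def find_fragment_original(record_seq, fragment_seq):
    # Existing behaviour (hypothetical helper): case-sensitive search,
    # returns -1 when the fragment is not found
    return record_seq.find(fragment_seq)


def find_fragment_fixed(record_seq, fragment_seq):
    # Proposed fix: uppercase the source record before searching,
    # so a lowercase record still matches an uppercase fragment
    return record_seq.upper().find(fragment_seq)


# Made-up example: lowercase source record, uppercase fragment
record_seq = "ttgacgaattcgatcg"
fragment_seq = "GAATTC"

print(find_fragment_original(record_seq, fragment_seq))  # -1
print(find_fragment_fixed(record_seq, fragment_seq))     # 5
```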

The current implementation breaks for source records with a lowercase Seq. If there is a requirement somewhere that these be uppercase, then a useful error message should be raised here instead.
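If an uppercase-only convention were enforced instead, the error report could look roughly like the sketch below. The function name, the record_id parameter and the messages are hypothetical, and plain str again stands in for Seq.

```python
def find_fragment_strict(record_seq, fragment_seq, record_id="record"):
    # Hypothetical alternative to upper(): enforce an uppercase-only
    # convention and fail loudly instead of silently returning -1
    if record_seq != record_seq.upper():
        raise ValueError(
            f"Source record {record_id!r} contains lowercase bases; "
            "sequences are expected to be uppercase."
        )
    index = record_seq.find(fragment_seq)
    if index == -1:
        raise ValueError(f"Fragment not found in source record {record_id!r}.")
    return index


# Usage: a lowercase record now produces a readable error
try:
    find_fragment_strict("ttgacgaattcgatcg", "GAATTC", record_id="pA")
except ValueError as err:
    print(err)
```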

veghp commented 1 year ago

Thank you. Yes, I recently started to suspect that there may be an issue when upper- and lowercase are mixed in the sequence.

I'll have a look.

veghp commented 1 year ago

I had a look and found that, in certain cases where the sequence contains both upper- and lowercase letters, we get an error, and that the solution above fixes it.

For the record, the problem is as follows: when the part is all uppercase or all lowercase, it works fine. If the part is mixed-case and the lowercase is outside the insert region, it also works fine (except for the receptor). In all other mixed cases (whether the uppercase is in the overhang or inside the fragment), we get an error.
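All of these failures come down to comparing bases whose case differs between the record and the fragment. A more defensive variant (our own suggestion, not the fix proposed above) uppercases both sides, so the lookup no longer depends on where upper- or lowercase appears. The scenarios below are made-up sequences that only loosely mirror the cases described; they do not reproduce DnaCauldron's actual digestion.

```python
def find_fragment_case_insensitive(record_seq, fragment_seq):
    # Hypothetical helper: normalise both sequences so that mixed-case
    # inputs are compared base by base, ignoring case
    return record_seq.upper().find(fragment_seq.upper())


# Made-up scenarios: (source record, fragment)
cases = {
    "all uppercase": ("TTGACGAATTCGATCG", "GAATTC"),
    "all lowercase": ("ttgacgaattcgatcg", "gaattc"),
    "mixed: lowercase site": ("TTGACgaattcGATCG", "GAATTC"),
    "mixed: partly uppercase": ("ttgacGAattcgatcg", "gaattc"),
}

for name, (record_seq, fragment_seq) in cases.items():
    naive = record_seq.find(fragment_seq)  # case-sensitive, as before
    normalised = find_fragment_case_insensitive(record_seq, fragment_seq)
    print(f"{name:25} naive={naive:3} normalised={normalised}")
```

With case-sensitive search, only the two uniform cases are found; the normalised search finds the fragment at the same index in all four.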

(Note: files saved in SnapGene as 'GenBank standard' are automatically converted to lowercase. Biopython and DNA Cauldron import GenBank sequences as uppercase, and DNA Cauldron exports simulated constructs in lowercase.)

Edinburgh-Genome-Foundry / DnaCauldron, issue #20