bthuronyi / CloneCoordinate

CloneCoordinate issue tracking

In Sequencing, Merge "constructs with same marker" into "Interpretation guidelines" #142

Closed: bthuronyi closed this 3 months ago

bthuronyi commented 3 months ago

Existing columns in Sequencing show which construct(s) have the same resistance as whatever the assembly was plated on, and might therefore be a source of template contamination. Rather than showing this in a separate column, it should be merged with the "interpretation guidelines" formula so that the guidelines give explicit advice relating to them.

The "Alignment classification" field already narrows down when template contamination is likely -- the choice "Read partially, but not completely, matches map (where read quality is good)" is expected for many template contamination cases, since the template at least contains the primer binding site -- so the interpretation guidelines can use this info. If the read completely matches the desired construct map, then we don't have any reason to think template contamination occurred.
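The trigger condition described above can be sketched in Python. This is only a model of the rule and not the sheet's formula. The names `PARTIAL_MATCH` and `suspect_template_contamination` are hypothetical, and `same_marker` is assumed to be the list of constructs that share the plated resistance.

```python
# Hypothetical names: PARTIAL_MATCH and suspect_template_contamination are
# illustrative only; the sheet itself expresses this as an and(...) condition.
PARTIAL_MATCH = (
    "Read partially, but not completely, matches map (where read quality is good)"
)


def suspect_template_contamination(classification: str, same_marker: list[str]) -> bool:
    """Return True when template contamination is worth checking for.

    Both conditions must hold: the read only partially matches the desired
    construct map, and at least one other construct confers the same
    resistance as the plate (so it could have been carried over as template).
    """
    # A read that completely matches the desired map gives no reason to
    # suspect template contamination, so only the partial-match choice counts.
    return classification == PARTIAL_MATCH and len(same_marker) > 0


print(suspect_template_contamination(PARTIAL_MATCH, ["pBWT020"]))  # True
print(suspect_template_contamination(PARTIAL_MATCH, []))  # False
```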

An example of an ideal output might be: "pBWT020 [template with same marker; or list multiple options] also confers Km [antibiotic plated on] resistance, and the read alignment doesn't completely match pBWT050 [desired construct]. Start by aligning the read to the pBWT020 sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. If there's no exact match to pBWT020, this may be a misassembly.... [room to add more guidance later to cover other cases]"

Related to #106 and #83

evelynqi commented 3 months ago

https://docs.google.com/spreadsheets/d/1bzumeOTEN57T3ak3pbLISh5hERLXs2nSg6Wq2Fliesw/edit?usp=sharing

Alignment interpretation guidelines (AZ)

Added template contamination to the code. It runs upstream of the other code, so the other messages are not shown when the alignment classification is "Read partially, but not completely, matches map (where read quality is good)" and there are templates that confer the same antibiotic resistance.

bthuronyi commented 3 months ago

Nice! Let's have the part that says "Check all the templates which confer the same antibiotic resistance." and the "these constructs" / "this construct" be contingent on the count of constructs that have the same resistance.

It's appropriate to have this override the existing messages because those only really apply when there's an exact match of the read -- any kind of problem means we're not going to get full coverage of the needed parts, and we're just trying to figure out what kind of failure we have.
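Here is a rough Python sketch of making the closing wording depend on the count. The function name `closing_sentences` and its list input are hypothetical. The single-construct branch simply omits the "Check all the templates..." sentence, which is an assumption about how the contingent wording would read.

```python
def closing_sentences(same_marker: list[str]) -> str:
    """Closing advice whose wording depends on how many constructs share the marker.

    same_marker is a hypothetical list of construct names parsed from the
    "Assembly components with same marker as construct" column.
    """
    if not same_marker:
        raise ValueError("expected at least one same-marker construct")
    if len(same_marker) == 1:
        # One candidate template: skip the "check all" sentence, use the singular.
        return "If there's no exact match to this construct, this may be a misassembly."
    # Several candidate templates: ask for all of them to be checked, use the plural.
    return (
        "Check all the templates which confer the same antibiotic resistance. "
        "If there's no exact match to these constructs, this may be a misassembly."
    )
```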

bthuronyi commented 3 months ago

Once this is settled, you should merge the code that's in AT currently into AZ. That could mean an output like:

Single same-resistance construct: The read alignment doesn't completely match pDRM004. pBT271.290 also confers Cm resistance. Start by aligning the read to the pBT271.290 sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. If there's no exact match, this may be a misassembly.

Multiple same-resistance constructs:

The read alignment doesn't completely match pDRM004. pBT271.290 also confers Cm resistance. Start by aligning the read to the pBT271.290 sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. If pBT271.290 isn't an exact match, check each of the templates below which confer the same antibiotic resistance. If there's no exact match to any of these constructs, this may be a misassembly.

Other templates with the same marker: pBWT100 pBWT101
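The two proposed outputs can be modeled in Python as one shared opening plus a count-dependent ending. This is an illustrative sketch, not the sheet's implementation. `interpretation_message` and its parameters are hypothetical, and the remaining names are assumed to be joined with spaces, matching the example above.

```python
def interpretation_message(desired: str, same_marker: list[str], marker: str) -> str:
    """Sketch of the proposed AZ output for a partial-match read.

    desired: the intended construct (e.g. pDRM004)
    same_marker: constructs sharing the plated resistance, first one checked first
    marker: the antibiotic plated on (e.g. Cm)
    """
    if not same_marker:
        raise ValueError("expected at least one same-marker construct")
    first, rest = same_marker[0], same_marker[1:]  # Python slice: all but the first
    # Shared opening, written once so both cases stay in sync.
    common = (
        f"The read alignment doesn't completely match {desired}. "
        f"{first} also confers {marker} resistance. "
        f"Start by aligning the read to the {first} sequence and look for an exact match, "
        "which would indicate the read is from that template/part, not the assembled construct. "
    )
    if not rest:
        return common + "If there's no exact match, this may be a misassembly."
    # Multiple candidates: extra guidance plus the remaining names.
    return (
        common
        + f"If {first} isn't an exact match, check each of the templates below which "
        "confer the same antibiotic resistance. If there's no exact match to any of "
        "these constructs, this may be a misassembly.\n\n"
        "Other templates with the same marker: "
        + " ".join(rest)
    )


print(interpretation_message("pDRM004", ["pBT271.290", "pBWT100", "pBWT101"], "Cm"))
```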

evelynqi commented 3 months ago

If there's no exact match to any of these constructs, this may be a misassembly.

Other templates with the same marker: pBWT100 pBWT101

Yes, I thought about doing this but had trouble finding a way to index from the second item to the end of the list. How can I do this?

evelynqi commented 3 months ago

https://docs.google.com/spreadsheets/d/1bzumeOTEN57T3ak3pbLISh5hERLXs2nSg6Wq2Fliesw/edit?usp=sharing

Yesterday: and(AY4="Read partially, but not completely, matches map (where read quality is good)",AT4<>""),"The read alignment doesn't completely match "&C4&". "&join(",",transpose(split(AT4," ")))&" also confers "&index(Registry_Marker,xmatch(C4,Registry_Construct_name))& " resistance. Start by aligning the read to the "& index(transpose(split(AT4," ")),1)&" sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. Check all the templates which confer the same antibiotic resistance. If there's no exact match to these constructs, this may be a misassembly.",

Today's try: This does the two separate messages separately from one another based on whether there are multiple same-resistance constructs or not.

and(AY3="Read partially, but not completely, matches map (where read quality is good)",AT3<>""),if(counta(split(AT3,char(10)))=1,"The read alignment doesn't completely match "&C3&". "&AT3&" also confers "&index(Registry_Marker,xmatch(C3,Registry_Construct_name))& " resistance. Start by aligning the read to the "&AT3&" sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. If there's no exact match, this may be a misassembly.","The read alignment doesn't completely match "&C3&". "&join(",",transpose(split(AT3,char(10))))&" also confers "&index(Registry_Marker,xmatch(C3,Registry_Construct_name))& " resistance. Start by aligning the read to the "& index(transpose(split(AT3,char(10))),1)&" sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. Check all the templates which confer the same antibiotic resistance. If there's no exact match to these constructs, this may be a misassembly."),

Another method: I think this may be preferred: Gives the same first message and appends different ending messages dependent on whether there are multiple same-resistance constructs or not. This is what is coded in my sheet currently.

and(AY3="Read partially, but not completely, matches map (where read quality is good)",AT3<>""),"The read alignment doesn't completely match "&C3&". "&index(transpose(split(AT3,char(10))),1)&" also confers "&index(Registry_Marker,xmatch(C3,Registry_Construct_name))& " resistance. Start by aligning the read to the "&index(transpose(split(AT3,char(10))),1)&" sequence and look for an exact match, which would indicate the read is from that template/part, not the assembled construct. "&if(counta(split(AT3,char(10)))=1,"If there's no exact match, this may be a misassembly.","If "&index(transpose(split(AT3,char(10))),1)&" isn't an exact match, check each of the templates below which confer the same antibiotic resistance. If there's no exact match to any of these constructs, this may be a misassembly."&char(10)&char(10)&"Other templates with the same marker: "&AT3),
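The formula pieces that read the AT cell (split on newlines, count with counta, take the first with index) roughly correspond to the following Python. The cell contents and the helper name `split_cell` are hypothetical. The filtering of empty pieces is meant to mirror SPLIT's default of dropping empty text.

```python
def split_cell(cell: str) -> list[str]:
    """Rough Python analogue of split(AT3,char(10)).

    Sheets' SPLIT drops empty pieces by default, so empty strings (e.g. from a
    trailing newline) are dropped here as well.
    """
    return [part for part in cell.split("\n") if part]


# Hypothetical cell contents: one construct name per line.
cell = "pBT271.290\npBWT100\npBWT101"
names = split_cell(cell)
count = len(names)   # plays the role of counta(split(AT3,char(10)))
first = names[0]     # plays the role of index(transpose(split(AT3,char(10))),1)
print(count, first)  # 3 pBT271.290
```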

Test cases in Sequencing (column AZ; one same-resistance construct: m68; multiple same-resistance constructs: m711)

Question about the list at the end of the message when there are multiple same-resistance constructs: right now, it appends the list directly from "Assembly components with same marker as construct", which is fast. The only way I can think of to get the list without the first item is to split the list and filter out the first item, but that seems like a lot of steps that may slow down the formula for not much benefit. Is there another way to get the list without the first item?

bthuronyi commented 3 months ago

Another method: I think this may be preferred: Gives the same first message and appends different ending messages dependent on whether there are multiple same-resistance constructs or not. This is what is coded in my sheet currently.

Yeah, I tend to prefer this provided the code is fairly similar for the different cases. It's easier to modify, ultimately, because you don't have to change the common code twice and worry that it might actually need to be different between the cases.

bthuronyi commented 3 months ago

The only way I can think about getting the list without the first item is to split the list and filter it to not include that is the first item.. but this seems like a bunch of steps that may slow down the formula for not that much benefit... is there another way to get the list without the first item?

Yeah, so, I HATE this about Sheets, but there is, to my knowledge, no clean and straightforward way to "slice" an array, which is what you want here -- in Python it would be array[1:] and you're done.

You can do filter(array, {FALSE;sequence(rows(array)-1)}) but it's a little gross. (That also works for e.g. "all but the first 2 elements" if you do FALSE;FALSE;sequence... -2.)
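The mask trick can be emulated in Python to show why it works. FILTER keeps a row when its condition is truthy, so k FALSE entries followed by 1..n-k keep everything except the first k items. This is equivalent to the `array[k:]` slice. `drop_first` is a hypothetical name, and `itertools.compress` stands in for FILTER here.

```python
from itertools import compress


def drop_first(items: list[str], k: int = 1) -> list[str]:
    """Mimic filter(array, {FALSE;...;sequence(rows(array)-k)}) with k FALSEs."""
    # The mask has k False entries, then the numbers 1..len(items)-k.
    # Like FILTER, compress() keeps an item when its mask entry is truthy,
    # and every nonzero number is truthy.
    mask = [False] * k + list(range(1, len(items) - k + 1))
    return list(compress(items, mask))


names = ["pBT271.290", "pBWT100", "pBWT101"]
print(drop_first(names))     # ['pBWT100', 'pBWT101']
print(names[1:])             # ['pBWT100', 'pBWT101'], the plain Python slice
print(drop_first(names, 2))  # ['pBWT101']
```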

bthuronyi commented 3 months ago

You can do filter(array, {FALSE;sequence(rows(array)-1)}) but it's a little gross.

(I came up with this solution just now and it works. Prior to this I actually thought there was no way to do it at all. Go figure!)

bthuronyi commented 3 months ago

All that said, in this situation it would be fine to just give that first construct name one more time, provided there's more than one construct. I don't think it will be confusing at all, and you can call the list "All components that have the same marker:" for clarity.
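With that simplification, the trailing list no longer needs slicing, as this small Python sketch shows. `same_marker_footer` is a hypothetical name, and the space-separated join and blank-line separator are assumptions carried over from the earlier example output.

```python
def same_marker_footer(same_marker: list[str]) -> str:
    """Trailing list for the multi-construct message; empty for a single construct."""
    if len(same_marker) <= 1:
        return ""
    # Repeat the first construct instead of slicing it off; the heading makes
    # clear this is the full list.
    return "\n\nAll components that have the same marker: " + " ".join(same_marker)
```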

bthuronyi commented 3 months ago

Couple small things:

bthuronyi commented 3 months ago

Go ahead and implement in main with those changes.

evelynqi commented 3 months ago

Implemented.