bthuronyi / CloneCoordinate

CloneCoordinate issue tracking

Rewrite "GG donor, only needs coverage between cut sites?" as "Alignment and interpretation guidelines" #106

Closed bthuronyi closed 2 months ago

bthuronyi commented 3 months ago

Sequencing tab has a column called "GG donor, only needs coverage between cut sites?" that should be rewritten as "Alignment and interpretation guidelines", i.e. suggestions for how to align the read based on what part(s) of the construct seem to need sequencing, per the Registry. For example, it might output: "This is a Golden Gate donor and must be sequence verified from the 5' BsaI ATGG cut site through the 3' BsaI GCCT cut site". Or "This was produced via Golden Gate, so all 8 parts should be present but do not need internal sequence coverage."

bthuronyi commented 3 months ago

Check if construct included any d's (aID or gID w/d's) and say they need sequencing; give parts list rIDs and count for gIDs

evelynqi commented 3 months ago

Note: Implemented "Alignment and interpretation guidelines" tab on Sequencing sheet.

Not sure what the second part means.

evelynqi commented 3 months ago

Discussed with B:
-Count golden gate parts and list what they are (from golden gate sheet rID)
-5' and 3' overhangs for donors

evelynqi commented 3 months ago

Some labs use PCR products as donors for the golden gate, and those need to be fully sequenced -- bigger coding project?

evelynqi commented 3 months ago

Update:
-Message now updates correctly for golden gate donors (5' and 3' overhangs)

Still needs:
-Golden gate constructs

evelynqi commented 2 months ago

https://docs.google.com/spreadsheets/d/1YSqAdAVy6jYu_-Nbnop_aQOGMjjLDXzNKm5-Ur23Pcc/edit?usp=sharing Sequencing (AU - Alignment and interpretation guidelines)

Update:
-For Golden Gate constructs, it now gives the parts list rIDs and count for gIDs! It also gives the message "Sequencing needed to verify parts are present but not needed for internal sequence coverage."
-For assemblies using PCR products, it gives the parts list dIDs and count for dsDNA parts! It also gives the message "Needs to be fully sequenced verified."
-For constructs that are not gg donors, gg constructs, or assemblies, what should the interpretation be? In our lab, the few of these that were sequenced were direct receipts, but I'm not sure this holds for all labs (generalization). I could leave it blank. Right now, it outputs "dinosaur" for ease of searching.

evelynqi commented 2 months ago

YES- I also do know that my counta is set up differently when counting Golden Gate parts versus Assembly parts. I don't really know why but it works...

evelynqi commented 2 months ago

B notes:

to

let(output,bycol(index('Assemblies a'!P:W, xmatch($D4,USER_u_ID)),lambda(col,ifs(

Runs faster because xmatch stops when it finds a match while filter goes through all of them


-Instead of looking at registry (BF and BE for their IDs since a construct may use both), look at the left hand side of source type, i.e. whether the ID starts with a or g (a23 or g25)


-add enzyme name to donor interpretation


-thoughts still needed for what to replace "dinosaur" with
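
The speed note above (xmatch stops at its first match, filter goes through every row) can be illustrated outside the spreadsheet. The Python sketch below only models that stated reasoning by counting comparisons; it is not a measurement of how Google Sheets evaluates either function, and the helper names and sample IDs are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def first_match_index(ids: Sequence[str], target: str) -> Tuple[Optional[int], int]:
    """Scan from the top and stop at the first hit, like an exact-match lookup.

    Returns (index of the match or None, number of comparisons made).
    """
    comparisons = 0
    for i, value in enumerate(ids):
        comparisons += 1
        if value == target:
            return i, comparisons  # early exit: later rows are never read
    return None, comparisons


def filter_indices(ids: Sequence[str], target: str) -> Tuple[List[int], int]:
    """Test every row and keep all hits, like a filter over the whole column.

    Returns (list of matching indices, number of comparisons made).
    """
    comparisons = 0
    hits: List[int] = []
    for i, value in enumerate(ids):
        comparisons += 1
        if value == target:
            hits.append(i)
    return hits, comparisons


# Hypothetical ID column; real sheets would have many more rows.
ids = ["a21", "a22", "a23", "a24", "a25"]
print(first_match_index(ids, "a23"))  # (2, 3)
print(filter_indices(ids, "a23"))     # ([2], 5)
```

When IDs are unique, both approaches find the same row; the early exit only saves the comparisons after the match.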

evelynqi commented 2 months ago

Update: -works except for golden gate constructs (xmatch arguments are incorrect)

evelynqi commented 2 months ago

-everything else is implemented

bthuronyi commented 2 months ago

-For constructs that are neither gg donors, gg constructs, or assemblies, what should the interpretation be? For our lab, there are a few of these that were sequenced and these were direct receipts.. but I'm not sure this is the same for all labs (generalization). I could leave it blank. Right now, it outputs "dinosaur" for ease of searching.

Let's first check the "queue notes" field; if it's not blank, then put "Queue notes say: " & queue notes. If it is blank, I recommend "Look for full sequence coverage and/or consult [initials for person who queued the sequencing, pulled from that column] for guidance"

bthuronyi commented 2 months ago

Found an issue to correct: some plasmids that are produced by Golden Gate (gID) are also GG donors. In that case, being produced by GG should take precedence, but right now it's the other way around. See e.g. row 954.

bthuronyi commented 2 months ago

dIDs are not handled - for these, let's do:

  1. Check the "queue notes" field; if it's not blank, then put "Queue notes say: " & queue notes
  2. If queue notes are blank, "Consult [initials for person who queued the sequencing] for guidance"
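
The two steps above amount to a small fallback rule. A minimal Python sketch of it follows; the function and parameter names are hypothetical, and treating whitespace-only notes as blank is an assumption that the thread does not state.

```python
def fallback_guidance(queue_notes: str, queued_by: str) -> str:
    """Return the guidance text for a dID read.

    queue_notes: contents of the "queue notes" field (may be empty).
    queued_by:   initials of the person who queued the sequencing.
    """
    notes = queue_notes.strip()  # assumption: whitespace-only counts as blank
    if notes:
        # Step 1: surface the queue notes verbatim.
        return f"Queue notes say: {notes}"
    # Step 2: no notes, so point the reader at the person who queued the read.
    return f"Consult {queued_by} for guidance"


print(fallback_guidance("check promoter region", "EQ"))
print(fallback_guidance("", "EQ"))
```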

evelynqi commented 2 months ago

Implemented √

The formula errors when checking whether a construct is a Golden Gate donor if the construct is not in the Registry. I can use iferror, but am wondering if there is a better way.

evelynqi commented 2 months ago

Implemented and ready to check: Sequencing AU - Alignment and interpretation guidelines

bthuronyi commented 2 months ago

Looks good, go for it in main CC! I might also tweak the column placement (between "alignment" and "interpretation" - we just need to adjust protected ranges after moving it) and set it to only evaluate if the read is aligned but not interpreted yet, to save calculation time. I can take care of those things.

evelynqi commented 2 months ago

CHANGED: Rewrote "GG donor, only needs coverage between cut sites?" as "Alignment and interpretation guidelines"

=if(C3="","",index(Registry_Golden_Gate_donor_query,xmatch(C3,Registry_Construct_name)))

to

=ifs(
  $C3="", "",
  left($D3)="g",
    "This is a Golden Gate construct with "
    & counta(index('Golden Gate g'!DW:EF, xmatch($D3,Golden_Gate_g_ID)))
      - countblank(index('Golden Gate g'!DW:EF, xmatch($D3,Golden_Gate_g_ID)))
    & " parts ("
    & trim(join(", ", let(output, bycol(index('Golden Gate g'!DW:EF, xmatch($D3,Golden_Gate_g_ID)), lambda(col, ifs(col="","", TRUE,col))), filter(output, output<>""))))
    & "). Sequencing needed to verify parts are present but not needed for internal sequence coverage.",
  iferror(index(Registry_Golden_Gate_donor_query,xmatch($C3,Registry_Construct_name))="yes",FALSE),
    "This is a Golden Gate donor and must be sequence verified from the 5'"
    & index(Registry_GG_enzyme,xmatch($C3,Registry_Construct_name))
    & " "
    & index(Registry_5_overhang,xmatch($C3,Registry_Construct_name))
    & " overhang through the 3' "
    & index(Registry_3_overhang,xmatch($C3,Registry_Construct_name))
    & " overhang.",
  left($D3)="a",
    "This is an assembly using "
    & counta(index('Assemblies a'!P:W, xmatch($D3,USER_u_ID)))
    & " PCR products ("
    & trim(join(", ", let(output, bycol(index('Assemblies a'!P:W, xmatch($D3,USER_u_ID)), lambda(col, ifs(col="","", TRUE,col))), filter(output, output<>""))))
    & "). Needs to be fully sequenced verified.",
  left($D3)="d",
    if($M3<>"", "Queue notes say:" & $M3, "Consult " & $L3 & " for guidance."),
  TRUE,
    if($M3<>"", "Queue notes say:" & $M3, "Consult " & $L3 & " for guidance.")
)
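
For readers who want to follow the branch order without parsing the formula, here is a rough Python restatement of the same decision sequence. It is a sketch, not the sheet's implementation: the function, the `DonorInfo` type, and the sample IDs are hypothetical, the message wording is lightly tidied, and the part count is simply the number of non-blank entries.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DonorInfo:
    """Hypothetical stand-in for the Registry donor columns."""
    enzyme: str
    five_prime: str
    three_prime: str


def interpretation_guideline(
    construct_name: str,
    source_id: str,
    parts: Sequence[str] = (),
    donor: Optional[DonorInfo] = None,
    queue_notes: str = "",
    queued_by: str = "",
) -> str:
    if not construct_name:
        return ""  # nothing queued in this row
    listed = [p for p in parts if p]  # drop blank part slots
    kind = source_id[:1]
    if kind == "g":
        # Produced by Golden Gate: checked first, so it wins even if the
        # plasmid is also registered as a donor.
        return (
            f"This is a Golden Gate construct with {len(listed)} parts "
            f"({', '.join(listed)}). Sequencing needed to verify parts are "
            "present but not needed for internal sequence coverage."
        )
    if donor is not None:
        return (
            "This is a Golden Gate donor and must be sequence verified from "
            f"the 5' {donor.enzyme} {donor.five_prime} overhang through the "
            f"3' {donor.three_prime} overhang."
        )
    if kind == "a":
        return (
            f"This is an assembly using {len(listed)} PCR products "
            f"({', '.join(listed)}). Needs to be fully sequence verified."
        )
    # dIDs and everything else fall back to queue notes or the requester.
    if queue_notes:
        return f"Queue notes say: {queue_notes}"
    return f"Consult {queued_by} for guidance."


bsai = DonorInfo(enzyme="BsaI", five_prime="ATGG", three_prime="GCCT")
print(interpretation_guideline("pExample", "g25", parts=["r1", "", "r2"], donor=bsai))
```

The ordering of the `if` branches is the point of the sketch: moving the donor check above the `g` check would reproduce the precedence problem reported earlier in the thread.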