Closed kdaily closed 4 years ago
I don't have access to syn11817859
Hi @philerooski, where does this stand?
I'll have some time later today to implement the changes described in the September 24th email chain.
At least one object has Live_Cells = T, Lost_Tracking = F at one timepoint, NA for both values at the next timepoint, and then Live_Cells = T, Lost_Tracking = F again for the next few timepoints, until the cell supposedly dies and the remaining timepoints are missing.
# A tibble: 25 x 6
Experiment TimePoint ObjectTrackID Well Live_Cells Lost_Tracking
<chr> <int> <int> <chr> <chr> <chr>
1 AB-CS47iTDP-Survival 0 7 A1 true false
2 AB-CS47iTDP-Survival 1 7 A1 NA NA
3 AB-CS47iTDP-Survival 2 7 A1 true false
4 AB-CS47iTDP-Survival 3 7 A1 true false
5 AB-CS47iTDP-Survival 4 7 A1 true false
6 AB-CS47iTDP-Survival 5 7 A1 true false
7 AB-CS47iTDP-Survival 6 7 A1 true false
8 AB-CS47iTDP-Survival 7 7 A1 true false
9 AB-CS47iTDP-Survival 8 7 A1 NA NA
10 AB-CS47iTDP-Survival 9 7 A1 NA NA
# ... with 15 more rows
What to do here? Sometimes the gap in data is larger than a single timepoint.
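One way to at least size the problem is to list every run of NA values per object and separate gaps in the middle of a track from the run at the end. A minimal sketch in Python rather than R, assuming one object's Live_Cells column is already ordered by TimePoint and NA is represented as None; `find_gaps` is a hypothetical helper name, not something from the repo.

```python
from itertools import groupby


def find_gaps(live_cells):
    """Return (start_index, length, kind) for each run of missing values.

    live_cells: one object's Live_Cells values ordered by TimePoint,
    with None standing in for NA (an assumption of this sketch).
    kind is "leading", "interior" or "trailing".
    """
    gaps = []
    pos = 0
    n = len(live_cells)
    for is_missing, run in groupby(live_cells, key=lambda v: v is None):
        length = len(list(run))
        if is_missing:
            if pos == 0:
                kind = "leading"
            elif pos + length == n:
                kind = "trailing"
            else:
                kind = "interior"
            gaps.append((pos, length, kind))
        pos += length
    return gaps


# First ten rows of ObjectTrackID 7, well A1, as printed above
track = [True, None, True, True, True, True, True, True, None, None]
print(find_gaps(track))
```

Interior gaps of any length are the ones that need a decision; a trailing gap is consistent with the cell dying or being lost for good.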
Another strange case. Lost_Tracking is labeled as true but all the timepoints are present. Maybe the Lost_Tracking label at timepoint 7 is incorrect?
Experiment ObjectTrackID Well TimePoint Live_Cells Lost_Tracking
1 LINCS062016B 38 A11 1 true false
2 LINCS062016B 38 A11 2 true false
3 LINCS062016B 38 A11 3 true false
4 LINCS062016B 38 A11 4 true false
5 LINCS062016B 38 A11 5 true false
6 LINCS062016B 38 A11 6 true false
7 LINCS062016B 38 A11 7 true true
8 LINCS062016B 38 A11 8 false false
9 LINCS062016B 38 A11 9 false false
10 LINCS062016B 38 A11 10 false false
11 LINCS062016B 38 A11 11 false false
12 LINCS062016B 38 A11 12 false false
13 LINCS062016B 38 A11 13 false false
14 LINCS062016B 38 A11 14 false false
Another confusing example of this from the same well/experiment:
Experiment ObjectTrackID Well TimePoint Live_Cells Lost_Tracking
1 LINCS062016B 58 A11 1 true false
2 LINCS062016B 58 A11 2 true false
3 LINCS062016B 58 A11 3 true false
4 LINCS062016B 58 A11 4 true false
5 LINCS062016B 58 A11 5 true false
6 LINCS062016B 58 A11 6 true true
7 LINCS062016B 58 A11 7 false false
Seems to me to be a data munging error caused by the cell's death.
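A possible rule for finding more records like the two LINCS062016B examples: flag any row where Lost_Tracking is true but the very next timepoint is still present with a non-NA Live_Cells value. A rough Python sketch, assuming each row is a dict with TimePoint, Live_Cells and Lost_Tracking keys and that TimePoint is an integer; `flag_suspect_lost_tracking` is a made-up name.

```python
def flag_suspect_lost_tracking(rows):
    """Return timepoints labeled Lost_Tracking = true although the
    following timepoint is present for the same object.

    rows: dicts for a single Experiment + Well + ObjectTrackID.
    """
    present = {r["TimePoint"] for r in rows if r["Live_Cells"] is not None}
    return [
        r["TimePoint"]
        for r in rows
        if r["Lost_Tracking"] is True and r["TimePoint"] + 1 in present
    ]


# ObjectTrackID 58, well A11, copied from the table above
live = [True, True, True, True, True, True, False]
lost = [False, False, False, False, False, True, False]
object_58 = [
    {"TimePoint": t, "Live_Cells": lc, "Lost_Tracking": lt}
    for t, (lc, lt) in enumerate(zip(live, lost), start=1)
]
print(flag_suspect_lost_tracking(object_58))  # [6]
```

This only flags candidates; whether the label or the following rows are wrong still needs a manual look.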
@philerooski the first example seems to be a manual error. @jaslinkalra is still looking into the other two.
Can we determine a filter or rule to identify anything else that looks like these?
Lost_Tracking = true). I think I actually have the code to do the first using a run length encoding strategy - I will commit what I have here and highlight it for you.
Live_Cells and Lost_Tracking switching from both true to both false. Not trivial but doable?
I'm going to hand-comb through this table of different Live_Cells/Lost_Tracking combinations and create separate issues for each type of anomalous record I come across.
Live_Cells Lost_Tracking previous_live_cells previous_lost_tracking
1 FALSE FALSE NA NA
2 NA NA FALSE FALSE
3 NA NA NA NA
4 TRUE TRUE NA NA
5 NA NA TRUE TRUE
6 TRUE FALSE NA NA
7 TRUE FALSE TRUE FALSE
8 NA NA TRUE FALSE
9 TRUE TRUE TRUE FALSE
10 FALSE FALSE FALSE FALSE
11 TRUE TRUE TRUE TRUE
12 TRUE FALSE FALSE FALSE
13 FALSE FALSE TRUE FALSE
14 FALSE FALSE TRUE TRUE
15 FALSE TRUE FALSE FALSE
16 NA NA FALSE TRUE
17 FALSE TRUE NA NA
18 FALSE FALSE FALSE TRUE
19 TRUE FALSE TRUE TRUE
20 FALSE NA FALSE FALSE
21 NA NA FALSE NA
22 TRUE NA TRUE FALSE
23 NA NA TRUE NA
24 FALSE NA NA NA
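For reference, a table like this can be produced by pairing each row's state with the previous row's state within the same object and counting the distinct combinations. A sketch in Python, assuming each track is a list of (Live_Cells, Lost_Tracking) tuples ordered by TimePoint, that NA is None, and that the first row of a track gets (NA, NA) as its previous state; that last point is a guess about how the table was built. `transition_counts` and the dictionary key are hypothetical.

```python
from collections import Counter


def transition_counts(tracks):
    """Count (current_state, previous_state) pairs across all tracks.

    tracks: mapping from an object key to its ordered list of
    (Live_Cells, Lost_Tracking) tuples.
    """
    counts = Counter()
    for states in tracks.values():
        previous = (None, None)  # assumed placeholder before the first row
        for current in states:
            counts[(current, previous)] += 1
            previous = current
    return counts


# ObjectTrackID 58 from the LINCS062016B example earlier in the thread
tracks = {
    ("LINCS062016B", "A11", 58): [(True, False)] * 5 + [(True, True), (False, False)],
}
for (current, previous), n in transition_counts(tracks).items():
    print(current, previous, n)
```

Adding the count per combination makes it easier to tell one-off manual errors from systematic ones.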
Is this still an issue that I can resolve? Are there more cases like this where the curation logic failed?
I was working on a script to fix this, but the logic is incomplete: https://github.com/philerooski/neurolincsdreamchallenge/blob/fix-lost-tracking/R/fix_lost_tracking.R
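To make the run length encoding idea mentioned earlier in the thread concrete: collapse each object's sequence of (Live_Cells, Lost_Tracking) states into runs, then look at adjacent runs for the both-true to both-false switch. This is only an illustration in Python and not the logic of the linked script; `rle` and `both_true_then_both_false` are hypothetical names.

```python
from itertools import groupby


def rle(values):
    """Run length encoding: a list of (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def both_true_then_both_false(states):
    """True if a run of (True, True) is immediately followed by a run of
    (False, False) in one object's ordered state sequence."""
    runs = rle(states)
    return any(
        a == (True, True) and b == (False, False)
        for (a, _), (b, _) in zip(runs, runs[1:])
    )


# ObjectTrackID 38, well A11: six (true, false), one (true, true), seven (false, false)
object_38 = [(True, False)] * 6 + [(True, True)] + [(False, False)] * 7
print(rle(object_38))
# [((True, False), 6), ((True, True), 1), ((False, False), 7)]
print(both_true_then_both_false(object_38))  # True
```

The same encoding also exposes NA runs, since a run whose value is (None, None) sitting between two observed runs is an interior gap.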
Maybe it's time to make this issue a priority again.
Okay, I suggest we first apply the manual-error corrections I provided as a CSV file, since I found lost-tracking cases similar to this issue that stem from manual errors in reporting the correct object labels. Let me know if I need to update the curation table in Synapse.
Note to self (correct me if I'm wrong): Update the relevant rows in this table (https://www.synapse.org/#!Synapse:syn11378063/tables/) with this (https://www.synapse.org/#!Synapse:syn18134075) before fixing Lost_Tracking labels as described above.
Fixed by #17
Current data (https://www.synapse.org/#!Synapse:syn11378063/tables/) has missing time points per Experiment + Well + Object. The Lost_Tracking column is inappropriately used to indicate that the next timepoint (and subsequent ones until a manually curated track comes back) is lost.
We need to transform this so that Lost_Tracking is changed from True to False where it is currently True, and the missing time points get Lost_Tracking = True.
Task:
Make a new table with these changes in R - do not modify the existing table.
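A rough sketch of the intended transformation for a single Experiment + Well + Object, written in Python for illustration only since the task itself asks for R. It assumes missing time points are rows that are absent from the table, that TimePoint is an integer, and that the range to fill runs from the object's first to its last observed timepoint unless `last_timepoint` is given. `relabel_lost_tracking` and the example values are hypothetical.

```python
def relabel_lost_tracking(rows, last_timepoint=None):
    """Build a new list of rows; the input rows are left untouched.

    Present rows with Lost_Tracking = True are set to False, and every
    absent timepoint is added with Live_Cells = None, Lost_Tracking = True.
    """
    if not rows:
        return []
    by_time = {r["TimePoint"]: r for r in rows}
    first = min(by_time)
    last = max(by_time) if last_timepoint is None else last_timepoint
    # Identifier columns are copied onto the filled-in rows.
    ids = {
        k: v
        for k, v in rows[0].items()
        if k not in ("TimePoint", "Live_Cells", "Lost_Tracking")
    }
    out = []
    for t in range(first, last + 1):
        if t in by_time:
            row = dict(by_time[t])
            if row["Lost_Tracking"] is True:
                row["Lost_Tracking"] = False
            out.append(row)
        else:
            out.append({**ids, "TimePoint": t, "Live_Cells": None, "Lost_Tracking": True})
    return out


# Made-up object: timepoints 2 and 3 are absent, and timepoint 1 carries the flag
example = [
    {"Experiment": "EXP", "Well": "A1", "ObjectTrackID": 1, "TimePoint": 0, "Live_Cells": True, "Lost_Tracking": False},
    {"Experiment": "EXP", "Well": "A1", "ObjectTrackID": 1, "TimePoint": 1, "Live_Cells": True, "Lost_Tracking": True},
    {"Experiment": "EXP", "Well": "A1", "ObjectTrackID": 1, "TimePoint": 4, "Live_Cells": True, "Lost_Tracking": False},
]
for row in relabel_lost_tracking(example):
    print(row["TimePoint"], row["Live_Cells"], row["Lost_Tracking"])
```

Returning a fresh list instead of editing rows in place mirrors the "do not modify the existing table" requirement.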