Sage-Bionetworks / neurolincsdreamchallenge

Identify erroneous object labels. #7

Closed philerooski closed 5 years ago

philerooski commented 5 years ago

From #6

We spot-checked one example from KS-AB-iMN-TDP43-Survival (Well A3, ObjectTrackID = 6). This has a typo for time T1 (reads 922 instead of 92).

@philerooski, can you scan all of the survival files here (https://www.synapse.org/#!Synapse:syn17079347) to check whether the values in columns 'T0' through Tx match something in column ObjectLabelsFound? This should identify any human errors made when entering an object label.
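One way this check could look, as a rough sketch only. It assumes each survival file is a CSV with the columns `Sci_WellID`, `ObjectTrackID`, `ObjectLabelsFound`, and `T0` through `Tx`, and that a time-point value counts as valid if it appears anywhere in `ObjectLabelsFound` for the same well. Both the per-well grouping and the sample rows are assumptions for illustration, not the code actually used on this project.

```python
import csv
import io
from collections import defaultdict


def find_unmatched_labels(rows):
    """Return (well, track_id, column, value) for every T-column value that
    does not appear in ObjectLabelsFound for the same well.

    Assumption: validity is judged per well, against the set of all
    ObjectLabelsFound values recorded for that well.
    """
    rows = list(rows)

    # Collect the known labels for each well.
    labels_by_well = defaultdict(set)
    for row in rows:
        labels_by_well[row["Sci_WellID"]].add(row["ObjectLabelsFound"].strip())

    problems = []
    for row in rows:
        well = row["Sci_WellID"]
        for col, value in row.items():
            # Only look at time-point columns: T0, T1, ..., Tx.
            if not (col.startswith("T") and col[1:].isdigit()):
                continue
            value = (value or "").strip()
            # Empty cells (no observation at that time point) are skipped.
            if value and value not in labels_by_well[well]:
                problems.append((well, row["ObjectTrackID"], col, value))
    return problems


# Hypothetical sample modeled on the typo described above (922 entered for 92).
sample_csv = """Sci_WellID,ObjectTrackID,ObjectLabelsFound,T0,T1,T2
A3,6,92,92,922,92
A3,7,15,15,15,
"""

problems = find_unmatched_labels(csv.DictReader(io.StringIO(sample_csv)))
for well, track_id, column, value in problems:
    print(f"well {well}, track {track_id}, {column}: label {value} not found")
```

A check like this only catches values that match no known label in the well. A typo that happens to equal another valid label would pass unnoticed.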

Originally posted by @kdaily in https://github.com/Sage-Bionetworks/neurolincsdreamchallenge/issues/6#issuecomment-451303336

kdaily commented 5 years ago

@jaslinkalra,

My preliminary analysis using code here identifies 100s-1000s of erroneous values in the survival files. I checked KS-AB-iMN-TDP43 and LINCS092016B - each had many cases. Here's one example:


```
  Sci_WellID ObjectLabelsFound Phenotype T0  T1  T2  T3  T4  T5  T6  T7  T8  T9 T10 T11 T12 T13 T14 T15 T16 T17 T18
1         B7                71         N 71 130 130 130 130 130 294 321 321 420 420 420 420 420 420 420 420 420 953
```

jaslinkalra commented 5 years ago

Is this still an issue after the Dream Challenge talk?

Can we identify a few of them so I can look at the image files over time to see what happened?

kdaily commented 5 years ago

@jaslinkalra I sent an email. Reposting here as well so @philerooski knows what's up.

I did the comparison of the object labels taken directly from the image masks at each time point in each well for each experiment. I found every instance of a record in Jaslin's survival file that didn't match an object in a well at any time point in the image mask.
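The comparison described above could be sketched roughly as follows. This is not the script used for the analysis; it assumes the mask labels have already been extracted into a dictionary keyed by (experiment, well, time point), and every name and value in it (`mask_labels`, `EXP-A`, `EXP-B`, the sample records) is hypothetical.

```python
from collections import Counter


def flag_records_missing_from_masks(survival_records, mask_labels):
    """Return the survival records whose label is absent from the image mask.

    survival_records: iterable of dicts with keys
        "experiment", "well", "timepoint", "label".
    mask_labels: dict mapping (experiment, well, timepoint) to the set of
        object labels present in that mask.

    Note: a well or time point with no mask entry at all causes every
    record there to be flagged.
    """
    flagged = []
    for rec in survival_records:
        key = (rec["experiment"], rec["well"], rec["timepoint"])
        if rec["label"] not in mask_labels.get(key, set()):
            flagged.append(rec)
    return flagged


def count_by_experiment(flagged):
    """Count flagged records per experiment."""
    return Counter(rec["experiment"] for rec in flagged)


# Hypothetical data: EXP-B has no mask for time point 1.
mask_labels = {
    ("EXP-A", "A3", 0): {92, 15},
    ("EXP-A", "A3", 1): {92, 15},
    ("EXP-B", "B7", 0): {71},
}
survival_records = [
    {"experiment": "EXP-A", "well": "A3", "timepoint": 0, "label": 92},
    {"experiment": "EXP-A", "well": "A3", "timepoint": 1, "label": 922},
    {"experiment": "EXP-B", "well": "B7", "timepoint": 0, "label": 71},
    {"experiment": "EXP-B", "well": "B7", "timepoint": 1, "label": 130},
]

flagged = flag_records_missing_from_masks(survival_records, mask_labels)
counts = count_by_experiment(flagged)
for experiment, n in sorted(counts.items()):
    print(experiment, n)
```

Treating a missing mask as "no labels" is a design choice in this sketch: it makes gaps in the mask data show up as potential errors instead of being silently skipped.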

I did four experiments, and something is amiss with one. Here's the count of potential errors:

```
  Experiment                   n
  <chr>                    <int>
1 AB-CS47iTDP-Survival      1681
2 KS-AB-iMN-TDP43-Survival     2
3 LINCS062016A                 1
4 LINCS092016B                 2
```

Here's the list I come up with:

https://www.synapse.org/#!Synapse:syn17937743

kdaily commented 5 years ago

Code for processing this is here:

https://github.com/Sage-Bionetworks/neurolincsscoring/blob/master/exec/survival-file-manual-errors.R

jaslinkalra commented 5 years ago

I have updated the censored wells, so the number of errors should go down, specifically for AB-CS47iTDP-Survival. Some wells in that experiment were never manually curated, the information needed to curate them is not available, and there are no encoded masks for them either. https://www.synapse.org/#!Synapse:syn11709601/tables/

For the rest of them, how can I know which part is wrong? What information should I provide you with? The CSV files Phil shared last time with double information helped a ton in assessing which value was falsely curated for a particular well.

For now, I can check for the three datasets and provide you with information for object tracks, live cells, lost tracking and others.

jaslinkalra commented 5 years ago

The correct curation is attached here: https://www.synapse.org/#!Synapse:syn17937743

kdaily commented 5 years ago

@jaslinkalra I think you meant https://www.synapse.org/#!Synapse:syn18134075!

Thank you for providing.

jaslinkalra commented 5 years ago

Yes you are correct. Sorry about the wrong link.

kdaily commented 5 years ago

Marking resolved, though this file needs to be integrated into the curated table. Will open separate issue.