Deliverables 2/13 - Githubissues

| Status | Deliverable | Notes |
| --- | --- | --- |
| Complete | Implement CSV outputs from pipeline | Brandon |
| Complete | Implement hyperparameter for slice-wise mean normalization | Brandon |
| Complete | Change pipeline target volume from 0.2 microns^3 to 1 micron^3, and generate result comparison | Brandon |
| Complete | Output annotations on TIFF | Brandon |
| ~~incomplete~~ | ~~Use NDIO to solve upload problems~~ | ~~Richard~~ |
| Complete | Read and present I2G paper by Roncal (focus on evaluation) | Richard |
| Complete | Algorithms MD on volume segmentation | Will |
| Complete | Draft detailed plan for pipeline quality assurance | Will/Richard |

@gkiar here are our group's issues for this week

http://nbviewer.jupyter.org/github/NeuroDataDesign/pan-synapse/blob/master/background/meanNorm.ipynb

http://nbviewer.jupyter.org/github/NeuroDataDesign/pan-synapse/blob/master/background/targetVolChange.ipynb

https://github.com/NeuroDataDesign/pan-synapse/blob/master/background/CSV_Demo.ipynb

https://raw.githubusercontent.com/NeuroDataDesign/pan-synapse/master/background/localOut.csv

https://github.com/NeuroDataDesign/pan-synapse/blob/master/background/precisionrecall.ipynb

https://github.com/NeuroDataDesign/pan-synapse/blob/master/code/tests/quality.py

https://github.com/NeuroDataDesign/pan-synapse/blob/master/background/ConnectedComponents_Algorithms.md.ipynb

https://github.com/NeuroDataDesign/pan-synapse/blob/master/background/Investigating_Scipy_Measurements_Library.ipynb

This should be written in a markdown somewhere, not just on an issue. Please move to a file on github and link to it.

On Feb 12, 2017 11:26 PM, "Will LeVine" notifications@github.com wrote:

Pipeline Quality Assurance:

Check that the average detected synapse volume is close to ~138.89 voxels (a synapse of roughly 1 cubic micron converts to about 138.89 voxels; this is a metric given to us by Richard).

Check that the volumetric synaptic density is plausible: the ratio of total synaptic volume to total image volume should be somewhere near 3-5%. This is also a metric given to us by Richard.
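A minimal sketch of the two checks above, assuming the pipeline output is available as a labeled NumPy volume (0 for background, one positive integer per detected synapse). The function names, the voxel dimensions, and the 25% volume tolerance are all hypothetical; the voxel size shown is just one choice for which 1 cubic micron works out to about 138.89 voxels.

```python
import numpy as np

# Assumed voxel size in microns (x, y, z). Hypothetical values: one choice
# for which 1 cubic micron corresponds to about 138.89 voxels.
VOXEL_DIMS_UM = (0.12, 0.12, 0.5)


def voxels_per_cubic_micron(voxel_dims_um=VOXEL_DIMS_UM):
    """Number of voxels that fill one cubic micron."""
    x, y, z = voxel_dims_um
    return 1.0 / (x * y * z)


def volume_stats(labels):
    """Return (mean synapse volume in voxels, fraction of voxels labeled).

    `labels` is an integer array where 0 is background and each detected
    synapse carries its own positive integer label.
    """
    counts = np.bincount(labels.ravel())[1:]  # voxels per label, background dropped
    counts = counts[counts > 0]               # skip label ids that are unused
    mean_volume = float(counts.mean()) if counts.size else 0.0
    density = float(counts.sum()) / labels.size
    return mean_volume, density


def check_volume_and_density(labels, target_voxels=138.89, volume_tol=0.25,
                             density_range=(0.03, 0.05)):
    """Compare the pipeline output with the two target metrics.

    `volume_tol` is a relative tolerance and an arbitrary placeholder.
    """
    mean_volume, density = volume_stats(labels)
    volume_ok = abs(mean_volume - target_voxels) <= volume_tol * target_voxels
    density_ok = density_range[0] <= density <= density_range[1]
    return {"mean_volume": mean_volume, "density": density,
            "volume_ok": volume_ok, "density_ok": density_ok}
```

Both numbers come from the same per-label voxel counts, so they can be computed in one pass over the labeled volume.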

Check for synaptic uniformity across the entire volume in all 3 dimensions. To do so, we could slice the image into its individual z-slices, check whether the ratio of synaptic area to slice area is consistent across all z-slices, and then repeat for the x and y axes. As a minimum bar, no slice's ratio should be more than double that of another, or at the very least the ratios should show no apparent trend. The purpose of this step is to ensure that our pipeline works equally well across differently-shadowed regions of the volume.
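The uniformity check could be sketched as follows, assuming a binary NumPy mask of detected synapse voxels. The function names are hypothetical, and the slope of a straight-line fit is only one crude stand-in for "no apparent trends"; which array axis corresponds to z depends on how the volume is loaded.

```python
import numpy as np


def slice_ratios(mask, axis):
    """Fraction of synapse voxels in each slice taken along `axis`."""
    other_axes = tuple(a for a in range(mask.ndim) if a != axis)
    return mask.astype(bool).mean(axis=other_axes)


def uniformity_report(mask, max_ratio=2.0):
    """Check every axis of a binary synapse mask for uniform coverage.

    For each axis, report the largest-to-smallest slice ratio, whether it
    stays within `max_ratio`, and the slope of a straight-line fit to the
    per-slice ratios (a rough indicator of a trend along that axis).
    """
    report = {}
    for axis in range(mask.ndim):
        ratios = slice_ratios(mask, axis)
        lo, hi = ratios.min(), ratios.max()
        # An empty slice makes the spread unbounded, so report infinity.
        spread = float(hi / lo) if lo > 0 else float("inf")
        slope = float(np.polyfit(np.arange(ratios.size), ratios, 1)[0])
        report[axis] = {"spread": spread,
                        "within_limit": spread <= max_ratio,
                        "trend_slope": slope}
    return report
```

A slice with no detections at all fails the "no more than double" rule by construction, which seems like the right behavior for a shadowed region where the pipeline finds nothing.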

Precision and recall for center containment against ground truth. After we run the volume through our pipeline, we could take our calculated centroids and check whether each detected synapse's centroid is contained in an actual ground-truth synapse. If so, it is considered a true positive. If not, it is considered a false positive. After this process is done, we could then find the synapses in the ground truth that were not matched to any centroid from our pipeline and call those false negatives. Using these counts, we could then calculate our precision and recall.
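The center-containment scoring described above might look like this, assuming the ground truth is a labeled NumPy volume (0 for background, one positive integer per annotated synapse) and the pipeline centroids are (z, y, x) coordinates. The function name and both input formats are assumptions, and no ground truth of this kind exists for the data yet.

```python
import numpy as np


def centroid_precision_recall(centroids, truth_labels):
    """Score detected centroids against a labeled ground-truth volume.

    `centroids` is an iterable of (z, y, x) coordinates from the pipeline.
    `truth_labels` is an integer array: 0 is background, and every
    ground-truth synapse has its own positive label.
    """
    true_positives = 0
    false_positives = 0
    matched = set()  # ground-truth labels hit by at least one centroid
    for centroid in centroids:
        index = tuple(int(round(c)) for c in centroid)
        inside = all(0 <= i < n for i, n in zip(index, truth_labels.shape))
        label = int(truth_labels[index]) if inside else 0
        if label > 0:
            true_positives += 1
            matched.add(label)
        else:
            false_positives += 1

    # Ground-truth synapses that no centroid landed in are false negatives.
    truth_ids = set(np.unique(truth_labels).tolist()) - {0}
    false_negatives = len(truth_ids - matched)

    detected = true_positives + false_positives
    precision = true_positives / detected if detected else 0.0
    found_or_missed = true_positives + false_negatives
    recall = true_positives / found_or_missed if found_or_missed else 0.0
    return precision, recall
```

Following the description literally, two centroids that fall inside the same ground-truth synapse both count as true positives, which can inflate recall; a stricter variant would count each matched ground-truth synapse once.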


Also, since I'm seeing this prior to you presenting it in class, I'm not convinced you really did more work than just type this up since our meeting on Thursday. I think we agreed you'd show me pseudocode for precision/recall and a plan for ground-truthing synapses and using that ground truth.


Hey Greg, presenting the I2G paper and precision/recall fell under my deliverable. I did make a short PowerPoint that I was going to use to explain the I2G paper and precision/recall. I will link this below. I was not aware you wanted me to write the pseudocode for the 2 equations, but I can present that in a Jupyter notebook today before class.

I believe this was mentioned in class, but we cannot currently use precision/recall to evaluate our pipeline because we do not have ground truth and have no way of determining it. The most we have are certain metrics that Richard Roth gave us, which are the only quality assurances we have for our pipeline (and what Will built the QA plan from).

Because precision/recall are not very useful in our situation, I have also begun looking at ways to evaluate pipelines without ground truth. I will be sharing some papers I've found during class, but so far my findings haven't been good.

On a side note, Will has been spending most of his time investigating why our connected components is taking forever to run (his first deliverable) because there isn't much we can do with quality assurance as of now.

https://docs.google.com/presentation/d/1QYUIzlkHNsRUGoHu1YwXgzvvhYsDe0KNAWYnabhayzc/edit#slide=id.g35f391192_00

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2702211/

http://lmb.informatik.uni-freiburg.de/people/bahlmann/data/ko_si_al_ba_gr_organsegmentation2012.pdf

Ah, right. We also discussed coming up with a plan to think about getting ground truth from Richard on some subset of volume so that we can better assess our pipeline.

There was plenty to do for QA, no? We came up with 3 or 4 ways to do it, regardless of ground truth, but to my knowledge none of them have been implemented?

-- Greg Kiar gkiar07@gmail.com


I think the ways we came up with included checking that we had the correct percentage of synapses and the correct volume size. I believe we have already used both to evaluate our current pipeline. We presented those metrics to Richard in our meeting with him 2 weeks ago. We also visualized our results so Richard could "look at it" and tell us if we were on the correct track.

I'm sorry, I must have missed the part of our meeting about getting ground truth from Richard. I can work on that this week.
