JonathanChiang / eDash

Open Source Clinical Interface for Medical Imaging
MIT License

Paper Proposal #4

Open JonathanChiang opened 5 years ago

JonathanChiang commented 5 years ago

Overview

The Course Project is an opportunity for you to apply what you have learned in class to a problem of your interest. Potential projects usually fall into these two tracks:

Application projects involve careful data preparation, an appropriate loss function, details of training and cross-validation, good test set evaluations, and model comparisons. Don't be afraid to think outside the box.

JonathanChiang commented 5 years ago

Project Proposal

The project proposal should be one paragraph (200-400 words).

DeepNeuro has been shown to help design and train neural network architectures, and to modify state-of-the-art architectures in a flexible and intuitive way.

Another exploratory application is to apply federated learning or model serialization to ensure consistent network performance across different users, institutions, and scanners, promoting interoperability and patient confidentiality.

This introduces an IT implementation and engineering solution that could enable training models across sites and institutions. It also introduces deep learning coding design patterns for security across institutions.


operating characteristic curve (AUC) and precision, recall, and F1 scores (these metrics are defined in detail in Appendix E1 [online]). These measurements were computed by using the scikit-learn (version 0.19) Python library and were reported for the test set in each model. Interrater agreement was assessed with the Cohen κ statistic and was also computed by using scikit-learn software. The DeLong nonparametric statistical test (24) implemented in the Daim (version 1.1.0) R library was used to assess statistical differences among AUC values, wherein P values less than .05 were considered to indicate a significant difference. The method described by Hanley and McNeil (25) was used to compute 95% confidence intervals for AUC values.

To enable accessible analysis of CNN model results, we created class activation maps (CAMs) that show the areas of a given image that are most responsible for its CNN classification (26). The CAMs we used were slightly different from those used by Zhou et al (26) because of the sigmoid nonlinearity on the final layer
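
The excerpt above reports AUC values with 95% confidence intervals from the Hanley and McNeil method. As a rough illustration only (the paper used scikit-learn; this sketch uses the Python standard library, and the function names `auc_mann_whitney` and `hanley_mcneil_ci` as well as the toy labels and scores are assumptions, not taken from the paper), the calculation could look like this:

```python
import math


def auc_mann_whitney(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))  # guard against tiny negative rounding error
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy data (hypothetical): 1 = abnormal, 0 = normal.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_mann_whitney(labels, scores)
low, high = hanley_mcneil_ci(auc, n_pos=2, n_neg=2)
print(f"AUC = {auc:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only four samples the interval is very wide, which is the expected behavior: the standard error shrinks as the number of positive and negative cases grows.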
JonathanChiang commented 5 years ago

Final Report:

Your final write-up is required to be between 6 and 8 pages using the provided template, structured like a paper from a computer vision conference (CVPR, ECCV, ICCV, etc.). Please use this template so we can fairly judge all student projects without worrying about altered font sizes, margins, etc. After the class, we will post all the final reports online so that you can read about each other's work. If you do not want your writeup to be posted online, then please let us know at least a week in advance of the final writeup submission deadline.

The following is a suggested structure for your report, as well as the rubric that we will follow when evaluating reports. You don't necessarily have to organize your report using these sections in this order, but that would likely be a good starting point for most projects.

Title, Author(s)

Abstract:

Related Work (10%):

Data (10%):

Methods (30%):

Experiments (30%):

Conclusion (5%)

Writing / Formatting (5%)

Source code (if your project proposed an algorithm, or code that is relevant and important for your project.)

Cool videos, interactive visualizations, demos, etc. Examples of things to not put in your supplementary material:

JonathanChiang commented 5 years ago

https://jdunnmon.github.io/dunnmon_radiology_2018.pdf

JonathanChiang commented 5 years ago

Development Size and Initialization Method | Test Accuracy | Precision | Recall | F1 Score | AUC Value

JonathanChiang commented 5 years ago

Figure 3: Comparison of, A, receiver operating characteristic (ROC) curves for DenseNet-121 (NN) and NN+PL (mean of NN score and prospective label [PL] score) classifiers and, B, area under the ROC curve (AUC) histograms obtained from a 1000-sample test set by using the bootstrap method. Each ROC curve represents the output of one representative NN model. In B, solid lines indicate mean values, and dashed lines indicate standard deviation from the mean. Data set size (K = 1000 points) refers to total size (training + development, 90-to-10 split).
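
The caption above describes AUC histograms obtained with the bootstrap method. A minimal sketch of that idea, assuming resampling of the test set with replacement (the number of replicates `n_boot`, the seed, the function names, and the toy data are all assumptions, not values from the paper):

```python
import random
import statistics


def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Resample the test set with replacement and collect one AUC per replicate."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to compute an AUC")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip replicates that contain only one class
        aucs.append(auc(ys, [scores[i] for i in idx]))
    return aucs


# Toy test set (hypothetical): scores separate the classes imperfectly.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.45, 0.6, 0.4, 0.7, 0.8, 0.9]
aucs = bootstrap_auc(labels, scores)
print(f"mean AUC = {statistics.mean(aucs):.3f}, SD = {statistics.stdev(aucs):.3f}")
```

The list `aucs` is what would be binned into a histogram; its mean and standard deviation correspond to the solid and dashed lines described in the caption.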

JonathanChiang commented 5 years ago

Deep Learning for Triage of Chest Radiographs: Should Every Institution Train Its Own System?

JonathanChiang commented 5 years ago

PySyft scope: Note that we'll not just be talking about how to decentralize / encrypt data, but we'll be addressing how PySyft can be used to help decentralize the entire ecosystem around data, even including the databases where data is stored and queried, and the neural models which are used to extract information from data. As new extensions to PySyft are created, these notebooks will be extended with new tutorials to explain the new functionality.

JonathanChiang commented 5 years ago

https://arxiv.org/pdf/1811.04017.pdf

We have introduced a privacy preserving federated learning framework built over PyTorch. The design relies on chains of tensors that are exchanged between local and remote workers. Our tensor implementations support commands of the PyTorch API and combine MPC and DP functionalities within the same framework.
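
To make the federated learning idea concrete, here is a minimal sketch of federated averaging in plain Python. It is not the PySyft API and not necessarily the scheme used in the paper: the one-parameter model `y ≈ w * x`, the function names, the learning rate, and the two toy "sites" are all assumptions chosen for illustration. Each worker trains on its own data and only the model weight is exchanged.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on y ≈ w * x using one worker's data."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(global_w, worker_datasets, rounds=20):
    """Each round: workers train locally, the server averages their weights."""
    for _ in range(rounds):
        updates = [(local_update(global_w, d), len(d)) for d in worker_datasets]
        total = sum(n for _, n in updates)
        # Weight each worker's model by the number of samples it holds.
        global_w = sum(w * n for w, n in updates) / total
    return global_w


# Two hypothetical institutions whose raw data never leaves the site.
site_a = [(1.0, 3.0), (2.0, 6.0)]
site_b = [(3.0, 9.0), (0.5, 1.5), (1.5, 4.5)]
w = federated_average(0.0, [site_a, site_b])
print(f"learned weight: {w:.4f}")
```

Both toy datasets follow `y = 3x`, so the averaged weight should approach 3 even though neither site shares its samples.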
JonathanChiang commented 5 years ago

SyftTensors are meant to represent a state or transformation of the data and can be chained together. The chain structure always has at its head the PyTorch tensor, and the transformations or states embodied by the SyftTensors are accessed downward using the child attribute and upward using the parent attribute.
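
The chain structure can be pictured as a doubly linked list. The sketch below is a hypothetical stand-in written in plain Python, not PySyft code: the class `ChainNode`, the helper functions, and the node labels are assumptions; only the `child` and `parent` attribute names come from the description above.

```python
class ChainNode:
    """One link in a tensor chain (hypothetical stand-in, not a PySyft class)."""

    def __init__(self, label):
        self.label = label
        self.child = None   # next node down the chain
        self.parent = None  # previous node up the chain


def build_chain(labels):
    """Link nodes from top to bottom and return the head of the chain."""
    nodes = [ChainNode(label) for label in labels]
    for upper, lower in zip(nodes, nodes[1:]):
        upper.child = lower
        lower.parent = upper
    return nodes[0]


def walk_down(head):
    """Collect labels from the head downward via the child attribute."""
    labels, node = [], head
    while node is not None:
        labels.append(node.label)
        node = node.child
    return labels


def walk_up(tail):
    """Collect labels from the tail upward via the parent attribute."""
    labels, node = [], tail
    while node is not None:
        labels.append(node.label)
        node = node.parent
    return labels


# The head plays the role of the PyTorch tensor; the labels are illustrative.
head = build_chain(["torch_tensor", "state_a", "state_b"])
tail = head.child.child
print(walk_down(head))
print(walk_up(tail))
```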

JonathanChiang commented 5 years ago
Our main contributions are the following:
- We first build a standardized protocol to communicate between workers which made federated learning possible.
- Then, we develop a chain abstraction model on tensors to efficiently override operations (or encode new ones) such as sending/sharing a tensor between workers.
- Last, we provide the elements to implement recently proposed differential privacy and multiparty computation protocols using this new framework.

By doing so, we intend to help popularize privacy preserving techniques in machine learning by making them available via the common tools that researchers and data scientists work with on a daily basis. Our framework is designed in an extensible way such that new FL, MPC, or DP methods can be plugged in by external contributors willing to make their work available to the wider deep learning community.
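
As an illustration of what "sharing a tensor between workers" can mean in a multiparty computation setting, the sketch below uses additive secret sharing, a basic MPC building block. It is not claimed to be the protocol the framework implements: the modulus `Q`, the three-worker setup, and the function names are assumptions.

```python
import secrets

Q = 2 ** 61 - 1  # public modulus (an arbitrary choice for this sketch)


def share(secret, n_workers=3):
    """Split an integer into n random shares that sum to the secret mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_workers - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares mod Q."""
    return sum(shares) % Q


def add_shared(a_shares, b_shares):
    """Each worker adds its two local shares; no worker sees either secret."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]


a_shares = share(5)
b_shares = share(7)
total = reconstruct(add_shared(a_shares, b_shares))
print(total)
```

Any single share is a uniformly random number and reveals nothing about the secret on its own; only the combination of all shares does.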
JonathanChiang commented 5 years ago

https://blog.openmined.org/upgrade-to-federated-learning-in-10-lines/

JonathanChiang commented 5 years ago

We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning.
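
For readers new to differential privacy, the sketch below shows the textbook Laplace mechanism applied to a count query. It is the simplest DP mechanism and is not claimed to be the one used in the framework: the function names, the `epsilon` and `sensitivity` values, and the example count are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
# Hypothetical query: number of abnormal studies at one site.
noisy = [private_count(100, epsilon=1.0, rng=rng) for _ in range(20000)]
mean_release = sum(noisy) / len(noisy)
print(f"average of noisy releases: {mean_release:.2f}")
```

A smaller `epsilon` means a larger noise scale and stronger privacy, at the cost of less accurate individual answers.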