boxiangliu opened this issue 7 years ago
If I recall correctly, entire-gene holdout should be added to (3).
For (5), I think to make it more like the square loss, you should square the magnitudes. BTW, if you want to avoid writing a new loss function, one thing you can do is apply the standard square loss to (1 - P(0))R but modify the labels so that the "true" value is 0 for non-differentially-regulated genes and the original value for differentially-regulated genes.
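For concreteness, here is a minimal sketch of that label-modification trick, assuming PyTorch and per-gene scalar outputs; the names `masked_square_loss`, `p0`, `r`, and `y` are hypothetical, not from the repo:

```python
import torch

def masked_square_loss(p0, r, y):
    # p0: predicted probability that a gene is baseline
    #     (non-differentially regulated), shape (batch,)
    # r:  raw regression output, shape (batch,)
    # y:  regression labels, with the "true" value replaced by 0 for
    #     non-differentially-regulated genes
    # Plain square loss applied to (1 - P(0)) * R, as described above.
    return torch.mean(((1.0 - p0) * r - y) ** 2)
```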
Hi @AvantiShri, sorry I just saw this; I'm not sure why I don't get email notifications.
Entire-gene holdout will be added to (3).
In (5), the norms are 2-norms; I should have made that clearer.
To summarize our discussion from yesterday, here are the TODO items:

1) Use 2 convolutional layers.
2) Stratify by magnitude and variance when sampling held-out genes.
3) Hold out i) random examples, ii) entire series of stress conditions, and iii) entire genes.
4) Use i) dinucleotide shuffle and ii) whole-promoter shuffle to test whether the trained model has learned regulatory motifs.
5) Use a zero-inflated loss. One form could be `L = (1 - L_C) * ||P(1)||_2 + L_C * (||P(0)||_2 + ||R - L_R||_2)`, where L_C is the classification label and L_R is the regression label. When L_C is 0 (baseline), the second term goes to 0, and we want P(0) to be large. When L_C is 1 (up- or down-regulated), the first term goes to 0 and we want P(0) to be close to 0. The network would have three outputs, representing baseline, up/down-regulated, and the regression output (see the sketch after this list).
6) Reduce the number of filters on the first layer to ~50 to avoid memorization.
7) Augment the promoter set with dinucleotide shuffles to force motif learning.
8) Use delta DeepLIFT to detect motif-motif and motif-gene interactions.
9) Use GOrilla GO enrichment to compare predicted and actual pathways.
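A minimal sketch of the loss in (5), assuming PyTorch; the function and argument names (`zero_inflated_loss`, `p0`, `p1`, `r`, `l_c`, `l_r`) are hypothetical, and R/L_R are treated as vectors over conditions so the 2-norm is meaningful:

```python
import torch

def zero_inflated_loss(p0, p1, r, l_c, l_r):
    # p0, p1: predicted probabilities of "baseline" and "up/down-regulated",
    #         shape (batch,); for a scalar, the 2-norm is the value itself
    # r, l_r: regression output and label, shape (batch, n_conditions)
    # l_c:    classification label, shape (batch,); 0 = baseline, 1 = regulated
    reg_err = torch.linalg.norm(r - l_r, dim=-1)  # 2-norm over conditions
    # When l_c == 0 the second term vanishes and p1 is pushed toward 0
    # (so p0 grows); when l_c == 1 the first term vanishes and p0, plus
    # the regression error, is pushed toward 0.
    loss = (1 - l_c) * p1 + l_c * (p0 + reg_err)
    return loss.mean()
```

Per the earlier comment, squaring each term (e.g., `p1 ** 2` and `reg_err ** 2`) would make this behave like an ordinary square loss.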
Feel free to add/correct items by commenting below.