cangermueller / deepcpg

Deep neural networks for predicting CpG methylation
MIT License

Error filter #16

Closed frankmarab closed 6 years ago

frankmarab commented 6 years ago

Hello,

I am using DeepCpG with continuous methylation values from a library prepared with a single-cell protocol, but from low-input DNA rather than single cells. Model training, evaluation, and testing run smoothly, as does the computation of the filter activations. However, at the motif analysis and visualization step, when I run the following:

dcpg_filter_motifs.py activations.h5 --out_dir outDir --plot_heat --plot_dens --plot_pca --out_format pdf --verbose

I get this error:

INFO (2018-02-13 21:33:40,554): Reading data
Traceback (most recent call last):
  File "/home/f/framar/pfs/METH_CSF/deepcpg-1.0.5/scripts/dcpg_filter_motifs.py", line 619, in <module>
    App().run(sys.argv)
  File "/home/f/framar/pfs/METH_CSF/deepcpg-1.0.5/scripts/dcpg_filter_motifs.py", line 312, in run
    return self.main(name, opts)
  File "/home/f/framar/pfs/METH_CSF/deepcpg-1.0.5/scripts/dcpg_filter_motifs.py", line 458, in main
    assert filters_weights.shape[1] == 1
AssertionError

Any possible solution?

Regards

Francesco

cangermueller commented 6 years ago

Hi Francesco, which Keras and TensorFlow versions are you using?

frankmarab commented 6 years ago

Hello Christof

I am using: Keras/2.0.8 Tensorflow/1.3.0

Also, since I am using continuous methylation values, how are the prediction performance metrics calculated? From what I understand of the code, the continuous values are binarized (0/1), and binary metrics are then used to compare the predictions with the original values. Is that correct?

cangermueller commented 6 years ago

Hi Francesco,

the problem was caused by a change in the shape of the convolutional filter weights. I fixed it and released DeepCpG 1.0.6, which you can find on PyPI and GitHub. Let me know if it still does not work after you have updated your installed version.
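The fix itself is not shown in this thread. As a rough illustration of the kind of shape change described above, here is a minimal sketch that accepts either layout. It assumes that older weights carry a singleton second axis (which is what the failing assertion `filters_weights.shape[1] == 1` expects) and that newer weights are three-dimensional. The function name `normalize_filter_weights` and the example dimensions are hypothetical and not taken from DeepCpG.

```python
import numpy as np


def normalize_filter_weights(weights):
    """Return filter weights as a 3D array (filter_len, nb_input, nb_filter).

    Hypothetical helper, not DeepCpG's actual code. Assumed layouts:
      4D: (filter_len, 1, nb_input, nb_filter)  -> singleton axis is dropped
      3D: (filter_len, nb_input, nb_filter)     -> returned unchanged
    """
    weights = np.asarray(weights)
    if weights.ndim == 4:
        if weights.shape[1] != 1:
            raise ValueError("Expected a singleton second axis in 4D weights")
        # Drop the singleton axis so both layouts end up three-dimensional.
        weights = weights[:, 0]
    elif weights.ndim != 3:
        raise ValueError("Expected 3D or 4D filter weights")
    return weights


# Example dimensions are made up for illustration.
four_dim = np.zeros((11, 1, 4, 128))
three_dim = np.zeros((11, 4, 128))
shape_from_4d = normalize_filter_weights(four_dim).shape
shape_from_3d = normalize_filter_weights(three_dim).shape
print(shape_from_4d, shape_from_3d)
```

Checking the number of dimensions before asserting on a specific axis keeps the script usable with weights saved under either layout.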

If your data contains continuous methylation states, they will be used as continuous input values for the CpG module. For training, however, target labels are binarized by rounding, and the model is trained to predict methylation probabilities (rates) between 0 and 1 using binary cross-entropy as the loss function. Prediction performance is evaluated using binary classification metrics (accuracy, AUC, ...).
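To make the procedure above concrete, here is a small NumPy sketch, not DeepCpG's actual implementation. It binarizes continuous methylation rates, computes the binary cross-entropy of predicted probabilities against those labels, and evaluates accuracy. The function names, the example values, and the handling of values exactly at 0.5 (counted as methylated here) are assumptions made for illustration.

```python
import numpy as np


def binarize(rates, threshold=0.5):
    """Turn continuous rates in [0, 1] into 0/1 labels.

    Assumption: values >= threshold count as methylated (1).
    """
    return (np.asarray(rates, dtype=float) >= threshold).astype(int)


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    labels = np.asarray(labels, dtype=float)
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs)
                          + (1 - labels) * np.log(1 - probs)))


def accuracy(labels, probs, threshold=0.5):
    """Fraction of sites where the binarized prediction matches the label."""
    return float(np.mean(binarize(probs, threshold) == np.asarray(labels)))


# Made-up example values: observed continuous rates and model predictions.
observed = np.array([0.1, 0.8, 0.45, 0.95])
predicted = np.array([0.2, 0.7, 0.6, 0.9])

labels = binarize(observed)            # binary training/evaluation targets
loss = binary_cross_entropy(labels, predicted)
acc = accuracy(labels, predicted)
print(labels, loss, acc)
```

In this sketch the third site (observed 0.45, predicted 0.6) is counted as a misclassification even though the two values are close, which is a direct consequence of evaluating continuous data with binary metrics.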

frankmarab commented 6 years ago

Thanks! Will this affect only model evaluation and the extraction of the sequences activating the filters? Or should I re-train the models?

cangermueller commented 6 years ago

You do not need to re-train your model. Just try running dcpg_filter_motifs.py again after you have updated.