cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0
624 stars, 169 forks

Ambiguous reference in notebook workflow #19

Closed: quang-ha closed this issue 7 years ago

quang-ha commented 7 years ago

While running the workflow notebook, I get the following error when executing cell 12:

AnalysisException: u"Reference 'label' is ambiguous, could be: label#395, label#399, label#400.;"

Is there a workaround for this problem?

JoeriHermans commented 7 years ago

Did you run this cell multiple times? That would explain the duplicated columns. A workaround would be to select the original columns before executing that particular cell again.
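To confirm whether a column really was duplicated (for example by re-running the cell), one can inspect `dataset.columns`, which in PySpark is a plain list of column names that includes repeats. A minimal sketch, assuming the DataFrame is called `dataset` as in the notebook; `find_duplicate_columns` is a hypothetical helper, not part of dist-keras or Spark:

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_columns(columns: Iterable[str]) -> List[str]:
    """Return the column names that occur more than once, in first-seen order."""
    names = list(columns)  # materialize so we can iterate twice
    counts = Counter(names)
    duplicates: List[str] = []
    for name in names:
        if counts[name] > 1 and name not in duplicates:
            duplicates.append(name)
    return duplicates


# With three "label" columns, as in the AnalysisException above:
print(find_duplicate_columns(["features", "label", "label_index", "label", "label"]))
# → ['label']
```

In the notebook this would be called as `find_duplicate_columns(dataset.columns)` before re-executing a cell; a non-empty result means any later reference to that name will raise the same "ambiguous reference" error.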

quang-ha commented 7 years ago

It fails even on a single run. Sorry for the beginner's question, but which original columns?

JoeriHermans commented 7 years ago

Hmmm, I really need to check this workflow notebook. I would recommend starting with the MNIST notebook, which is fairly complete. If you need more information on what is happening in the background, I recommend reading the first chapter of my master's thesis: https://github.com/JoeriHermans/master-thesis/blob/master/thesis/master_thesis_joeri_hermans.pdf

SoulGuedria commented 6 years ago

I'm facing the same issue. Any suggestions? Thanks.

dbl001 commented 6 years ago

The 'label' column is duplicated. Try renaming the output column from 'label' to 'label_output' and updating every reference to it:

```python
# "label_output" is the renamed column that previously clashed with "label".
dataset = dataset.select("features_normalized", "label_index", "label_output")

# Show the expected output vectors of the neural network.
dataset.select("label_index", "label_output").take(1)
```
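If the duplicates already exist and renaming by hand is awkward, another option is to give every repeated column a unique name by position, which avoids referencing the ambiguous name at all. A minimal sketch, assuming PySpark's `DataFrame.columns` and `DataFrame.toDF(*cols)`; `dedupe_column_names` is a hypothetical helper, and the `_1`, `_2` suffix scheme is an arbitrary choice:

```python
from collections import Counter
from typing import Iterable, List


def dedupe_column_names(columns: Iterable[str]) -> List[str]:
    """Rename repeated column names by appending _1, _2, ... to later copies.

    The first occurrence keeps its original name; later copies get a suffix
    that is guaranteed not to collide with any other column name.
    """
    names = list(columns)
    taken = set(names)      # every name already in use
    seen = Counter()        # how many times each name has appeared so far
    result: List[str] = []
    for name in names:
        seen[name] += 1
        if seen[name] == 1:
            result.append(name)
            continue
        suffix = seen[name] - 1
        candidate = f"{name}_{suffix}"
        while candidate in taken:  # skip suffixes that are already used
            suffix += 1
            candidate = f"{name}_{suffix}"
        taken.add(candidate)
        result.append(candidate)
    return result


print(dedupe_column_names(["features", "label", "label", "label"]))
# → ['features', 'label', 'label_1', 'label_2']
```

Applied to the notebook, this would be `dataset = dataset.toDF(*dedupe_column_names(dataset.columns))`. Which physical copy keeps the name `label` depends on the order in which the columns were added, so it is worth inspecting the renamed columns before dropping any of them.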