Refactored part of Sammy's code to follow the Google Python Style Guide.
Added an attention mechanism. Because the number of entities differs from sentence to sentence, we might need to implement our own optimizer.
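Sketch of one way to attend over a varying number of entities per sentence: pad every sentence to a fixed number of entity slots and mask the padded slots before the softmax. This is an assumption for illustration, not the code in this change; the function name `masked_attention`, the dot-product scoring, and the padding scheme are all hypothetical.

```python
import math


def masked_attention(query, entities, num_valid):
    """Attends over the first num_valid rows of a padded entity matrix.

    Hypothetical helper: dot-product scoring and padding are assumptions.

    Args:
        query: List of floats of length d.
        entities: List of vectors of length d; trailing rows may be padding.
        num_valid: Number of leading rows that are real entities.

    Returns:
        A (weights, context) tuple of float lists.
    """
    if not 0 < num_valid <= len(entities):
        raise ValueError("num_valid must be in 1..len(entities)")
    # Dot-product score per slot; padded slots get -inf so their weight is 0.
    scores = [
        sum(q * e for q, e in zip(query, row)) if i < num_valid else -math.inf
        for i, row in enumerate(entities)
    ]
    # Numerically stable softmax; exp(-inf) evaluates to 0.0 for padded slots.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: attention-weighted sum of the entity vectors.
    context = [
        sum(w * row[j] for w, row in zip(weights, entities))
        for j in range(len(query))
    ]
    return weights, context


# Two real entities plus one padded slot that must be ignored.
weights, context = masked_attention(
    [1.0, 0.0], [[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]], num_valid=2)
```

With masking, sentences with different entity counts share one tensor shape, so the padded values never influence the attention weights or the context vector.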
Added an entity hidden layer. Need confirmation on entity preprocessing details: Should labels be unique? How should words that belong to multiple entities be handled?