Use the C40 rows to fill the NA rows. To do that, we need to join the two datasets (2018_-_Cities_WaterActions and 2018-_Cities_Water_Risks) to enlarge the feature space. After text vectorization, duplicate rows need to be summed. For columns that contain descriptions, I propose either word2vec or GloVe (after removing stopwords). TF-IDF may also come in handy if the pretrained models show significant discrepancies.
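To make the vectorize-then-sum step concrete, here is a minimal, dependency-free sketch. It uses TF-IDF weighting as a stand-in for the pretrained embeddings, and it stacks the description rows of both tables rather than doing a real join. The `city` key, the toy rows, and the tiny stopword list are all assumptions made up for illustration; the real column names may differ.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; a real list would be much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return vectors

def sum_duplicates(keys, vectors):
    """Sum the vectors of all rows that share the same key."""
    merged = defaultdict(Counter)
    for key, vec in zip(keys, vectors):
        merged[key].update(vec)  # Counter.update adds values term by term
    return {k: dict(v) for k, v in merged.items()}

# Made-up rows standing in for the two datasets (hypothetical key: "city").
actions = [
    ("City A", "Reduce leakage in the water network"),
    ("City A", "Install water meters"),
    ("City B", "Drought planning"),
]
risks = [
    ("City A", "Increased water stress"),
    ("City B", "Flooding"),
]

rows = actions + risks
keys = [city for city, _ in rows]
features = sum_duplicates(keys, tfidf_vectors([text for _, text in rows]))
```

Each city ends up with a single sparse vector that combines the text from both tables, which is the shape the imputation step would then consume.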
After imputation, we need to visualize the data and check that the features are correlated with the labels. If they are not, we need to iterate: treat the weak rows as NA and impute them again, until the dataset is coherent.
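As a rough sketch of the correlation check, the snippet below computes the Pearson correlation between each feature column and the labels and flags the features that fall under a threshold. It assumes the labels have already been encoded as numbers; the feature names, the toy values, and the 0.3 threshold are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)

def weak_features(columns, labels, threshold=0.3):
    """Names of features whose absolute correlation with the labels is below threshold."""
    return sorted(
        name for name, values in columns.items()
        if abs(pearson(values, labels)) < threshold
    )

# Made-up feature columns and numerically encoded labels.
columns = {
    "f_strong": [1, 2, 3, 4],
    "f_weak": [1, -1, -1, 1],
}
labels = [0, 0, 1, 1]

weak = weak_features(columns, labels)
```

Pearson only captures linear relationships, so a feature flagged here may still be worth a look in the plots before it is discarded.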
In fact, the only things we need to keep from this dataset are the country, the coordinates, and the column risks_to_city_s_water_supply; all the other columns are needed only for the imputation.
Working with @antosalerno