Investigate why by-year splitting reduces performance (possible bug?)

When using the sample_feature_values function to select cruises for final validation, we thought it a good idea to sample a fraction from each year, to ensure good representation of every year in both the train and test sets. This seems to cause a dramatic reduction in performance. I think this points to a bug: there is a lot of custom code beyond standard pandas and scikit-learn, so it seems likely that this code is causing the split to happen incorrectly and so producing poor results.

The relevant line is in experiment.py:

ensemble_unseen_cruise_numbers = self.xbt_labelled.sample_feature_values(self.unseen_feature, fraction=self.ens_unseen_fraction, split_feature='year')

Currently we have removed the split_feature argument until we can fix the bug.
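
As a starting point for the investigation, here is a minimal sketch of one thing worth checking: whether sampling per year holds out the fraction of cruises we actually asked for. It is not the real sample_feature_values implementation. The functions sample_values_by_group and held_out_summary and the toy data are hypothetical stand-ins, written on the assumption that the function picks a fraction of the unique values of a feature, optionally within each split_feature group. If a cruise can appear in more than one year, or if small groups are rounded up to at least one value, the union over years may differ from the requested fraction.

```python
import numpy as np
import pandas as pd


def sample_values_by_group(df, feature, fraction, split_feature=None, seed=0):
    """Hypothetical stand-in for sample_feature_values (an assumption, not the
    project's code): pick a fraction of the unique values of `feature`,
    either globally or separately within each `split_feature` group."""
    rng = np.random.default_rng(seed)

    def pick(values):
        unique = sorted(values.unique().tolist())
        # Assumed rounding rule: always take at least one value per group.
        n = max(1, int(round(fraction * len(unique))))
        idx = rng.choice(len(unique), size=n, replace=False)
        return {unique[i] for i in idx}

    if split_feature is None:
        return pick(df[feature])

    chosen = set()
    for _, group in df.groupby(split_feature):
        chosen |= pick(group[feature])
    return chosen


def held_out_summary(df, feature, chosen):
    """Report the realised held-out fraction of unique values and of rows."""
    mask = df[feature].isin(list(chosen))
    return {
        "value_fraction": len(chosen) / df[feature].nunique(),
        "row_fraction": float(mask.mean()),
    }


# Toy data (made up): ten cruises, some of which span two years.
rows = (
    [(c, 2000) for c in range(0, 5)]
    + [(c, 2001) for c in range(3, 8)]
    + [(c, 2002) for c in range(6, 10)]
)
toy = pd.DataFrame(rows, columns=["cruise_number", "year"])

global_pick = sample_values_by_group(toy, "cruise_number", fraction=0.2)
by_year_pick = sample_values_by_group(
    toy, "cruise_number", fraction=0.2, split_feature="year"
)

print("global :", held_out_summary(toy, "cruise_number", global_pick))
print("by year:", held_out_summary(toy, "cruise_number", by_year_pick))
```

Running the same kind of summary on the real xbt_labelled data, with and without split_feature='year', would show whether the by-year split changes the size of the held-out set. That is only one possible cause, and it does not rule out a bug elsewhere in the custom code.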

MetOffice / XBTs_classification

Investigate why by-year splitting reduces performance (possible bug?) #69