Closed: stevehadd closed this issue 3 years ago
This does happen, but the cruise number itself is not used in classification, and we know that the profiles with cruise number "0" are not actually all from the same cruise, so it shouldn't be causing too much of a problem. It may just be reducing performance for one split. A future issue will look at all of the possible sources of reduced performance, including this essentially "UNKNOWN" cruise, so I'm closing this issue.
When running cross-validation, one of the cruise-ID-based splits had much worse results than the others. It was thought that this split might contain all of the profiles for which the cruise label is "UNKNOWN" or 0, penalising its results. We should check for unknown cruises and distribute those profiles evenly among the 5 splits.
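
For reference, here is a rough sketch of what the proposed check could look like. It is not the project's actual split code, which isn't shown in this issue. The function name `assign_folds`, the `UNKNOWN_LABELS` set, and the greedy balancing of known cruises are all assumptions made for illustration. Known cruises stay together in one fold, while profiles with an unknown cruise are dealt out round-robin across the folds.

```python
from collections import Counter

# Assumed encodings of a missing cruise ID (hypothetical; adjust to the real data).
UNKNOWN_LABELS = {0, "0", "UNKNOWN"}


def assign_folds(cruise_ids, n_splits=5):
    """Return one fold index per profile (hypothetical helper).

    Profiles from a known cruise all land in the same fold.
    Profiles with an unknown cruise are spread evenly over the folds.
    """
    # Count profiles per known cruise.
    known_counts = Counter(c for c in cruise_ids if c not in UNKNOWN_LABELS)

    # Greedy balancing: largest cruise first, into the currently smallest fold.
    fold_sizes = [0] * n_splits
    cruise_to_fold = {}
    for cruise, count in known_counts.most_common():
        fold = fold_sizes.index(min(fold_sizes))
        cruise_to_fold[cruise] = fold
        fold_sizes[fold] += count

    # Deal unknown-cruise profiles out round-robin; look up the rest.
    folds = []
    n_unknown = 0
    for c in cruise_ids:
        if c in UNKNOWN_LABELS:
            folds.append(n_unknown % n_splits)
            n_unknown += 1
        else:
            folds.append(cruise_to_fold[c])
    return folds


if __name__ == "__main__":
    ids = ["A"] * 4 + [0] * 10 + ["B"] * 3 + ["UNKNOWN"] * 5 + ["C"] * 2
    print(assign_folds(ids))
```

The returned list could then be used to build the train/test index sets for each of the 5 splits. Since the cruise number is not used as a classification feature, spreading the unknown profiles this way should only change which split they are evaluated in.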