Decision trees often overfit in scikit-learn because pruning isn't implemented. One reason it isn't is that more powerful and robust ensemble methods exist (random forests, gradient boosting). Still, a single decision tree is genuinely useful for understanding the data. Pruning is implemented in some other ML packages/languages (I think it's more common in R). Adding it would take some effort, but it might be a more interesting problem to tackle.
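In the meantime, a common workaround for the overfitting described above is pre-pruning: constraining tree growth up front with parameters like `max_depth` and `min_samples_leaf`. A minimal sketch on synthetic data (assuming scikit-learn is installed; the parameter values here are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data; an unconstrained tree will memorize it.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: splits until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: growth capped by depth and minimum leaf size,
# a stand-in for the post-hoc pruning discussed above.
limited = DecisionTreeClassifier(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

print(full.score(X_train, y_train))   # perfect training fit → overfitting risk
print(limited.get_depth())            # capped at 4
```

Worth noting that scikit-learn releases from 0.22 onward also expose minimal cost-complexity post-pruning through the `ccp_alpha` parameter of `DecisionTreeClassifier`, so the gap this comment describes has narrowed.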