Data warehousing projects usually need to go "beyond just meeting the project requirements" to earn a good mark. It is recommended to use more than two approaches to arrive at the same result. In this case, that could mean using Python for the entire workflow and Weka for the data analytics. We might also want to use SSIS for data profiling, along with Pandas Profiling.
Data Cleaning and Analysis
Profile the data.
Clean the data.
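As a starting point, the profiling and cleaning steps could look roughly like the sketch below in plain pandas. The toy data, the column names and the helper names `profile` and `clean` are all made up for illustration, and filling gaps with the median is only one possible cleaning choice, not a decision we have made.

```python
import pandas as pd

# Hypothetical toy data standing in for the real project dataset.
raw = pd.DataFrame({
    "age": [25, 32, None, 41, 32],
    "income": [48000, 61000, 52000, None, 61000],
    "segment": ["a", "b", "b", "a", "b"],
})


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(),
    })


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill numeric gaps with the column median."""
    out = df.drop_duplicates().copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


report = profile(raw)
cleaned = clean(raw)
print(report)
print(cleaned)
```

Pandas Profiling or SSIS would give a much richer report; this only shows the kind of checks we would want before cleaning.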
Association rule mining
Attribute Selection / Data Reduction ?
Select a subset of attributes using several different methods (for example ANOVA variable selection and PCA), and possibly other methods that we do not know yet.
Attribute selection and data reduction are closely related. It might be worth running the analysis on each of the following:
All attributes
Selected attributes (by what we think is correct)
ANOVA variable selection
PCA
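The four variants listed above could be compared with something like the following sketch, assuming we use scikit-learn. The iris dataset, the logistic regression model, the hand-picked column indices `[2, 3]` and `k=2` are placeholders chosen only to make the example run; none of them comes from the project itself.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset; the project data would go here instead.
X, y = load_iris(return_X_y=True)

# Each entry is the extra step (if any) inserted between scaling and the model.
feature_sets = {
    "all attributes": [],
    # Hypothetical manual choice: keep only columns 2 and 3.
    "selected attributes": [ColumnTransformer([("keep", "passthrough", [2, 3])])],
    "ANOVA variable selection": [SelectKBest(f_classif, k=2)],
    "PCA": [PCA(n_components=2)],
}

scores = {}
for name, steps in feature_sets.items():
    model = make_pipeline(StandardScaler(), *steps, LogisticRegression(max_iter=1000))
    # Mean 5-fold cross-validated accuracy for this attribute set.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping the model fixed and changing only the attribute step means any difference in score can be attributed to the attribute set.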
Classification
In Python we could use what is called a "pipeline", together with a grid search, to determine the optimal parameters for the algorithms. This is very interesting, and I would be happy to lay out the analysis for this.
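A minimal sketch of the pipeline plus grid search idea, assuming scikit-learn's `Pipeline` and `GridSearchCV`. The dataset, the choice of an SVM and the parameter grid are illustrative assumptions only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder dataset; the project data would go here instead.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling and the classifier are chained so that scaling is re-fitted
# inside every cross-validation fold.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Step parameters are addressed as "<step name>__<parameter>".
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The held-out test split is kept out of the search so that the final accuracy is not measured on data used to pick the parameters.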
Clustering
This is probably similar to classification, but we have to compare the classification and clustering results and explain the differences.
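One way to make that comparison concrete is to score both the cluster assignments and the classifier predictions against the known labels. The sketch below assumes scikit-learn, k-means with three clusters and the adjusted Rand index as the agreement measure; these are all illustrative choices rather than anything we have agreed on.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset; the project data would go here instead.
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Unsupervised: k-means never sees the labels.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Supervised: out-of-fold predictions from a simple classifier.
classifier = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
class_predictions = cross_val_predict(classifier, X, y, cv=5)

# The adjusted Rand index ignores how the groups are numbered, so cluster ids
# do not have to match class ids.
ari_clusters_vs_truth = adjusted_rand_score(y, cluster_labels)
ari_classes_vs_truth = adjusted_rand_score(y, class_predictions)
ari_clusters_vs_classes = adjusted_rand_score(class_predictions, cluster_labels)

print(f"clusters vs. true labels:  {ari_clusters_vs_truth:.3f}")
print(f"classes vs. true labels:   {ari_classes_vs_truth:.3f}")
print(f"clusters vs. class output: {ari_clusters_vs_classes:.3f}")
```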
Other comments
There are many ways these analyses can be done. Although it is a great idea to do several of them and compare the results, it is understandable that we might limit ourselves to a few tools.
I believe it's better to compare different methods (e.g. PCA, selected attributes, all attributes) than different tools (e.g. Python, WEKA).