cmaron / CS-7641-assignments

CS 7641 - All the code
MIT License

assignment 3: clarify if any charts are overwritten at any point in time, and disconnect NN part from part 1 #16

Open swyxio opened 5 years ago

swyxio commented 5 years ago

hey chad, love the work as usual!

just wanted to comment from my experience having used this for a3. combining the neural network code with the clustering code made for very slow first runs and made it hard to iterate on just the clustering section. i'd think about breaking it up.

here is another student's usage plan:

If you are having trouble understanding Chad's code like me, here is a strategy I am planning to use. 

For each part in assignment 3, below are the options I think need to be used to generate appropriate results. My plan is to run each part and then rename the output folder to res_part_#. This way I can group (cluster ;)) output plots and CSVs by assignment part.

* Run the clustering algorithms on the datasets and describe what you see.
run_experiment.py --benchmark --threads -1 --plot

* Apply the dimensionality reduction algorithms to the two datasets and describe what you see.

run_experiment.py --ica --pca --rf --rp --threads -1 --plot

* Reproduce your clustering experiments, but on the data after you’ve run dimensionality reduction on it.

Duplicate the output folder under a new name, but don't delete the original: this part needs it.

run_experiment.py --skiprerun --dim X --ica --threads -1 --plot

run_experiment.py --skiprerun --dim X --pca --threads -1 --plot

run_experiment.py --skiprerun --dim X --rf --threads -1 --plot

run_experiment.py --skiprerun --dim X --rp --threads -1 --plot

* Apply the dimensionality reduction algorithms to one of your datasets from assignment #1 (if you’ve reused the datasets from assignment #1 to do experiments 1-3 above then you’ve already done this) and rerun your neural network learner on the newly projected data.

Duplicate the output folder under a new name, but don't delete the original: this part needs it.

run_experiment.py --dataset1 --skiprerun --dim X --ica --threads -1 --plot

run_experiment.py --dataset1 --skiprerun --dim X --pca --threads -1 --plot

run_experiment.py --dataset1 --skiprerun --dim X --rf --threads -1 --plot

run_experiment.py --dataset1 --skiprerun --dim X --rp --threads -1 --plot

* Apply the clustering algorithms to the same dataset to which you just applied the dimensionality reduction algorithms (you’ve probably already done this), treating the clusters as if they were new features. In other words, treat the clustering algorithms as if they were dimensionality reduction algorithms. Again, rerun your neural network learner on the newly projected data.
run_experiment.py --dataset1 --benchmark --threads -1 --plot
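
The commands in the steps above differ only in the algorithm flag and the optional --dataset1 switch, so they can be generated rather than typed. A minimal sketch: it uses only the flags quoted in this plan and assumes nothing else about run_experiment.py. The helper names, the "python" interpreter prefix, and the value 5 for --dim are placeholders for illustration, not part of the repo.

```python
from typing import List, Optional

# Flags copied from the commands quoted in the plan above.
DR_FLAGS = ["--ica", "--pca", "--rf", "--rp"]


def build_command(flags: List[str], dim: Optional[int] = None,
                  dataset1: bool = False, skiprerun: bool = False) -> List[str]:
    """Assemble one run_experiment.py invocation as an argv list.

    Hypothetical helper: "python" as the interpreter is an assumption.
    """
    cmd = ["python", "run_experiment.py"]
    if dataset1:
        cmd.append("--dataset1")
    if skiprerun:
        cmd.append("--skiprerun")
    if dim is not None:
        cmd += ["--dim", str(dim)]
    cmd += flags
    cmd += ["--threads", "-1", "--plot"]
    return cmd


def reduced_clustering_commands(dim: int, dataset1: bool = False) -> List[List[str]]:
    """One command per dimensionality reduction algorithm, all with the same dim."""
    return [build_command([flag], dim=dim, dataset1=dataset1, skiprerun=True)
            for flag in DR_FLAGS]


if __name__ == "__main__":
    # dim=5 is a placeholder; X has to come from your own earlier results.
    for cmd in reduced_clustering_commands(dim=5):
        print(" ".join(cmd))
```

Each argv list can be passed to subprocess.run(cmd, check=True) once the printed commands look right.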

The X in --dim X needs to be determined from the previous parts. I assume --dataset1 selects your dataset from assignment 1. I know the parts overlap a lot, but my runtimes are not too bad so I'll go with this.
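
The "rename the output folder to res_part_#" and "duplicate but don't delete the original" steps of this plan can be sketched as one small helper. This is an illustration only: the folder name "output" and the function archive_output are assumptions, not something taken from the repo, so adjust the path to wherever your results actually land.

```python
import shutil
from pathlib import Path


def archive_output(part: int, output_dir: str = "output",
                   keep_original: bool = False) -> Path:
    """Move (or copy) the results folder to a sibling named res_part_<part>.

    Hypothetical helper: "output" is an assumed folder name.
    Refuses to run if the destination already exists, so an earlier
    part's results are never silently replaced.
    """
    src = Path(output_dir)
    dest = src.with_name(f"res_part_{part}")
    if not src.is_dir():
        raise FileNotFoundError(f"no results folder at {src}")
    if dest.exists():
        raise FileExistsError(f"{dest} already exists, not overwriting")
    if keep_original:
        # Duplicate: later parts that reuse earlier results still find the original.
        shutil.copytree(src, dest)
    else:
        # Plain rename: the next run starts with no output folder.
        src.rename(dest)
    return dest
```

Calling archive_output(2, keep_original=True) after the dimensionality reduction run leaves the original folder in place for the runs that reuse it, while archive_output(1) after the first part simply renames.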

i didn't have any sense that results might get overwritten until i read this. if it's true, that's a fairly important detail to note in the readme. :( https://github.com/cmaron/CS-7641-assignments/tree/master/assignment3#overall-flow
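
Until the readme settles it, one way to check for overwriting yourself is to fingerprint the results folder before and after a run and compare. A rough sketch, assuming the results are ordinary files under a single folder; snapshot and overwritten are made-up names for illustration, and nothing here depends on how run_experiment.py works internally.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(folder: str) -> Dict[str, str]:
    """Map each file's path (relative to folder) to a SHA-256 of its bytes."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def overwritten(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Files present in both snapshots whose content changed."""
    return sorted(
        name for name, digest in before.items()
        if name in after and after[name] != digest
    )
```

Take one snapshot before a run and one after, then pass both to overwritten. Files that only appear in the second snapshot are new rather than overwritten, and a file rewritten with byte-identical content will not be flagged.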