Improve some details in preprocessing pipeline

ahfoss / kamilaStreamingHadoop

k-means and KAMILA algorithms written for MyHadoop on a SLURM batch scheduler

GNU General Public License v3.0

Improve some details in preprocessing pipeline #13

Closed ahfoss closed 8 years ago

ahfoss commented 8 years ago

The normalization script in the preprocessing pipeline logs the number of lines in the data set, and the subsampling step then uses that count. Include all of this in the SLURM file.
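
To illustrate the hand-off described above, here is a minimal Python sketch of counting rows and then using that count to set a subsampling rate. It is not the repository's actual code: the function names `count_lines` and `subsample`, the Bernoulli sampling scheme, and the fixed seed are all assumptions made for illustration.

```python
import random


def count_lines(lines):
    """Count data rows; this is the number the normalization step would log."""
    return sum(1 for _ in lines)


def subsample(lines, n_total, n_wanted, seed=0):
    """Keep each row independently with probability n_wanted / n_total.

    Hypothetical helper: the result has roughly n_wanted rows, not exactly.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    rng = random.Random(seed)
    keep_prob = min(1.0, n_wanted / n_total)
    return [line for line in lines if rng.random() < keep_prob]


rows = ["1.0,2.0", "3.0,4.0", "5.0,6.0", "7.0,8.0"]
n_rows = count_lines(rows)          # logged by the normalization step
sample = subsample(rows, n_rows, 2)  # consumed by the subsampling step
```

Running both steps from one job script means the logged count can be passed straight to the subsampling step instead of being copied by hand.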

Also, print the normalized data to a limited number of decimal places only, perhaps 6 (or 8?).
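
As a rough sketch of the fixed-precision output, a row of normalized values could be formatted like this in Python. The helper name `format_row` and the comma delimiter are assumptions, not taken from the repository.

```python
def format_row(values, digits=6):
    """Join floats into one output line with a fixed number of decimal places."""
    # Comma delimiter is an assumption; digits=6 follows the suggestion above.
    return ",".join(f"{v:.{digits}f}" for v in values)


line6 = format_row([0.123456789, -1.5])            # "0.123457,-1.500000"
line8 = format_row([0.123456789, -1.5], digits=8)  # "0.12345679,-1.50000000"
```

Keeping `digits` as a parameter makes it easy to switch between 6 and 8 places once the choice is settled.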