To enable parallel processing, the input CSV files have to be split. The script (Python, R, Java, Bash, or awk) takes one CSV file as input, splits it into a given number of CSV output files, and stores them in a given folder:
csvsplit input.csv 200 outputdir/
This would generate outputdir/0.csv, outputdir/1.csv, ..., outputdir/199.csv.
Every output file has to start with the same header line as the input file.
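
One possible way to meet these requirements is sketched below in Python. It is not a definitive implementation and rests on several assumptions the description does not state: rows are dealt out round-robin so the output files end up roughly equal in size, every record occupies exactly one physical line (no quoted fields containing line breaks), and the input is UTF-8. The function name split_csv is hypothetical.

```python
import os
from contextlib import ExitStack


def split_csv(input_path, num_parts, output_dir):
    """Split input_path into num_parts files named 0.csv .. (num_parts-1).csv.

    Every output file starts with the header line of the input file.
    Assumption: one record per physical line (no embedded newlines).
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    os.makedirs(output_dir, exist_ok=True)

    # ExitStack closes the input and all output files, even on errors.
    with ExitStack() as stack:
        # newline="" leaves the original line endings untranslated.
        source = stack.enter_context(
            open(input_path, newline="", encoding="utf-8")
        )
        header = source.readline()
        if not header:
            raise ValueError("input file is empty, no header line found")
        if not header.endswith(("\n", "\r")):
            # Header-only input without a trailing line break.
            header += "\n"

        outputs = []
        for i in range(num_parts):
            path = os.path.join(output_dir, f"{i}.csv")
            out = stack.enter_context(
                open(path, "w", newline="", encoding="utf-8")
            )
            out.write(header)  # header is copied verbatim into every file
            outputs.append(out)

        # Assumption: round-robin distribution of the data rows.
        for index, line in enumerate(source):
            outputs[index % num_parts].write(line)
```

Calling split_csv("input.csv", 200, "outputdir/") corresponds to the command shown above; wiring the three arguments to sys.argv is left out of the sketch. All output files are held open at once, so a very large number of parts may run into the operating system's limit on open files. If fields can contain line breaks inside quotes, a parser such as Python's csv module would be needed in place of the line-based loop.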