IntelLabs / HPAT.jl

High Performance Analytics Toolkit (HPAT) is a Julia-based framework for big data analytics on clusters.
BSD 2-Clause "Simplified" License

Cannot process file larger than memory size #20

Open rafaelcarv opened 7 years ago

rafaelcarv commented 7 years ago

Hello,

I created a 119 GB CSV file and a 50 GB HDF5 file (6.6 billion instances); they were generated with the generate_1d_array.jl program from the HPAT generate folder.

First I tried with the following setup: one Google Compute Engine VM with Ubuntu 16.04, 52 GB memory, 2 TB storage, and 8 vCPUs.

The error occurred with these files, but not with a smaller file (2 billion instances, created with the same program).

I thought it was a configuration error, or maybe a conflict with the installed MPI (the instance had 2 different versions of MPI installed).

So I created another Google Compute Engine VM with Ubuntu 14.04, 52 GB memory, 1 TB storage, and 8 vCPUs.

The same error occurred with the same parameters.

I am executing with the following command: mpirun -np 8 julia .julia/v0.5/HPAT/examples/1D_sum.jl --file=1D_large.hdf5

error.txt

ehsantn commented 7 years ago

HPAT doesn't support out-of-core computation currently, so the data has to fit in the cluster's memory. I recommend using more nodes.
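For reference, here is a minimal back-of-the-envelope sketch of that memory requirement. It assumes the 1D array holds 8-byte (Float64) elements and reserves some headroom for Julia and MPI buffers; both the element size and the 70% headroom factor are my assumptions, not numbers from HPAT.

```julia
# Rough check of whether the data fits in aggregate cluster memory.
# Assumes 8-byte (Float64) elements; adjust bytes_per_element for other types.
function fits_in_memory(num_elements, num_nodes, mem_per_node_gb; bytes_per_element = 8)
    data_gb = num_elements * bytes_per_element / 1e9
    usable_gb = 0.7 * num_nodes * mem_per_node_gb   # leave headroom for Julia/MPI buffers
    println("data ≈ $data_gb GB, usable ≈ $usable_gb GB")
    return data_gb <= usable_gb
end

fits_in_memory(6_600_000_000, 1, 52)   # ~52.8 GB of data vs. one 52 GB node -> false
fits_in_memory(6_600_000_000, 4, 52)   # four such nodes -> true
```

With 6.6 billion Float64 values the data alone is about 52.8 GB, which already exceeds the single 52 GB VM, so spreading it across more nodes is the straightforward fix.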

Also, if the installed MPI I/O is compiled with 32-bit integer counts, the number of elements per core in each I/O operation cannot exceed about 2.1 billion (2^31 - 1). Using more nodes solves this issue too.
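A minimal sketch of that per-rank arithmetic, assuming each rank reads an equal contiguous chunk in a single I/O call (an assumption about the access pattern, not HPAT's actual implementation):

```julia
# Sketch: check whether the per-rank element count in one I/O call would
# overflow a 32-bit MPI count. Assumes num_elements is split evenly across
# ranks and each rank issues a single read (my assumption, not HPAT's code).
function io_count_ok(num_elements, num_ranks)
    per_rank = cld(num_elements, num_ranks)   # ceiling division
    limit = typemax(Int32)                    # 2_147_483_647 with 32-bit MPI counts
    println("elements per rank = $per_rank, 32-bit limit = $limit")
    return per_rank <= limit
end

io_count_ok(6_600_000_000, 2)   # 3.3 billion per rank: exceeds a 32-bit count
io_count_ok(6_600_000_000, 8)   # 825 million per rank: within the element-count limit
```

Adding ranks shrinks the per-rank count, which is why running on more nodes sidesteps this limit as well.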