MP-Gadget / bigfile

A reproducible massively parallel IO library for hierarchical data
BSD 2-Clause "Simplified" License

Writing into single bigfile from multiple ranks #41

Closed boryanah closed 3 years ago

boryanah commented 3 years ago

Hi!

I am using nbodykit to manipulate some data and then writing it into a bigfile with the following commands:

density = density_fourier.paint(mode='real')
ArrayMesh(density, BoxSize=Lbox).save("density.bigfile", mode='real', dataset='Field')

The version of my bigfile module was 0.1.49 and nbodykit is 0.3.15.

I saw that the README.md mentions a feature that allows writing into a single file from multiple ranks, so I thought I just needed to update bigfile. I installed it from scratch, specifying a local directory:

cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/boryanah/installs/ ..

This installed correctly. However, I get the following error when accessing bigfile.File:

>>> bigfile.File
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'bigfile' has no attribute 'File'

Cheers, Boryana

rainwoodman commented 3 years ago

The CMake install provides only the library and headers for C/C++ applications; it excludes the Python extension.

The Python extension is installed the usual way (pip install .), which bundles the bigfile library into the extension.
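To check which bigfile Python actually picks up after installing, a small diagnostic along these lines may help. This is a minimal sketch: the helper name diagnose_module is hypothetical, and it only uses the standard library. One possible cause of the AttributeError above (an assumption, not confirmed in this thread) is that Python imported a plain directory named bigfile as a namespace package, in which case the reported location is None.

```python
import importlib


def diagnose_module(name="bigfile", attr="File"):
    """Report whether `name` imports and exposes `attr`.

    Returns a (status, location) tuple, where location is the module's
    __file__ (None for namespace packages or when the import fails).
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "not installed", None
    location = getattr(module, "__file__", None)
    if hasattr(module, attr):
        return "ok", location
    return "missing attribute", location


if __name__ == "__main__":
    # Prints the status and the path Python resolved for the module.
    print(diagnose_module("bigfile", "File"))
```

A status of "missing attribute" means something called bigfile was imported but it is not the Python extension; the location shows where it came from.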

boryanah commented 3 years ago

Thanks for this, the installation worked!

Unfortunately, the issue remains when I save a file from multiple ranks with:

density = density_fourier.paint(mode='real')
ArrayMesh(density, BoxSize=Lbox).save("density.bigfile", mode='real', dataset='Field')

When I look at density.bigfile, it is incomplete, and it seems that all ranks overwrite the same file "density.bigfile".

Is this an issue with bigfile, or do I perhaps need to manipulate the data differently with nbodykit? ArrayMesh in nbodykit has a note that says "The in-memory array must be fully hosted by the root rank". Is there a way to write the object "density" directly into a bigfile in parallel, without going through ArrayMesh?

boryanah commented 3 years ago

Hi, I just wanted to give an update and close the issue. I ended up replacing ArrayMesh with FieldMesh, which does distribute the data properly across ranks! Thanks a lot for your help!

rainwoodman commented 3 years ago

Glad you found a way. Yes FieldMesh sounds about right, especially if you are using a RealField or ComplexField object created by pmesh.
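For reference, the FieldMesh variant of the save call from earlier in the thread might look roughly like the sketch below. The wrapper name save_field_distributed is hypothetical; FieldMesh and the mode and dataset arguments are the ones used in this thread. It assumes that field is a pmesh RealField already distributed across ranks and that every rank makes the call.

```python
def save_field_distributed(field, path, dataset="Field"):
    """Save a rank-distributed pmesh field into a single bigfile.

    `field` is assumed to be a pmesh RealField (for example the result
    of paint(mode='real')); every MPI rank is expected to call this.
    """
    # Imported lazily so this sketch can be loaded without nbodykit.
    from nbodykit.lab import FieldMesh

    # Unlike ArrayMesh, FieldMesh wraps the field as it is already
    # distributed, so no array has to be hosted by the root rank.
    FieldMesh(field).save(path, mode="real", dataset=dataset)
```

With the objects from the original post, the call would be save_field_distributed(density, "density.bigfile").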

ArrayMesh is a bit tricky -- the doc says the data must be fed from the root rank, suggesting it will scatter the array from root to the other ranks: https://nbodykit.readthedocs.io/en/latest/api/_autosummary/nbodykit.source.mesh.array.html#module-nbodykit.source.mesh.array .

So here we've just found another place in nbodykit where data (the input data to ArrayMesh) is not distributed in parallel.
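The difference between data hosted on the root rank and data distributed across ranks can be pictured with a toy model. The sketch below is plain Python with no MPI; it is not how bigfile or nbodykit are implemented, and all function names are hypothetical. Each simulated rank owns one slab, and a complete output requires every rank to write its slab at its own offset.

```python
def exclusive_offsets(local_sizes):
    """Start offset of each rank's slab (exclusive prefix sum of sizes)."""
    return [sum(local_sizes[:i]) for i in range(len(local_sizes))]


def write_distributed(slabs):
    """Every simulated rank writes its own slab at its own offset."""
    sizes = [len(slab) for slab in slabs]
    out = [None] * sum(sizes)
    for offset, slab in zip(exclusive_offsets(sizes), slabs):
        out[offset:offset + len(slab)] = slab
    return out


def write_root_only(slabs, root=0):
    """Only the root's slab reaches the output; the rest stays unfilled."""
    sizes = [len(slab) for slab in slabs]
    out = [None] * sum(sizes)
    out[:len(slabs[root])] = slabs[root]
    return out


if __name__ == "__main__":
    slabs = [[1, 2], [3, 4, 5], [6]]  # three "ranks" with uneven slabs
    print(write_distributed(slabs))
    print(write_root_only(slabs))
```

In this model, the incomplete file described earlier corresponds to the second case, where only part of the output is ever filled in.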
