DHI / mikeio

Read, write and manipulate dfs0, dfs1, dfs2, dfs3, dfsu and mesh files.
https://dhi.github.io/mikeio
BSD 3-Clause "New" or "Revised" License

Extracting data from a large .dfs2 file #649

Closed: AL89 closed this issue 7 months ago

AL89 commented 7 months ago

Hi there

I don't know if there is a solution to my problem, but I will describe it anyway.

I have a large DFS2 file containing 1440 time steps at a 1-minute resolution, which adds up to one day. The spatial grid is nx = 401 by ny = 401 cells with dx = dy = 250. The total amount of data is therefore 401 x 401 x 1440 = 231,553,440 cell values, which at 4 bytes per single-precision value comes to roughly 0.9 GB, consistent with the file size of around 1 GB. Whenever I try to read this with mikeio, it takes around 25 seconds.

import mikeio

dfs2 = mikeio.Dfs2(filename='data.dfs2')
ds = dfs2.read()
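
As a side note, the file's dimensions and items can be checked from the header alone, without loading any data. A minimal sketch using mikeio.open (the exact summary it prints depends on the mikeio version):

import mikeio

# open only reads the file header, so this is fast even for a ~1 GB file
dfs2 = mikeio.open('data.dfs2')
print(dfs2)  # summary of the geometry (nx, ny, dx, dy), items and time axis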

Now, I want to spatially trim this file so that it covers a smaller region of only nx = 20, ny = 43 cells. This will decrease the amount of data and the file size significantly, and I know a way to do it. I noticed the area parameter in the read(...) method, which lets me specify the bounding box coordinates.

ds_small = dfs2.read(area=(left, lower, right, upper))

However, "reading" this decreased amount of data (ds_small) takes just as long as "reading" the original file (ds). How come? I thought the specified bounding box corresponds to the reading/loading time? Obviously not.

Despite my disappointment, am I doing it right, or is there another way to decrease reading time?

Thanks in advance.

ecomodeller commented 7 months ago

You are correct. The spatial subsetting feature is not present in the lower level libraries that MIKE IO uses.

So it is expected to take as long to read a subset. 😐
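
To illustrate why the read time does not drop, here is a conceptual sketch (not MIKE IO's actual internals): the lower level library returns the full grid for every time step, and the bounding-box subset is applied afterwards in Python, so the whole file still has to be read from disk.

import numpy as np

# conceptual sketch only: one full 401 x 401 time step is read from disk,
# then the area subset is taken by slicing the array in memory
nx = ny = 401
full_step = np.random.rand(ny, nx).astype(np.float32)  # stand-in for one time step read from disk

# hypothetical row/column index bounds derived from the bounding box
j0, j1, i0, i1 = 100, 143, 200, 220
small_step = full_step[j0:j1, i0:i1]  # slicing is cheap; the disk read already happened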



AL89 commented 7 months ago

Hi @ecomodeller. Are you saying that the area argument doesn't work as intended?

ecomodeller commented 7 months ago

The area argument allows you to read a subset, even from a file which wouldn't fit in memory, so it works as intended, but it is far from an optimal solution.

The problem is here: https://github.com/DHI/mikeio/blob/877c99699f879682aa2118099b666647c7d4a8ac/mikeio/dfs/_dfs2.py#L243

We can subset items and time, but not space.

This would have to be added in mikecore and actually in the ufs C library.
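
Item and time subsetting is supported at the lower level, so restricting those does reduce the amount of data actually read from disk. A minimal sketch (the item index, time steps and coordinates below are made up for illustration):

import mikeio

dfs2 = mikeio.Dfs2('data.dfs2')

# reading only the first item and the first 60 of the 1440 time steps
# skips the remaining time steps on disk, so it is genuinely faster
ds_hour = dfs2.read(items=[0], time=list(range(60)))

# area can be combined with this, but as discussed above it only trims
# the result in memory; it does not reduce the disk I/O per time step
left, lower, right, upper = 0.0, 0.0, 5000.0, 10750.0  # hypothetical coordinates
ds_hour_small = dfs2.read(items=[0], time=list(range(60)), area=(left, lower, right, upper))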

AL89 commented 7 months ago

Okay, I understand. I am glad, though, that the current solution works and uses less memory than loading the original file.

I guess you can close the issue now.