ShervanGharari / EASYMORE

EASYMORE; EArth SYstem MOdeling REmapper
GNU General Public License v3.0

Issues handling large dataset #26

Open chrimerss opened 2 years ago

chrimerss commented 2 years ago

Hello,

I have recently been using EASYMORE to remap some large datasets, for instance with lat/lon dimensions of 3500 x 7000. The current version raises errors when writing a shapefile larger than ~4 GB. One workaround would be to write a GeoPackage with geopandas instead of writing a shapefile with pyshp. It would be great to address this if others, like me, are interested in macro-scale hydrologic modeling.

I forked this repo and modified some of your code:

  1. Writing large datasets (>4 GB).
  2. Replacing some for-loops with numba acceleration (for example, the method lat_lon_SHP contains a double for-loop).
  3. Multiprocessing to fully utilize the CPUs on a server (only tested for cases 1 and 2).
  4. Using a configuration file to handle model inputs.
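To illustrate the idea behind item 2, here is a rough sketch of a double for-loop over grid cells compiled with numba. The function `cell_corners` and its arguments are hypothetical and are not the actual lat_lon_SHP code; it assumes a regular grid with known half cell sizes, and it falls back to plain Python if numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba.
    def njit(func):
        return func


@njit
def cell_corners(lat, lon, half_dlat, half_dlon):
    """Return an (n_lat, n_lon, 4, 2) array of (lon, lat) corners per cell.

    Corner order: lower-left, lower-right, upper-right, upper-left.
    """
    n_lat, n_lon = lat.shape
    corners = np.empty((n_lat, n_lon, 4, 2))
    for i in range(n_lat):
        for j in range(n_lon):
            west = lon[i, j] - half_dlon
            east = lon[i, j] + half_dlon
            south = lat[i, j] - half_dlat
            north = lat[i, j] + half_dlat
            corners[i, j, 0, 0] = west
            corners[i, j, 0, 1] = south
            corners[i, j, 1, 0] = east
            corners[i, j, 1, 1] = south
            corners[i, j, 2, 0] = east
            corners[i, j, 2, 1] = north
            corners[i, j, 3, 0] = west
            corners[i, j, 3, 1] = north
    return corners


# Hypothetical 2 x 3 grid of cell centres with 1-degree spacing.
lon2d, lat2d = np.meshgrid(np.array([20.0, 21.0, 22.0]),
                           np.array([10.0, 11.0]))
corners = cell_corners(lat2d, lon2d, 0.5, 0.5)
print(corners.shape)
```

The loop body only touches NumPy arrays and scalars, which is what lets numba compile it in nopython mode; building shapely or pyshp objects inside the loop would not compile, so the polygons would be created from the `corners` array afterwards.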

Hope this is helpful :) You can find my fork here: https://github.com/chrimerss/EASYMORE/tree/main/easymore

ShervanGharari commented 2 years ago

Greetings,

Thank you very much for using EASYMORE and for your feedback.

To better understand your changes, and in case you are interested in contributing directly to the code, I suggest addressing the issues raised here one by one. I believe we can handle 1 and 2 in one pull request. Would it be possible for you to create such a pull request?

Also, I have added contribution steps in the develop branch. Basically, we prefer that pull requests go to the develop branch first; the develop branch is then merged into the main branch after some substantial improvement.