PermafrostDiscoveryGateway / viz-staging

PDG Visualization staging pipeline
Apache License 2.0

Modularize the pdgstaging library #7

Closed: robyngit closed this issue 1 year ago

robyngit commented 1 year ago

To eventually support more flexible workflows, the viz-staging package first needs to be broken down into smaller, more flexible classes and functions.

robyngit commented 1 year ago

Changes made so far while working on this issue have resulted in some performance improvements. The new Grid class takes advantage of the grid's uniform structure to provide faster versions of GeoPandas' overlay and sjoin methods. I replaced the GeoPandas.overlay and GeoPandas.sjoin calls in the Tile class with the Grid equivalents. In a small test of staging 15 IWP files, the new methods made staging almost 10% faster overall, and the speed gains scaled with file size (i.e. the new methods saved the most time for the largest files).
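For anyone curious why a uniform grid allows a faster join, here is a minimal sketch of the general idea. It is not the actual Grid implementation in pdgstaging: the `UniformGrid` class, its fields, and the `assign_cells` helper are hypothetical names made up for illustration. Because every cell has the same width and height, the cell containing a point can be computed with simple arithmetic instead of a general-purpose spatial-index lookup.

```python
from typing import List, NamedTuple, Optional, Tuple


class UniformGrid(NamedTuple):
    """Hypothetical regular grid (illustration only, not the pdgstaging Grid)."""

    minx: float
    miny: float
    maxx: float
    maxy: float
    n_cols: int
    n_rows: int

    def cell_of(self, x: float, y: float) -> Optional[Tuple[int, int]]:
        """Return (row, col) of the cell containing (x, y), or None if outside.

        Rows are counted upward from miny and columns rightward from minx.
        """
        if not (self.minx <= x <= self.maxx and self.miny <= y <= self.maxy):
            return None
        cell_w = (self.maxx - self.minx) / self.n_cols
        cell_h = (self.maxy - self.miny) / self.n_rows
        # Floor division gives the cell index directly; clamp so that points
        # lying exactly on the max edge fall into the last row/column.
        col = min(int((x - self.minx) // cell_w), self.n_cols - 1)
        row = min(int((y - self.miny) // cell_h), self.n_rows - 1)
        return row, col


def assign_cells(
    grid: UniformGrid, points: List[Tuple[float, float]]
) -> List[Optional[Tuple[int, int]]]:
    """Assign each point to a grid cell without building a spatial index."""
    return [grid.cell_of(x, y) for x, y in points]


grid = UniformGrid(minx=0, miny=0, maxx=10, maxy=10, n_cols=5, n_rows=5)
cells = assign_cells(grid, [(0.5, 0.5), (3.0, 9.9), (10.0, 10.0), (-1.0, 3.0)])
```

The real methods operate on polygons rather than points, so this only illustrates the arithmetic shortcut that a regular grid makes possible.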

Since a 10% improvement in performance might help with our current work on processing the IWP dataset, I've merged all of the changes made so far into the develop branch. These changes DO NOT impact the API. In other words, everything should run as it did before, except with better staging times. (FYI: @KastanDay)

robyngit commented 1 year ago

Just met with Chunli & others about her ArcticDEM change detection data, which provides another use case to think about while working on making this workflow more modular and flexible.

Here are some details about the data & requirements:

robyngit commented 1 year ago

The new Grid class can be used independently of the staging step, making the library more modular than before. We can open new issues that lay out the specific tasks to be completed if we decide more modularization is needed.