DOI-USGS / ds-pipelines-targets-example-wqp

An example targets pipeline for pulling data from the Water Quality Portal (WQP)

Document pattern for saving intermediate files containing the downloaded data #99

Closed: lekoenig closed this issue 1 year ago

lekoenig commented 2 years ago

Our current approach is to let targets handle and combine all of the data downloaded within the individual branches of p2_wqp_data_aoi.

For users attempting large-scale pulls (e.g., CONUS extent, many parameters, or a long temporal extent), this approach might not be ideal. The combined data must fit in memory, which could cause memory allocation errors, and whether those errors occur will vary across local machines. An alternative pattern is to save an intermediate file containing the data downloaded for each group in p2_site_counts_grouped. @lindsayplatt takes this approach in her national chlorophyll-a repo. Saving intermediate files could be useful for some applications, but the extra file handling could be annoying for a small pull.
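A minimal sketch of the intermediate-file pattern, written in R to match the targets pipeline. It assumes each branch of p2_site_counts_grouped holds a single group ID in a column called `download_grp`. The helper `save_group_data()`, its `fetch_fn` argument, and the output directory are hypothetical names for illustration, not code from either repo. Each branch saves its data to an `.rds` file and returns the path, so a target declared with `format = "file"` tracks the file instead of holding the data itself.

```r
# Hypothetical helper (not in the repo): download the data for one group of
# sites, save it to an .rds file, and return the file path. Returning a path
# lets a targets branch declared with format = "file" track the saved file.
save_group_data <- function(group_sites, fetch_fn, out_dir) {
  # Assumes each group has exactly one ID in a `download_grp` column
  grp_id <- unique(group_sites$download_grp)
  stopifnot(length(grp_id) == 1)

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # fetch_fn is a placeholder for whatever function performs the WQP download
  dat <- fetch_fn(group_sites)

  # One file per group, so branches never overwrite each other
  out_file <- file.path(out_dir, sprintf("wqp_data_grp_%s.rds", grp_id))
  saveRDS(dat, out_file)
  out_file
}

# Sketch of wiring it into _targets.R, assuming p2_site_counts_grouped was
# created with iteration = "group"; `my_fetch_fn` and the output directory
# are placeholders:
# tar_target(
#   p2_wqp_data_aoi_files,
#   save_group_data(p2_site_counts_grouped, my_fetch_fn, "2_download/tmp"),
#   pattern = map(p2_site_counts_grouped),
#   format = "file"
# )
```

Downstream targets can then read the files one at a time (for example, to filter or summarize each group) rather than loading the full data set at once.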

My preference is to keep our example implementation as is, but to communicate this alternative option somehow. A separate branch implementing it would be more difficult to maintain, so we should consider adding a note to the README under "Customizing the WQP pipeline."