BLE-LTER / BLEdatatools

R package to download and collate BLE LTER data from EDI repository
Other
0 stars 0 forks source link

combine download_data and download_any_ble_data into one function?? alternatively, how to structure download capabilities? #22

Closed atn38 closed 2 years ago

atn38 commented 2 years ago

Combining the two would improve consistency and coherence in our product, plus ease in developing downstream functions. relates to #8

how I imagine the combined function will work:

download_data takes an additional function, tentatively named "zip" and defaults to FALSE.

if zip == FALSE: download_data proceeds as normal and gets the CSVs for collation if users specify IDs not in collation plan, then what? One idea is to give zip files for non-collate IDs and the CSVs/R objects for the rest. IDK.

if zip == TRUE: download_data then becomes download_any_ble_data and just gives the zip files and NO R objects.

@bristol-emily @alinaCO2spera: I'd appreciate your input on this.

twhiteaker commented 2 years ago

If the result is an easily-understood function, then hooray. If the resulting function differs enough in behavior depending on the value of function parameters that it confuses users, then it may be better to have separate functions.

atn38 commented 2 years ago

Thanks for your input @twhiteaker. For context, download_any_ble_data is from #8 and we originally thought it would address a niche use case outside of the collation scope. There has already been some confusion within our group (the people writing this package!) between the two "download" functions, which is the motivation for combining.

bristol-emily commented 2 years ago

Since the download_data function only allows users to download specific core program water or sediment data, I saw a need to develop a more general function that could pull any BLE LTER data (#8). I agree it would make sense to combine these functions. Maybe the default of the download_data function (i.e. when ids = NULL) should provide the core program water/sediment entities (as defined in decide_entities), but the decide_entities function should be revised so it doesn't throw out non-CP water/sed data. This way we can still easily use download_data to get a nested list for the data_collation function, but it can be used to grab any data entity and either save as an R object... or a .csv... or download the whole package in a zip?

It might be confusing to have a function that can output so many different thing. What if downloading entire packages in a zip file is it's own function (e.g. download_zipped_package)? And the download_data function has options to save a data entity/entities as an R object (data frame/nested list) or a .csv/multiple .csvs?

This raises an issue about users needing to specify entity ids, which isn't currently an option in download_data...

atn38 commented 2 years ago

@bristol-emily thanks for your input.

Let's recap all the different download capabilities we have wanted so far:

  1. For collation: 1A. By default, downloads JUST the specified entities from ALL the package Ids slated for collation. IDs fixed, entities fixed. 1B. The user can also specify what package Ids they want, and the function will still download the specified entities. IDs varying, entities fixed.
    1. For general download: 2A. The user gets the entire .zip files from packageIds they specified. IDs varying, entities fixed. 2B. The user gets to pick and choose entities from each packageIds they specified. IDs varying, entities varying.

Looks like we're asking for "IDs varying" a lot more than "entities varying" :) therefore imo "entities varying" is a bit lower priority.

ANYWAY A DIFFERENT IDEA:

We take the collation part --part 1 above-- out of the download function (however we call it) altogether. ONE download function that just downloads --part 2 above. Part 1 will be taken care of inside collate_data. Download and collate will be separate functions that do not feed into one another.

Would this be more conceptually consistent? Less confusing for users?

alinaCO2spera commented 2 years ago

I think this is what you are saying @atn38, and is what I now think is the best solution:

atn38 commented 2 years ago

I think that's what I'm saying too 😅 thanks for your input @alinaCO2spera

Download will be a way to get any and all packages as zipped files?

Plus the option to pick and choose entities, if we have time to work on it...

atn38 commented 2 years ago

ok we're doing this now! download just downloads collate just collates

bristol-emily commented 2 years ago

What should we rename our general/flexible download function as (currently named download_any_ble_data)? The function is currently working (for one package or entity at a time). One renamed, are we ready to merge this function and search_ble_packages function into the development branch?

atn38 commented 2 years ago

@bristol-emily I was just working on moving stuff around, see the commit above. The old download_data is no more, so that's up for grabs. Merging is fine. I'd like the download function to be more robust (more checks for invalid arguments as we discussed yesterday), but that can happen before or after merging, up to you.

atn38 commented 2 years ago

I think we're done here