USEPA / harmonize-wq

Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats
https://usepa.github.io/harmonize-wq/
MIT License
12 stars 5 forks source link

Review edits #57

Closed jbousquin closed 2 weeks ago

jbousquin commented 5 months ago

List to track progress on reviewer comments. Will add/edit as we go.

Note: several items not on the list have been or are being resolved.

jbousquin commented 5 months ago

These may not get resolved for future features adds:

nitpick 1: TADA_DATA_URL and the URL for domain tables are only used by one function each, but the plan is for additional resources on both to be able to be used by future feature adds. For example new crosswalks with the upcoming transition from WQX2.0 -> 3.0

discussion 1: these were moved around to avoid cyclic imports

todo 2 (basis.py): These additional basis columns haven't received much attention yet. It was coded this was to make it easy to come back to and write additional specific handling. combined weight/time, left particuleSize which has been notes specific to it's handling.

Notes: Suggestion 1: probably, not sure where to implement yet, not a straightforward groupby Suggestion 2: check other modules Suggestion 3: made these dicts, but need to move docs to module level. domains.out_col_lookup(), domains.xy_datum(), domains.stations_rename(), domains.accepted_methods(), basis.basis_conversion(), basis.stp_dict() Suggestion 4: conductivity_to_PSU has references, references for methods/model, if for code/checks it goes in source code. Suggestion 5: numpy.where is great, but in the provided example existing columns values need to be left alone so this will require a if/else. numpy.where will coerce the other values (y) to the dtype which is problematic for nan. There are fixes, but simple is better than complex.

jbousquin commented 2 weeks ago

pyOpenSci Review complete and branch merged