Closed vchendrix closed 2 years ago
@dschristianson There is one issue with this DRAFT pull request. If I drop the NaNs that means metacata.records
for reach variable may be in accurate. I chose to only drop NaNs for aggregation_duration < DAY. However, my guess it that I should update metadata.records
for each variable after I drop the NaNs. Do you think that is the right approach. If so, should I just drop NaNs for all cases?
@dschristianson There is one issue with this DRAFT pull request. If I drop the NaNs that means
metacata.records
for reach variable may be in accurate. I chose to only drop NaNs for aggregation_duration < DAY. However, my guess it that I should updatemetadata.records
for each variable after I drop the NaNs. Do you think that is the right approach. If so, should I just drop NaNs for all cases?
@vchendrix: It should be fine. The records are counted from the return so it will reflect the actual data values.
@dschristianson There is one issue with this DRAFT pull request. If I drop the NaNs that means
metacata.records
for reach variable may be in accurate. I chose to only drop NaNs for aggregation_duration < DAY. However, my guess it that I should updatemetadata.records
for each variable after I drop the NaNs. Do you think that is the right approach. If so, should I just drop NaNs for all cases?@vchendrix: It should be fine. The records are counted from the return so it will reflect the actual data values.
@dschristianson my main concern is that the user of the HDF file will not be able to account for the discrepency if they inspect it.
@vchendrix: it is still the same number of non-nan values. This is what the data users care about. Same situation occurs with DAY as well.
Thanks for getting this out so fast, @vchendrix!
To document our conversation yesterday. We currently need to support temporal_frequency / aggregation_duration: MINUTE (which is mapped to instantaneous in our WF-SFA use case), HOUR, DAY, MONTH. We'll need more work to fully think thru supporting aggregation_duration = NONE. Please add anything I missed!
Few items for this PR:
- Documentation in basin3d/synthesis.py: 480 needs to be updated (sorry, GitHub won't let me make the suggested changes in that part of the file).
- We hardcode DAY for TimeseriesMeasurementMVPObservation class in the basin3d/core/synthesis.py. I'm thinking this needs to be changed to allow other aggregation_durations?
- https://github.com/BASIN-3D/basin3d/blob/b5a78834d311aac9649652417988b2b20839c4cb/basin3d/core/synthesis.py#L432
- Specific question as noted inline.
@dschristianson Thanks for capturing out conversation and reviewing this draft. I will follow up on your points above
Default day is implemented in the pydantic schema +++++++++++++++++++++++++++++++++ Val Hendrix @.*** Lawrence Berkeley National Lab
Mail Stop: 50B-2239 Room: 50B-2258E Phone: (510) 495-2905 Pronouns: she/hers +++++++++++++++++++++++++++++++++
On Fri, May 27, 2022 at 9:32 AM D.S. Christianson @.***> wrote:
@.**** requested changes on this pull request.
Thx for quick turnaround, @vchendrix https://github.com/vchendrix. One more item.
In basin3d/synthesis.py https://github.com/BASIN-3D/basin3d/pull/132#discussion_r883770063:
@@ -477,7 +477,7 @@ def get_timeseries_data(synthesizer: DataSynthesizer, location_lat_long: bool =
- start_date Optional parameters for MeasurementTimeseriesTVPObservation:
- end_date
- aggregation_duration = resolution = DAY (only DAY is currently supported)
- aggregation_duration = resolution in (SECOND, MINUTE, HOUR, DAY, MONTH) Default: DAY
To implement default: DAY which I think we should do, I suggest the following be made around new line 490 (sorry, again GitHub won't let me directly suggest).
- Since temporal_resolution is optional, if it is None (presumably by default), then make the kwargs['aggregation_duration'] = TimeFrequencyEnum.DAY
- I suggest only permitting SECOND, MINUTE, HOUR, DAY, MONTH for temporal_resolution by adding a check for these values. If check fails, fail to synthesize.
Otherwise, since the hardcoding of DAY in MeasurementTimeseriesTVPObservation in basin3d/core/synthesis.py was removed, we would be supporting all temporal frequencies / duration_aggregations and not explicitly handling the NONE case.
— Reply to this email directly, view it on GitHub https://github.com/BASIN-3D/basin3d/pull/132#pullrequestreview-987745842, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAL4M4MQ3OXETKOBB2AI2YLVMD2KDANCNFSM5XB6Y4UA . You are receiving this because you were mentioned.Message ID: @.***>
I think we leave since we are treating it as the same right now in get_timeseries_data. We can address it later. +++++++++++++++++++++++++++++++++ Val Hendrix @.*** Lawrence Berkeley National Lab
Mail Stop: 50B-2239 Room: 50B-2258E Phone: (510) 495-2905 Pronouns: she/hers +++++++++++++++++++++++++++++++++
On Fri, May 27, 2022 at 9:54 AM D.S. Christianson @.***> wrote:
@.**** commented on this pull request.
In basin3d/synthesis.py https://github.com/BASIN-3D/basin3d/pull/132#discussion_r883785651:
@@ -477,7 +477,7 @@ def get_timeseries_data(synthesizer: DataSynthesizer, location_lat_long: bool =
- start_date Optional parameters for MeasurementTimeseriesTVPObservation:
- end_date
- aggregation_duration = resolution = DAY (only DAY is currently supported)
- aggregation_duration = resolution in (SECOND, MINUTE, HOUR, DAY, MONTH) Default: DAY
Ah, awesome! Thanks for pointing that out. Any thoughts on restricting the temporal_resolution values?
— Reply to this email directly, view it on GitHub https://github.com/BASIN-3D/basin3d/pull/132#discussion_r883785651, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAL4M4NILD4ABGZPVVIHRO3VMD43DANCNFSM5XB6Y4UA . You are receiving this because you were mentioned.Message ID: @.***>
I think we leave since we are treating it as the same right now in get_timeseries_data. We can address it later. +++++++++++++++++++++++++++++++++ Val Hendrix @.*** Lawrence Berkeley National Lab Mail Stop: 50B-2239 Room: 50B-2258E Phone: (510) 495-2905 Pronouns: she/hers +++++++++++++++++++++++++++++++++ …
Apologies that I probably wasn't clear. My suggestion is to restrict temporal_resolution
in get_timeseries_data
to only the resolutions that we document supporting SECOND, MINUTE, HOUR, DAY, MONTH
.
So something like the following around new line 490:
if temporal_resolution not in [TemporalFrequencyEnum.SECOND, TemporalFrequencyEnum.MINUTE, TemporalFrequencyEnum.HOUR, TemporalFrequencyEnum.DAY, TemporalFrequencyEnum.DAY]: raise SynthesisException(f'Specified temporal_resolution {temporal_resolution} is not supported.')
I think we leave since we are treating it as the same right now in get_timeseriesdata. We can address it later. +++++++++++++++++++++++++++++++++ Val Hendrix @_.*** Lawrence Berkeley National Lab Mail Stop: 50B-2239 Room: 50B-2258E Phone: (510) 495-2905 Pronouns: she/hers +++++++++++++++++++++++++++++++++ …
Apologies that I probably wasn't clear. My suggestion is to restrict
temporal_resolution
inget_timeseries_data
to only the resolutions that we document supportingSECOND, MINUTE, HOUR, DAY, MONTH
.So something like the following around new line 490:
if temporal_resolution not in [TemporalFrequencyEnum.SECOND, TemporalFrequencyEnum.MINUTE, TemporalFrequencyEnum.HOUR, TemporalFrequencyEnum.DAY, TemporalFrequencyEnum.DAY]: raise SynthesisException(f'Specified temporal_resolution {temporal_resolution} is not supported.')
The pydantic schema will catch this
My understanding is that we're not yet really supportingYEAR
and we are not yet supporting NONE
.
Thus the idea is to explicitly not support these in get_timeseries_data
.
My understanding is that we're not yet really supporting
YEAR
and we are not yet supportingNONE
. Thus the idea is to explicitly not support these inget_timeseries_data
.
I am not aware of any references to 'NONE' in this PR. The TimeFrequencyEnum does not reference it either
As for YEAR, there is not reason to not support YEAR. Some plugins may implement it. I think it is a question for the refactor as to how we communicate to the user which query params are supported by a data source.
Currently get_timeseries_data can only handle dates and not timestamps with utc offsets. In order the support instananeous values we need to remove the timezone information from the timeseries timestamps and assume local time for the monitoring feature. The git diff below shows a proof of concept for this implementation.
Closes #131
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Test Configuration
PR Self Evaluation
strikethrough things that don’t make sense for your PR