stijnvanhoey closed this 6 years ago
We cannot convert the data of one screen (filter) into a single dataframe. If we have a list of screens, I think we actually have three dataframes: one with the screen properties (id, x, y, ...), a second one with water level observations (date + level + some other fields) and a third one with groundwater quality data.
I agree using pandas is the way forward.
Indeed, I agree that we cannot put all the information in one dataframe, but will have to create a class with the different dataframes as attributes. Cfr. the setup branch on my fork, https://github.com/stijnvanhoey/pydov/blob/setup/pydov/dovseries.py, which can be interpreted as a first draft implementation of this concept. I have to check the new data format and adapt the code next week.
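In its simplest form, such a class could look roughly like this (a minimal sketch of the concept only; the attribute names are illustrative and not taken from the dovseries.py draft):

```python
import pandas as pd

# Minimal sketch of the concept (attribute names are illustrative):
# one object that bundles the related dataframes as attributes.
class DovGroundwater:
    def __init__(self, screens, levels, quality):
        self.screens = screens    # screen properties (id, x, y, ...)
        self.levels = levels      # water level observations (date, level, ...)
        self.quality = quality    # groundwater quality data

gw = DovGroundwater(pd.DataFrame(), pd.DataFrame(), pd.DataFrame())
```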
Ok - I was not aware of that branch. Perhaps one comment: we had better call the class DovGroundwater/DovGrondwater, as we will be adding similar classes for other object types.
Closing here, check #18 for further discussion
As a functionality, the user would like something like this (the naming should be improved to better fit the naming conventions of the groundwater domain):
(in words: download my list of wells, filter that specific period and write everything into a csv-file)
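A runnable sketch of that workflow, with stand-in implementations (all function names and the fake data below are assumptions for illustration, not the actual pydov API):

```python
import pandas as pd

def download(stations):
    # Stand-in for the real extraction step: return one column of
    # (fake) water levels per requested station.
    idx = pd.date_range("2017-01-01", periods=5, freq="D")
    return pd.DataFrame({station: range(5) for station in stations}, index=idx)

def subset_period(df, start, end):
    # Filter the downloaded data to a specific period.
    return df.loc[start:end]

# In words: download my list of wells, filter a specific period
# and write everything to a csv file.
wells = download(["well_a", "well_b"])
csv_text = subset_period(wells, "2017-01-02", "2017-01-04").to_csv()
```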
Basically, there are 3 parts in this setup:

- `download`, i.e. the extraction part: downloading data based on a list of stations. This part could be extended towards more powerful `download_****` functions, e.g. `download_from_boundingbox()`, `download_from_aquifer()`, ... These extensions of the regular `download` will always require some additional service calls, but will end up with a list of stations and use the `download` function.
- `subset_*`, i.e. the filter part: this should provide some straightforward functions to filter the downloaded data set. When using pandas DataFrames as the basic data type to store the data (see further), a lot of options will be available.
- `to_***`, i.e. the conversion part: the data is stored or exported to a new file format that could be useful for the user. `to_csv`/`to_excel` are examples that are already available, but the added value of this package would be more domain-specific export functionalities, e.g. `to_modflow()`, `to_menyanthes()`, `to_swap()`.
As we're dealing with time series, using pandas DataFrames as the underlying datatype provides a lot of built-in options. When needed, we can make a new class inherited from pd.DataFrame to handle some additional metadata. Multiple stations can be handled by using a MultiIndex as column headers. With the row labels as a DateTimeIndex, all of pandas' data handling options, such as resampling (daily/monthly/... mean values) and slicing, become available.
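A small sketch of that layout (the station and variable names are made up):

```python
import numpy as np
import pandas as pd

# Rows: DateTimeIndex; columns: MultiIndex of (station, variable).
idx = pd.date_range("2017-01-01", periods=60, freq="D")
cols = pd.MultiIndex.from_product(
    [["well_a", "well_b"], ["level", "quality"]],
    names=["station", "variable"],
)
df = pd.DataFrame(np.random.rand(60, 4), index=idx, columns=cols)

monthly = df.resample("MS").mean()       # monthly mean values
january = df.loc["2017-01"]              # slice one month of data
levels_a = df[("well_a", "level")]       # select one station/variable
```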
Since we will have the XML format as such (always a complete time series) as the stable source for the data, I would propose an `xml_to_df` conversion function that converts the XML to a pandas DataFrame, as a basic function in direct relation with the other basic functionality, `download`. These two functions (`xml_to_df` and `download`) could be the first milestone to implement. Then, more advanced download functions and more advanced export functions can be created on top of this.
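A rough sketch of such an `xml_to_df` function; the element names (`observation`, `date`, `level`) are invented for illustration and would have to follow the actual DOV XML schema:

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

def xml_to_df(xml_source):
    # Parse a (complete) time-series XML document into a DataFrame
    # with a DateTimeIndex, ready for the subset_*/to_* steps.
    tree = ET.parse(xml_source)
    records = [
        {"date": obs.findtext("date"), "level": float(obs.findtext("level"))}
        for obs in tree.iterfind(".//observation")
    ]
    df = pd.DataFrame(records)
    df["date"] = pd.to_datetime(df["date"])
    return df.set_index("date")

example = io.StringIO(
    "<series>"
    "<observation><date>2017-01-01</date><level>1.20</level></observation>"
    "<observation><date>2017-01-02</date><level>1.15</level></observation>"
    "</series>"
)
df = xml_to_df(example)
```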