reading the BRO data from XML files

dbrakenhoff commented 1 year ago

Separate the reading of XML files from the API request so users can read manually downloaded XML files.

Discussed in https://github.com/ArtesiaWater/hydropandas/discussions/104

Originally posted by **rt84ro** March 2, 2023

Hi everyone, I have downloaded some wells from the BROloket website, but they are in .xml format. I want to read them using HydroPandas, but I do not know how to open these files. Could you please let me know how to read them?

HMEUW commented 1 year ago

I would like to implement this function. My starting point is to add an elif branch to from_bro, as in the code below, plus a new function get_obs_list_from_file. Do you agree?

if bro_id is None and (extent is not None):
    obs_list = get_obs_list_from_extent(
        extent,
        [..]
    )
    meta = {}
elif bro_id is not None:
    obs_list, meta = get_obs_list_from_gmn(
        [..]
    )
    name = meta.pop("name")
elif (extent is None) and (bro_id is None) and (fn is not None):
    obs_list = get_obs_list_from_file(
        fn,
    )
    meta = {}
else:
    raise ValueError("specify bro_id, extent or fn")
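
To make the precedence of these branches explicit, here is a minimal, self-contained sketch. The helper `select_route` is hypothetical and not part of hydropandas; it only mirrors the branch order proposed above and returns a label instead of calling the real `get_obs_list_from_*` functions.

```python
def select_route(bro_id=None, extent=None, fn=None):
    """Return which route the proposed if/elif chain would take.

    Hypothetical helper for illustration only. The order mirrors the
    proposal: bro_id wins over extent, and fn is only used when both
    bro_id and extent are None.
    """
    if bro_id is None and extent is not None:
        return "extent"
    elif bro_id is not None:
        return "gmn"
    elif fn is not None:
        # Reached only when bro_id and extent are both None.
        return "file"
    else:
        raise ValueError("specify bro_id, extent or fn")
```

One consequence of this ordering is that `fn` is silently ignored whenever `bro_id` or `extent` is also given, which may or may not be the desired behaviour.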

dbrakenhoff commented 1 year ago

Yes, seems logical to me. And in io_bro the code to parse the XML file would then be separated and called in each of the get_obs_list_from_* methods?

Thanks for picking this up!

HMEUW commented 1 year ago

Yesterday I started on this issue. I think we need two extra arguments to implement it: origin and local_path.

My own use case is in the last row of the table.

| Use case | Comment | Value of `origin` | Value of `local_path` |
| --- | --- | --- | --- |
| Get data from Broloket.nl, within extent | Like current behaviour of function | `internet` | `None` |
| Get data from Broloket.nl, within extent, and save BRO XMLs for future use | Like above, but save downloaded data | `internet` | path where zip will be created |
| Use downloaded data from Broloket.nl, read all data | User has downloaded data available, via manual download or case in row above | `local` | path to zip |
| Local GMW files that have to be uploaded to bronhouderportaal-bro, read all data | To check data in these files before submission | `local_bronhouder` | path to zip |

I added a separate bronhouder value for origin in the last use case, because these files differ slightly from the broloket.nl files. For example, the BROid is not available, because the file has not yet been sent to broloket.nl. Using a separate value is the easiest way to handle this.
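
As a rough illustration of how the four use cases could be checked in code, here is a minimal sketch. The function `check_origin` and its return value are hypothetical; only the `origin` values and the meaning of `local_path` come from the table.

```python
VALID_ORIGINS = {"internet", "local", "local_bronhouder"}


def check_origin(origin="internet", local_path=None):
    """Validate an origin/local_path pair (hypothetical helper).

    Returns a dict saying whether data would be downloaded and whether
    the downloaded data would be saved to local_path.
    """
    if origin not in VALID_ORIGINS:
        raise ValueError(f"origin must be one of {sorted(VALID_ORIGINS)}")
    if origin != "internet" and local_path is None:
        # Both local routes need a zip file to read from.
        raise ValueError(f"origin '{origin}' requires local_path")
    download = origin == "internet"
    return {"download": download, "save": download and local_path is not None}
```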

What do you think about this approach @dbrakenhoff?

dbrakenhoff commented 1 year ago

Thanks for the clear overview!

I think we should create two routes for getting data from the BRO, one API (internet) route and one local file route. So both ObsCollection and GroundwaterObs should get a from_bro() method for downloading data from the internet through the BRO API. I like your suggestion for storing this downloaded data, so these methods should accept some sort of directory or filename for storing the downloaded data.

Then I would suggest a separate route for reading the local files, from_bro_file/dir/local(), not sure what the name should be yet, but something along those lines. These methods accept a directory/zip (in the case of ObsCollection), or a filename (in the case of Observation).

I think separating these two is probably clearest and makes the code less complicated.

Then bro.py should contain something like the following functions:

- read_bro_xml() --> reads a single BRO XML file
- read_bro_dir() --> reads a directory or zipfile with one or multiple XML files, basically calls the read_bro_xml() method in a loop.
- replace the XML parsing in the current API functions with the read functions listed above.
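
A minimal sketch of how these two functions could fit together, using only the standard library. The function names come from the list above; the signatures, the returned dict, and the assumption that the ID lives in an element named `broId` are all hypothetical and would need to be adapted to the real BRO schema and to the existing parsing code.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def _local_name(tag):
    """Strip an XML namespace: '{uri}broId' -> 'broId'."""
    return tag.rsplit("}", 1)[-1]


def read_bro_xml(source):
    """Parse one XML document from a path or an open binary file object.

    Hypothetical return value: the root tag and the text of the first
    element named 'broId' (None when that element is absent).
    """
    root = ET.parse(source).getroot()
    bro_id = None
    for elem in root.iter():
        if _local_name(elem.tag) == "broId":
            bro_id = elem.text
            break
    return {"root": _local_name(root.tag), "bro_id": bro_id}


def read_bro_dir(path):
    """Read all XML files in a directory or zip file via read_bro_xml()."""
    path = Path(path)
    if path.is_dir():
        return [read_bro_xml(p) for p in sorted(path.glob("*.xml"))]
    if zipfile.is_zipfile(path):
        results = []
        with zipfile.ZipFile(path) as zf:
            for name in sorted(zf.namelist()):
                if name.lower().endswith(".xml"):
                    with zf.open(name) as f:
                        results.append(read_bro_xml(f))
        return results
    raise ValueError(f"{path} is neither a directory nor a zip file")
```

Because `read_bro_xml()` accepts both paths and file objects, the API route could pass the downloaded response body (for example wrapped in `io.BytesIO`) to the same function, so the parsing lives in one place.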

@HMEUW, let me know what you think about this?

PS. I realize we're probably not very consistent across data sources in how we expose local vs API routes, but we should probably address that in a separate issue. At this moment I'd vote for separating the two routes for each data source.

HMEUW commented 1 year ago

I just completed a first version to read XMLs for newly constructed wells. These are submitted to https://www.bronhouderportaal-bro.nl. @OnnoEbbens please have a look at this code. The full_meta function is not yet working. Code is in the 'add-bronh' branch.

HMEUW commented 1 year ago

Started reading local BROloket files in the branch import-broloket-from file. Cannot make a direct link here.

HMEUW commented 1 year ago

Just committed my work. I am on holiday after this week. I cannot work on it before then, and I expect to have no time in the two weeks after my holiday either.

If someone else wants to pick this up in July or August, that is fine.

ArtesiaWater / hydropandas

reading the BRO data from XML files #105
