CADWRDeltaModeling / dms_datastore

Data download and management tools for continuous data for Pandas. See documentation https://cadwrdeltamodeling.github.io/dms_datastore/
MIT License

reformat failure #48

Closed dwr-psandhu closed 3 months ago

dwr-psandhu commented 4 months ago
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Scripts\usgs_multi-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 268, in main
    process_multivariate_usgs(fpath=fpath,pat=pat,rescan=True)
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 149, in process_multivariate_usgs
    df = usgs_multivariate(pat,'usgs_subloc_meta_new.csv')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 102, in usgs_multivariate
    series = usgs_scan_series(fname)  # Extract list of series in file
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 69, in usgs_scan_series
    raise ValueError(f"Time series description section not found in file {fname}")
ValueError: Time series description section not found in file formatted\usgs_srv_11455420_turbidity_2011.csv

The header on the file above looks like this

# format: dwr-dms-1.0
# agency_id: 11455420
# agency_ts_id:
# - '16197'
# - '16198'
# - '300449'
# crs_note: Reported lat-lon are agency provided. Projected coordinates may have been
#   revised based on additional information.
# date_formatted: 2024-02-27 03:31:27
# latitude: 38.14789999
# longitude: -121.6898619
# original_header: "---------------------------------- WARNING ----------------------------------------\n\
#   Some of the data that you have obtained from this U.S. Geological Survey database\
#   \ may not\nhave received Director's approval.  Any such data values are qualified\
#   \ as provisional and\nare subject to revision.  Provisional data are released on\
#   \ the condition that neither the\nUSGS nor the United States Government may be held\
#   \ liable for any damages resulting from its use.\n Go to http://help.waterdata.usgs.gov/policies/provisional-data-statement\
#   \ for more information.\n\nAutomated-retrieval info: http://help.waterdata.usgs.gov/faq/automated-retrievals\n\
#   \nContact:   gs-w_support_nwisweb@usgs.gov\nretrieved: 2024-02-27 03:59:24 -05:00\t\
#   (nadww01)\n\nData for the following 1 site(s) are contained in this file\n   USGS\
#   \ 11455420 SACRAMENTO R A RIO VISTA CA\n-----------------------------------------------------------------------------------\n\
#   \nTS_ID - An internal number representing a time series.\n\nData provided for site\
#   \ 11455420\n   TS_ID       Parameter Description\n   16197       63680     Turbidity,\
#   \ water, unfiltered, monochrome near infra-red LED light, 780-900 nm, detection\
#   \ angle 90 +-2.5 degrees, formazin nephelometric units (FNU), MEDIAN TS087: YSI\
#   \ 6136\n   16198       63680     Turbidity, water, unfiltered, monochrome near infra-red\
#   \ LED light, 780-900 nm, detection angle 90 +-2.5 degrees, formazin nephelometric\
#   \ units (FNU), [TS213: YSI EXO]\n\nData-value qualification codes included in this\
#   \ output:\n    A  Approved for publication -- Processing and review completed.\n\
#   \    R  Records for these data have been revised.\n"
# param: turbidity
# projection_authority_id: epsg:26910
# projection_x_coordinate: 614797.6
# projection_y_coordinate: 4223035.8
# source: usgs
# station_id: srv
# station_name: Sacramento R at Rio Vista
# subloc_comment: value averages unpublished sublocations
# sublocation: default
# unit: FNU
# 
dwr-psandhu commented 4 months ago

Similar failure today but a different file

Traceback (most recent call last):
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Scripts\usgs_multi-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 268, in main
    process_multivariate_usgs(fpath=fpath,pat=pat,rescan=True)
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 149, in process_multivariate_usgs
    df = usgs_multivariate(pat,'usgs_subloc_meta_new.csv')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 102, in usgs_multivariate
    series = usgs_scan_series(fname)  # Extract list of series in file
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 69, in usgs_scan_series
    raise ValueError(f"Time series description section not found in file {fname}")
ValueError: Time series description section not found in file formatted\usgs_cm62_11455142_ec_2014.csv
dwr-psandhu commented 4 months ago

@water-e The parsing fails because the scan is not robust. It uses a regular expression but doesn't account for the fact that the original header can have \n line endings, so the patterns it is looking for need to take that into account.

Here's the sample header that fails with the above message:

# format: dwr-dms-1.0
# agency: usgs
# agency_id: 11455142
# agency_ts_id:
# - '15982'
# - '222827'
# crs_note: Reported lat-lon are agency provided. Projected coordinates may have been
#   revised based on additional information.
# date_formatted: 2024-03-05 19:26:47
# latitude: 38.34166667
# longitude: -121.6438889
# original_header: "---------------------------------- WARNING ----------------------------------------\n\
#   Some of the data that you have obtained from this U.S. Geological Survey database\
#   \ may not\nhave received Director's approval.  Any such data values are qualified\
#   \ as provisional and\nare subject to revision.  Provisional data are released on\
#   \ the condition that neither the\nUSGS nor the United States Government may be held\
#   \ liable for any damages resulting from its use.\n Go to http://help.waterdata.usgs.gov/policies/provisional-data-statement\
#   \ for more information.\n\nAutomated-retrieval info: http://help.waterdata.usgs.gov/faq/automated-retrievals\n\
#   \nContact:   gs-w_support_nwisweb@usgs.gov\nretrieved: 2024-03-05 21:06:14 -05:00\t\
#   (nadww01)\n\nData for the following 1 site(s) are contained in this file\n   USGS\
#   \ 11455142 SACRAMENTO R DEEP WATER SHIP CHANNEL NR COURTLAND\n-----------------------------------------------------------------------------------\n\
#   \nTS_ID - An internal number representing a time series.\n\nData provided for site\
#   \ 11455142\n   TS_ID       Parameter Description\n   15982       00095     Specific\
#   \ conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius,\
#   \ BGC PROJECT, [BGC PROJECT]\n   222827      00095     Specific conductance, water,\
#   \ unfiltered, microsiemens per centimeter at 25 degrees Celsius, DWS-BOR, [HYDRO\
#   \ PROJECT]\n\nData-value qualification codes included in this output:\n    A  Approved\
#   \ for publication -- Processing and review completed.\n"
# param: ec
# projection_authority_id: epsg:26910
# projection_x_coordinate: 618511.0
# projection_y_coordinate: 4244595.0
# source: usgs
# station_id: cm62
# station_name: Sacramento River Deep Water Ship Channel Near Courtland
# subloc_comment: value averages unpublished sublocations
# sublocation: default
# unit:
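
A minimal sketch of one way the scan could tolerate this folding, offered under assumptions rather than as the actual fix. `unfold_header` and `scan_series` are hypothetical names, not functions in dms_datastore, and the real `usgs_scan_series` code is not shown in this thread. The sketch assumes the `original_header` value is folded the way the sample above shows: a trailing backslash continues the line, `\ ` is an escaped space, and `\n` and `\t` are literal two-character escapes. It joins the folded lines first and only then looks for the `TS_ID  Parameter Description` table. The sample string is abridged from the header above.

```python
import re

# One table row: TS_ID, 5-digit parameter code, free-text description.
SERIES_ROW = re.compile(r"^\s*(\d+)\s+(\d{5})\s+(.+)$")
TABLE_HEAD = re.compile(r"^\s*TS_ID\s+Parameter\s+Description")


def unfold_header(text: str) -> str:
    """Undo the '#' comment prefix and folding of continued header lines."""
    # A trailing backslash, the line break, the '#' marker and its
    # indentation together form one continuation; drop them all.
    text = re.sub(r"\\\r?\n#[ \t]*", "", text)
    # Turn the escapes seen in the sample header into real characters.
    for escaped, real in (("\\n", "\n"), ("\\t", "\t"), ("\\ ", " ")):
        text = text.replace(escaped, real)
    return text


def scan_series(header_text: str) -> list:
    """Return (ts_id, parameter_code, description) tuples from the header."""
    lines = unfold_header(header_text).splitlines()
    for i, line in enumerate(lines):
        if TABLE_HEAD.match(line):
            break
    else:
        raise ValueError("Time series description section not found")
    series = []
    for line in lines[i + 1:]:
        m = SERIES_ROW.match(line)
        if m is None:  # first non-row line ends the table
            break
        series.append(m.groups())
    return series


# Abridged from the failing header above; descriptions shortened.
sample = r'''# original_header: "Data provided for site\
#   \ 11455142\n   TS_ID       Parameter Description\n   15982       00095     Specific\
#   \ conductance, BGC PROJECT\n   222827      00095     Specific conductance,\
#   \ DWS-BOR\n\nData-value qualification codes included in this output:\n"
# param: ec
'''

print(scan_series(sample))
```

Since the header looks like commented YAML, stripping the `# ` prefix and handing the result to a YAML parser would probably be more robust than unescaping by hand; the manual version is used here only to keep the sketch dependency-free.
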
water-e commented 3 months ago

I'll create a test using that file if you can fish the offending file out for me. I can't reproduce it on my system; it doesn't fail.



dwr-psandhu commented 3 months ago

It's in the stack trace above. Here's the full path: Y:\jenkins_repo_staging\continuous\formatted\usgs_cm62_11455142_ec_2014.csv

dwr-psandhu commented 3 months ago

It ran fine last night. Closing the issue for now; will reopen if it recurs.