dwr-psandhu closed this issue 3 months ago.
Similar failure today, but on a different file:
```
Traceback (most recent call last):
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Scripts\usgs_multi-script.py", line 9, in <module>
    sys.exit(main())
             ^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 268, in main
    process_multivariate_usgs(fpath=fpath,pat=pat,rescan=True)
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 149, in process_multivariate_usgs
    df = usgs_multivariate(pat,'usgs_subloc_meta_new.csv')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 102, in usgs_multivariate
    series = usgs_scan_series(fname) # Extract list of series in file
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\ProgramData\miniconda3\envs\dms_datastore\Lib\site-packages\dms_datastore\usgs_multi.py", line 69, in usgs_scan_series
    raise ValueError(f"Time series description section not found in file {fname}")
ValueError: Time series description section not found in file formatted\usgs_cm62_11455142_ec_2014.csv
```
@water-e The parsing fails because the scan is not robust. It relies on a regular expression that doesn't account for the original header containing \n line endings, so the patterns it looks for need to take that into account.
Here's the sample header that fails with the above message:
```
# format: dwr-dms-1.0
# agency: usgs
# agency_id: 11455142
# agency_ts_id:
# - '15982'
# - '222827'
# crs_note: Reported lat-lon are agency provided. Projected coordinates may have been
# revised based on additional information.
# date_formatted: 2024-03-05 19:26:47
# latitude: 38.34166667
# longitude: -121.6438889
# original_header: "---------------------------------- WARNING ----------------------------------------\n\
# Some of the data that you have obtained from this U.S. Geological Survey database\
# \ may not\nhave received Director's approval. Any such data values are qualified\
# \ as provisional and\nare subject to revision. Provisional data are released on\
# \ the condition that neither the\nUSGS nor the United States Government may be held\
# \ liable for any damages resulting from its use.\n Go to http://help.waterdata.usgs.gov/policies/provisional-data-statement\
# \ for more information.\n\nAutomated-retrieval info: http://help.waterdata.usgs.gov/faq/automated-retrievals\n\
# \nContact: gs-w_support_nwisweb@usgs.gov\nretrieved: 2024-03-05 21:06:14 -05:00\t\
# (nadww01)\n\nData for the following 1 site(s) are contained in this file\n USGS\
# \ 11455142 SACRAMENTO R DEEP WATER SHIP CHANNEL NR COURTLAND\n-----------------------------------------------------------------------------------\n\
# \nTS_ID - An internal number representing a time series.\n\nData provided for site\
# \ 11455142\n TS_ID Parameter Description\n 15982 00095 Specific\
# \ conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius,\
# \ BGC PROJECT, [BGC PROJECT]\n 222827 00095 Specific conductance, water,\
# \ unfiltered, microsiemens per centimeter at 25 degrees Celsius, DWS-BOR, [HYDRO\
# \ PROJECT]\n\nData-value qualification codes included in this output:\n A Approved\
# \ for publication -- Processing and review completed.\n"
# param: ec
# projection_authority_id: epsg:26910
# projection_x_coordinate: 618511.0
# projection_y_coordinate: 4244595.0
# source: usgs
# station_id: cm62
# station_name: Sacramento River Deep Water Ship Channel Near Courtland
# subloc_comment: value averages unpublished sublocations
# sublocation: default
# unit:
```
I'll create a test using that file if you can fish the offending file out for me. I can't reproduce it on my system; it doesn't fail there.
From: Nicky Sandhu. Sent: Wednesday, March 6, 2024 10:23 AM. To: CADWRDeltaModeling/dms_datastore. Subject: Re: [CADWRDeltaModeling/dms_datastore] reformat failure (Issue #48)
It's in the stack trace above. Here's the full path: Y:\jenkins_repo_staging\continuous\formatted\usgs_cm62_11455142_ec_2014.csv
It ran fine last night. Closing the issue for now; will reopen if it recurs.
The header of the file above looks like this: