USACE / cumulus

Cumulus project issue tracking and project planning
MIT License

[Data Request] Missing Gaps in NDGD Precip 1hr #191

Closed: adamscarberry closed this issue 2 years ago

adamscarberry commented 2 years ago

Hello,

I am looking to download hourly gridded precipitation data for the IL River Basin from Oct 2019 through Oct 2020. The only dataset that spans this timeframe is NDGD Precip 1hr. However, many of its time steps are missing data.

I wanted to confirm that this is correct and that the gaps are present in the dataset. I was also wondering whether you had any recommendations for alternative datasets to fill the missing time steps. One possibility is NEXRAD, but this would require a lot of downloading and data processing (> 1 TB of data, 90,000+ images). I also worry that NEXRAD is intended more for real-time situational awareness than for tracking hourly precipitation totals. Another option is URMA (https://data.eol.ucar.edu/dataset/21.093), but its resolution is 4 km.

Thank you in advance for your assistance, LRB
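One way to confirm the gaps is to compare the timestamps that are available against a complete hourly sequence. The sketch below assumes the available files have already been parsed into datetimes (the thread does not say how they were listed or parsed); missing_hours and gap_runs are hypothetical helpers, not part of Cumulus.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

ONE_HOUR = timedelta(hours=1)


def missing_hours(available: Iterable[datetime], start: datetime, end: datetime) -> List[datetime]:
    """Return each top-of-hour timestamp from start to end (inclusive) not in available."""
    # Normalize to the top of the hour so minute/second noise in timestamps is ignored.
    have = {t.replace(minute=0, second=0, microsecond=0) for t in available}
    current = start.replace(minute=0, second=0, microsecond=0)
    gaps = []
    while current <= end:
        if current not in have:
            gaps.append(current)
        current += ONE_HOUR
    return gaps


def gap_runs(gaps: List[datetime]) -> List[Tuple[datetime, datetime]]:
    """Collapse consecutive missing hours into (first, last) ranges."""
    runs: List[Tuple[datetime, datetime]] = []
    for t in gaps:
        if runs and t - runs[-1][1] == ONE_HOUR:
            runs[-1] = (runs[-1][0], t)  # extend the current run
        else:
            runs.append((t, t))  # start a new run
    return runs
```

Reporting runs rather than individual hours makes it easier to see whether the gaps are scattered hours or whole days, which affects which alternative dataset is worth pursuing.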

adamscarberry commented 2 years ago

I have set up an Airflow DAG that copies the raw files from S3 to the appropriate destination, with a task to notify the Cumulus API. The Jan 2020 raw files have been loaded into develop as a test, but the Airflow DAG cannot read from the bucket/key location due to: botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
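The error names the ListObjectsV2 operation, which S3 authorizes through the s3:ListBucket action on the bucket ARN itself, not through s3:GetObject on the object ARNs. A role that can read objects but lacks s3:ListBucket is one common cause of this AccessDenied. Below is a minimal sketch of a policy document granting both, assuming a hypothetical bucket example-bucket and prefix acquirables/; the actual bucket, role, and any bucket policy involved are not shown in this thread, and copying into a destination would additionally need s3:PutObject there.

```python
import json


def s3_read_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing a prefix listing plus reads of the listed objects.

    Bucket and prefix are hypothetical placeholders supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListObjectsV2 is checked against s3:ListBucket on the bucket ARN.
                "Sid": "ListUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Reading each object is checked against the object ARNs.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_read_policy("example-bucket", "acquirables/"), indent=2))
```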

Raw filenames can't be predicted because the minutes/seconds vary, e.g. 2019.10.31.235009--NCEP-rtma_precip--RT.23.ds.precipa.bin. Reading from a specific directory inside S3 with a prefix search was the method I settled on.
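A sketch of that prefix search using boto3's list_objects_v2 paginator, which follows continuation tokens when a prefix matches more than the 1,000 keys a single response returns. It assumes, as described above, that only the minutes and seconds vary, so the YYYY.MM.DD.HH part of the name can be built from the target hour. The client is passed in rather than created, so any configured boto3 S3 client can be used; hour_prefix and list_keys are hypothetical helper names.

```python
from datetime import datetime
from typing import List


def hour_prefix(directory: str, valid_hour: datetime) -> str:
    """Build the predictable part of a raw key: '<directory>/YYYY.MM.DD.HH'.

    The trailing MMSS (e.g. '5009' in '235009') varies, so it is left off
    and matched by the prefix search instead.
    """
    return f"{directory.rstrip('/')}/{valid_hour:%Y.%m.%d.%H}"


def list_keys(s3_client, bucket: str, prefix: str) -> List[str]:
    """Return all keys under prefix, following ListObjectsV2 pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matches omits the "Contents" field entirely.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

For example, list_keys(boto3.client("s3"), "example-bucket", hour_prefix("raw/rtma_precip", datetime(2019, 10, 31, 23))) would search under raw/rtma_precip/2019.10.31.23 (bucket and directory names are hypothetical). This call still needs the s3:ListBucket permission, so it hits the same AccessDenied until that is granted.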

adamscarberry commented 2 years ago

I'm going to try scripting the file renaming before Airflow runs. This will remove the unpredictable minutes/seconds. Hopefully Airflow can then reach each file in acquirables/tbd_dir, copy the original raw file, rename it appropriately, and notify the API.
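A minimal sketch of that renaming step, assuming every raw name follows the 2019.10.31.235009--NCEP-rtma_precip--RT.23.ds.precipa.bin pattern and that the target name simply drops the four minute/second digits. The naming convention Cumulus actually expects is not given in this thread, so predictable_name and rename_all are hypothetical.

```python
import re
from pathlib import Path

# A YYYY.MM.DD.HH stamp, then a varying MMSS, then the product suffix.
RAW_NAME = re.compile(r"^(\d{4}\.\d{2}\.\d{2}\.\d{2})\d{4}(--.+)$")


def predictable_name(raw_name: str) -> str:
    """Drop the varying minutes/seconds so the name can be derived from the hour alone."""
    match = RAW_NAME.match(raw_name)
    if match is None:
        raise ValueError(f"unexpected raw filename: {raw_name!r}")
    return match.group(1) + match.group(2)


def rename_all(directory: str) -> None:
    """Rename every raw .bin file in directory in place."""
    for path in sorted(Path(directory).glob("*.bin")):
        if not RAW_NAME.match(path.name):
            continue  # already renamed, or not a raw file
        target = path.with_name(predictable_name(path.name))
        if target.exists():
            # Two raw files for the same hour: stop instead of overwriting one.
            raise FileExistsError(target)
        path.rename(target)
```

Raising on an existing target, rather than overwriting, keeps a second raw file for the same hour from silently replacing the first; skipping names that no longer match makes the script safe to rerun.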

adamscarberry commented 2 years ago

CPC Import to Stable Corps.Cloud

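| Year-Month | Record Count Before | Record Count After | Delta |
| --- | --- | --- | --- |
| 2019-10 | 699 | 744 | +45 |
| 2019-11 | 611 | 720 | +109 |
| 2019-12 | 345 | 709 | +364 |
| 2020-01 | 658 | 743 | +85 |
| 2020-02 | 0 | 686 | +686 |
| 2020-03 | 0 | 739 | +739 |
| 2020-04 | 0 | 697 | +697 |
| 2020-05 | 519 | 744 | +225 |
| 2020-06 | 633 | 720 | +87 |
| 2020-07 | 529 | 742 | +213 |
| 2020-08 | 265 | 739 | +474 |
| 2020-09 | 281 | 718 | +437 |
| 2020-10 | 249 | 726 | +477 |
| 2020-11 | 0 | 707 | +707 |
| 2020-12 | 0 | 736 | +736 |
| 2021-01 | 0 | 736 | +736 |
| 2021-02 | 0 | 664 | +664 |
| 2021-03 | 405 | 736 | +331 |
| 2021-04 | 601 | 716 | +115 |
| 2021-05 | 564 | 744 | +180 |
| 2021-06 | 454 | 704 | +250 |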
2021-06 454 704 +250

BEFORE: [image]

AFTER: [image]

adamscarberry commented 2 years ago

Airflow states for processed CPC files, Dec 2019 through Jun 2021:

[Screenshot 2022-03-25 at 5:18:37 PM]

adamscarberry commented 2 years ago

Closing for now. I may reopen this to resume back-filling more data, or open a new issue.