adamscarberry closed this issue 2 years ago
I have an Airflow DAG set up to copy raw files from S3 to the appropriate destination, with a task to notify the Cumulus API. Jan 2020 raw files have been loaded as a test in develop, but the Airflow DAG cannot read from the bucket/key location due to:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
Raw filenames can't be predicted because the minutes/seconds vary, e.g.: 2019.10.31.235009--NCEP-rtma_precip--RT.23.ds.precipa.bin
Reading from a certain directory inside S3 with a prefix search was the method I settled on.
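For reference, a minimal sketch of the prefix search, assuming boto3 is the client library in use. The function name `list_keys` and the bucket and prefix in the usage comment are hypothetical placeholders, not values from this deployment. Note that ListObjectsV2 is authorized by the `s3:ListBucket` permission on the bucket itself rather than on the objects, so a role that only has object-level permissions such as `s3:GetObject` gets AccessDenied on this call. That is one common cause of the error above, though not necessarily the cause here.

```python
def list_keys(s3_client, bucket, prefix):
    """Return every object key under `prefix`, following pagination.

    `s3_client` is expected to be a boto3 S3 client (boto3.client("s3")).
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage (hypothetical bucket and prefix, substitute the real ones):
#   import boto3
#   keys = list_keys(boto3.client("s3"), "my-bucket", "acquirables/tbd_dir/2020.01.")
```

Passing the client in as an argument keeps the listing logic testable without AWS credentials.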
Going to try to script the file renaming prior to Airflow. This will remove the unpredictable minutes/seconds. Hopefully Airflow can then reach each file in acquirables/tbd_dir, copy the original raw file, and rename it to the appropriate name while notifying the API.
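A rough sketch of what the renaming step could look like. It assumes the leading timestamp is `YYYY.MM.DD.HHMMSS` (as in the example filename above) and that keeping only the hour is enough to make the name predictable. The target name format and the function name `predictable_name` are my assumptions, not a settled convention.

```python
import re

# Matches names like 2019.10.31.235009--NCEP-rtma_precip--RT.23.ds.precipa.bin
# Assumption: the 6 digits after the date are HHMMSS.
_RAW_NAME = re.compile(
    r"^(?P<date>\d{4}\.\d{2}\.\d{2})\.(?P<hour>\d{2})\d{4}(?P<rest>--.+)$"
)


def predictable_name(raw_name):
    """Drop the minutes/seconds from the timestamp, keeping date and hour."""
    match = _RAW_NAME.match(raw_name)
    if match is None:
        raise ValueError(f"unexpected raw filename: {raw_name!r}")
    return f"{match['date']}.{match['hour']}{match['rest']}"


print(predictable_name("2019.10.31.235009--NCEP-rtma_precip--RT.23.ds.precipa.bin"))
```

If two raw files ever share the same hour, they would map to the same name, so the script should check for collisions before overwriting anything.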
CPC Import to Stable Corps.Cloud
Year-Month | Record Count Before | Record Count After | Delta |
---|---|---|---|
2019-10 | 699 | 744 | +45 |
2019-11 | 611 | 720 | +109 |
2019-12 | 345 | 709 | +364 |
2020-01 | 658 | 743 | +85 |
2020-02 | 0 | 686 | +686 |
2020-03 | 0 | 739 | +739 |
2020-04 | 0 | 697 | +697 |
2020-05 | 519 | 744 | +225 |
2020-06 | 633 | 720 | +87 |
2020-07 | 529 | 742 | +213 |
2020-08 | 265 | 739 | +474 |
2020-09 | 281 | 718 | +437 |
2020-10 | 249 | 726 | +477 |
2020-11 | 0 | 707 | +707 |
2020-12 | 0 | 736 | +736 |
2021-01 | 0 | 736 | +736 |
2021-02 | 0 | 664 | +664 |
2021-03 | 405 | 736 | +331 |
2021-04 | 601 | 716 | +115 |
2021-05 | 564 | 744 | +180 |
2021-06 | 454 | 704 | +250 |
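
The delta column is simply the after count minus the before count. As a rough cross-check, the sketch below also compares the after count against the number of hours in the month. That comparison assumes one record per hour, which is my assumption (it fits the months that reach 720 or 744) and not something stated in this issue. The function names are hypothetical.

```python
import calendar


def hours_in_month(year_month):
    """Number of hours in a 'YYYY-MM' month, e.g. '2019-11' has 720."""
    year, month = map(int, year_month.split("-"))
    return calendar.monthrange(year, month)[1] * 24


def summarize(year_month, before, after):
    """Delta between imports, plus the gap to a full month of hourly records."""
    expected = hours_in_month(year_month)
    return {
        "delta": after - before,
        "expected": expected,          # assumes one record per hour
        "missing": expected - after,
    }


# A few rows copied from the table above.
rows = [("2019-11", 611, 720), ("2020-02", 0, 686), ("2021-06", 454, 704)]
for year_month, before, after in rows:
    print(year_month, summarize(year_month, before, after))
```
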
[Figure: BEFORE and AFTER images]
Airflow states for CPC processed files: Dec-2019 - Jun-2021
Closing for now. May reopen to resume back-filling more data, or may open a new issue.
Hello,
I am looking to download hourly gridded precipitation data for the IL River Basin from Oct 2019 through Oct 2020. The only dataset that spans this timeframe is the NDGD Precip 1hr. However, there are many time steps with missing data.
I wanted to confirm that this is correct and that the gaps are present in the dataset. I was also wondering if you had any recommendations for alternative datasets for the missing time steps. One possibility is NEXRAD, but this would require a lot of data processing and downloading (> 1 TB of data, 90,000+ images). I also worry that NEXRAD is more for real-time situational awareness than for tracking hourly precipitation totals. Another option is URMA (https://data.eol.ucar.edu/dataset/21.093), but the resolution is 4 km.
Thank you in advance for your assistance, LRB