GSA / data.gov

Main repository for the data.gov service
https://data.gov

O+M 2024-08-26 #4864

Closed. FuhuXia closed this issue 2 months ago.

FuhuXia commented 2 months ago

As part of the day-to-day operation of Data.gov, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watch notifications and risk letting some slip through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. The role covers East Coast business hours only; it is not a 24/7 responsibility. If you will be unavailable, note when in Slack and ask someone to take on the role for that time.

Check the O&M Rotation Schedule for future planning.

Acceptance criteria

You are responsible for all O&M duties this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Weekly Checklist

Monthly Checklist

Ad-hoc Checklist

Reference

FuhuXia commented 2 months ago

The catalog-gather and catalog-fetch apps in staging are stuck, so no deployment can go through.

catalog-fetch          started           web:0/0, web:0/0
catalog-gather         started           web:0/0, web:0/0

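The `web:0/0, web:0/0` output above is what `cf apps` shows for an app whose requested state is started but which is running zero of zero desired instances. A minimal Python sketch for spotting that condition in saved `cf apps` output, assuming the column layout shown above (name, requested state, processes). The function name `stuck_apps` is hypothetical and not part of any data.gov tooling:

```python
import re

# Matches a process entry such as "web:0/0" (process type, running, desired).
PROCESS_RE = re.compile(r"(\w+):(\d+)/(\d+)")


def stuck_apps(cf_apps_output: str) -> list:
    """Return names of apps that are 'started' but list only 0/0 processes.

    Assumes each app line looks like: <name> <requested state> <processes...>.
    Header lines and apps in other states are skipped.
    """
    stuck = []
    for line in cf_apps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] != "started":
            continue
        processes = PROCESS_RE.findall(line)
        # Flag the app only if every process entry is 0 running / 0 desired.
        if processes and all(run == "0" and want == "0" for _, run, want in processes):
            stuck.append(fields[0])
    return stuck


sample = """name                   requested state   processes
catalog-fetch          started           web:0/0, web:0/0
catalog-gather         started           web:0/0, web:0/0
catalog-web            started           web:2/2"""

print(stuck_apps(sample))
```

Here `catalog-web` is a made-up app included only to show a healthy line being ignored; the sketch prints `['catalog-fetch', 'catalog-gather']`.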
FuhuXia commented 2 months ago

Deleted the staging catalog-gather and catalog-fetch apps. Deployment works now.

FuhuXia commented 2 months ago

After deployment, catalog-gather and catalog-fetch are back in the stuck state (web:0/0, web:0/0).

FuhuXia commented 2 months ago

Setting the instance count to 1 (instead of 0) for catalog-gather and catalog-fetch is the workaround: https://github.com/GSA/catalog.data.gov/pull/1439

FuhuXia commented 2 months ago

Set the IMLS source to a manual schedule and marked it as a broken source, since there has been no response to our request to unblock harvesting traffic. https://github.com/GSA/data.gov/issues/4799#issuecomment-2187000549

FuhuXia commented 2 months ago

The monthly-harvested IOOS source (30k+ records) at https://data.noaa.gov/waf/NOAA/ioos/iso/xml/ just refreshed all its timestamps to 2024-08-25.

Sent 33790 objects to the fetch queue

UPDATE: The harvest finished after more than 3 days. The scary part is that most of the timestamp changes appear to be legitimate, caused by changes to file content rather than a simple refresh.


Some (1127 to be exact, 3.5%) were caught as simple file refreshes. For example, https://data.noaa.gov/waf/NOAA/ioos/iso/xml/world_equator.xml is timestamped 2024-08-25, but its <gmd:dateStamp> is 2021-03-30.
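The world_equator.xml case above suggests one way to flag likely "simple refresh" files: compare the WAF file timestamp with the record's own `<gmd:dateStamp>`. A minimal Python sketch of that heuristic, assuming ISO 19139 records where `gmd:dateStamp` is a direct child of the root element and holds a `gco:Date` or `gco:DateTime`. The function names are hypothetical, and this is not how the data.gov harvester actually decides; a file whose dateStamp was never updated could still have changed content:

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

# Standard ISO 19139 namespaces for gmd and gco elements.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def metadata_datestamp(iso_xml: str) -> Optional[date]:
    """Return the calendar date in <gmd:dateStamp>, or None if it is missing."""
    root = ET.fromstring(iso_xml)
    stamp = root.find("gmd:dateStamp", NS)
    if stamp is None:
        return None
    for tag in ("gco:DateTime", "gco:Date"):
        el = stamp.find(tag, NS)
        if el is not None and el.text:
            # Keep only YYYY-MM-DD so DateTime values compare as dates.
            return date.fromisoformat(el.text.strip()[:10])
    return None


def looks_like_simple_refresh(waf_timestamp: date, iso_xml: str) -> bool:
    """Heuristic: the file was re-stamped, but its metadata date is older."""
    stamp = metadata_datestamp(iso_xml)
    return stamp is not None and stamp < waf_timestamp


record = (
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:gco="http://www.isotc211.org/2005/gco">'
    "<gmd:dateStamp><gco:Date>2021-03-30</gco:Date></gmd:dateStamp>"
    "</gmd:MD_Metadata>"
)
print(looks_like_simple_refresh(date(2024, 8, 25), record))
```

The inline `record` is a stripped-down stand-in for world_equator.xml (only the dateStamp is copied from the comment above), so the sketch prints `True`.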


rshewitt commented 2 months ago

Apparently I don't have edit access to the audit log. Ooooooooooof. Can someone grant me that?

FuhuXia commented 2 months ago


Done.