Open JessyBarrette opened 2 months ago
Proposed actions / checks to run when merging survey-date-branch into main:
- YYYY-MM-DD_* (e.g., 2024-09-24_survey_date)
- YYYY-MM-DD_survey_raw.jpeg and YYYY-MM-DD_survey_final.csv file.
- Analysis Date column is in the correct format (YYYY-MM-DD).

Tagging the Science Team to check whether there are any checks you would like to see implemented, or to prioritize the checks listed above: @jdelbel @hakaidrew @CarrieWeekes @naomiboon7
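
The naming checks listed above could be sketched roughly as follows. This is only an illustration under some assumptions: that the `YYYY-MM-DD_*` pattern refers to the survey folder name (as in the `./data/2024-09-16_example_dataset` path used elsewhere in this thread), and that both files carry the same date prefix as their folder. The function name `check_survey_folder` is hypothetical.

```python
import re
from pathlib import Path

# Assumed convention: folder is named YYYY-MM-DD_<anything>
FOLDER_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})_.+", re.ASCII)

# File suffixes taken from the checklist in this issue
REQUIRED_SUFFIXES = ("survey_raw.jpeg", "survey_final.csv")


def check_survey_folder(folder: Path) -> list[str]:
    """Return a list of human-readable problems found in one survey folder."""
    match = FOLDER_PATTERN.fullmatch(folder.name)
    if match is None:
        return [f"{folder.name}: folder name does not match YYYY-MM-DD_*"]

    # Assumption: the files reuse the date prefix of the folder
    date_prefix = match.group(1)
    problems = []
    for suffix in REQUIRED_SUFFIXES:
        expected = folder / f"{date_prefix}_{suffix}"
        if not expected.is_file():
            problems.append(f"{folder.name}: missing {expected.name}")
    return problems
```

Returning a list of messages rather than raising on the first problem would let a CI job report every issue in a folder at once.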
A minimal example that checks that a date is in the correct format and that each survey station exists in stations.csv. The global variables are poor coding practice.

import pandas as pd
import unittest

# Read the CSVs in as pandas DataFrames
survey_final_df = pd.read_csv(
    './data/2024-09-16_example_dataset/2024-09-16_survey_final.csv', sep=',')
stations_df = pd.read_csv('./stations.csv', sep=',')


class TestSurveyFinal(unittest.TestCase):
    def test_survey_date(self):
        # Iterate over the rows and assert that each survey date is in the
        # correct format; the first mismatch fails the test.
        for index, row in survey_final_df.iterrows():
            # str() so that a missing (NaN) value fails the regex cleanly
            survey_date = str(row['Survey Date'])
            self.assertRegex(
                survey_date,
                # 0? means unpadded months/days (e.g. 2024-9-4) also pass
                r'^\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])$',
                f"Survey Date of {survey_date} does not match YYYY-MM-DD format"
            )

    def test_survey_stations(self):
        # Find all survey stations that do not have a match in stations.csv
        joined_df = survey_final_df.merge(stations_df, how='left',
                                          left_on='Station', right_on='station_id')
        missing_stations = sorted(
            set(
                joined_df[joined_df['station_id'].isnull()].Station.astype(str)
            )
        )
        count = len(missing_stations)
        self.assertEqual(count, 0, f"Survey stations missing from stations.csv: {', '.join(missing_stations)}")


if __name__ == '__main__':
    unittest.main()

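
One caveat with a regex-only date check: it validates the shape of the string, so an impossible day such as 2024-02-31 would still pass. A possible stricter variant, sketched here with only the standard library, checks the shape first and then asks `datetime.date` whether the day exists. The helper name `is_valid_survey_date` is hypothetical, and requiring zero-padded months and days is an assumption about what "YYYY-MM-DD" should mean here.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits (zero-padded, ASCII only)
_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def is_valid_survey_date(value: object) -> bool:
    """True only for YYYY-MM-DD strings that name a real calendar day."""
    # Non-strings (e.g. NaN from an empty CSV cell) are rejected outright
    if not isinstance(value, str) or _DATE_SHAPE.fullmatch(value) is None:
        return False
    try:
        # Raises ValueError for impossible dates such as 2024-02-31
        date.fromisoformat(value)
    except ValueError:
        return False
    return True
```

Inside a test this could be used as `self.assertTrue(is_valid_survey_date(row['Survey Date']), ...)` in place of the regex assertion.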
In an ideal world, it would be good to add an action to this repo that runs on push to main and performs the following tests:
- yyyy-mm-dd.*
- Create an issue for each failed check.
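
For the "create an issue for each failed check" step, a rough sketch of what a script called from such an action might look like is below. It uses only the standard library and the GitHub REST endpoint for creating issues. Everything around it is an assumption: the function names are hypothetical, the issue title wording is made up, and the workflow would need to expose `GITHUB_TOKEN` to the step with permission to write issues (`GITHUB_REPOSITORY` is set by GitHub Actions itself). The push-to-main trigger would live in the workflow file and is not shown.

```python
import json
import os
import urllib.request


def build_issue_payload(check_name: str, details: str) -> dict:
    """Build the JSON body for one failed check (title wording is illustrative)."""
    return {
        "title": f"Data check failed: {check_name}",
        "body": details,
    }


def create_issue(repo: str, token: str, payload: dict) -> int:
    """POST one issue to the GitHub REST API and return its issue number."""
    request = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["number"]


def report_failures(failures: dict[str, str]) -> None:
    """Open one issue per failed check; `failures` maps check name to details."""
    repo = os.environ["GITHUB_REPOSITORY"]  # "owner/name", set by Actions
    token = os.environ["GITHUB_TOKEN"]      # assumed to be passed in by the workflow
    for check_name, details in failures.items():
        create_issue(repo, token, build_issue_payload(check_name, details))
```

A real action would probably also want to avoid opening duplicate issues when the same check fails on consecutive pushes, which this sketch does not attempt.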