This repository contains the processing pipeline for IceFloeTracker.jl and ancillary scripts.
The Satellite Overpass Identification Tool (SOIT) is called to generate a list of satellite overpass times for both Aqua and Terra in the area of interest. This program is written in Python, and its dependencies are pulled from a Docker container at docker://brownccv/icefloetracker-fetchdata:main.
Register an account with space-track.org to use SOIT.
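To run SOIT manually:
Open your .bash_profile:
nano .bash_profile
Add
export HISTCONTROL=ignoreboth
to the bottom of your .bash_profile. With this setting, bash leaves commands that start with a space out of your shell history.
Then export your space-track.org credentials:
export SPACEUSER=<firstname>_<lastname>@brown.edu
export SPACEPSWD=<password>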
docker run --env SPACEUSER --env SPACEPSWD --mount type=bind,source=<your_desired_output_dir>,target=/tmp brownccv/icefloetracker-fetchdata:main \
python3 /usr/local/bin/pass_time_cylc.py --startdate <YYYY-MM-DD> --enddate <YYYY-MM-DD> --csvoutpath /tmp --centroid_lat <input_centroid_lat> --centroid_lon <input_centroid_lon> --SPACEUSER $SPACEUSER --SPACEPSWD $SPACEPSWD
Replace source, startdate, enddate, centroid_lat, and centroid_lon with your desired inputs.
csvoutpath must remain as /tmp to bind the Docker container output path with your desired local path.
Note: The pass_time_cylc.py script in this project can be adapted to include additional satellites available in the space-track.org repository.
If you have numpy and skyfield installed in a local conda environment, you can run pass_time_cylc.py from the directory where you installed the Ice Floe Tracker Pipeline:
python3 workflow/scripts/pass_time_cylc.py --startdate <YYYY-MM-DD> --enddate <YYYY-MM-DD> --csvoutpath <save_directory> --centroid_x <input_centroid_x> --centroid_y <input_centroid_y> --SPACEUSER $SPACEUSER --SPACEPSWD $SPACEPSWD
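Both invocations above read the space-track.org credentials from the SPACEUSER and SPACEPSWD environment variables. As a minimal sketch (the helper name missing_credentials is hypothetical and not part of this repository), a short Python check can confirm that both variables are exported before a run starts:

```python
import os

# Environment variable names used by the SOIT commands in this README.
REQUIRED_VARS = ("SPACEUSER", "SPACEPSWD")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Set these variables before running SOIT:", ", ".join(missing))
    else:
        print("SOIT credentials found in the environment.")
```

The check only looks for the presence of the variables; it does not contact space-track.org or validate the credentials.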
All the software dependencies to run fetchdata.sh are found in the Docker container at docker://brownccv/icefloetracker-fetchdata:main.
To fetch data independently of other tasks:
docker run --mount type=bind,source=<your_desired_output_dir>,target=/tmp \
brownccv/icefloetracker-fetchdata:main \
/usr/local/bin/fetchdata.sh -o /tmp -s <YYYY-MM-DD> -e <YYYY-MM-DD> -c <wgs84|epsg3413> -b <top_left_lat@top_left_lon@lower_right_lat@lower_right_lon|left_x@top_y@right_x@lower_y>
Replace source, s (startdate), e (enddate), c (crs), and b (bounding box) with your inputs.
o (output) must remain as /tmp to bind the Docker container output path with your desired local path.
c (crs) must be either wgs84 (lat/lon) or epsg3413 (polar stereographic).
b (bounding box) inputs must match the crs.
Cylc is used to encode the entire pipeline from start to finish and relies on the command line scripts to automate the workflow. The config/cylc_hpc/flow.cylc file should be suitable for runs on HPC systems. The default pipeline is built to run on Brown's Oscar HPC, and each task is submitted as its own batch job. To run Cylc locally, use the config/cylc_local/flow.cylc file.
To iterate through parameter sets, we can use Jinja2 to populate a flow.cylc file from a CSV file of input parameters. You pass the name of this CSV file to flow_generator.py. An example specification file is provided in the config folder, ./config/sample_site_locations.csv. Each row in the CSV file defines a parameter set. Parameter sets consist of a name, a date range, and a geographic bounding box. These columns are required:
location (string): name for the parameter set. Must be unique.
center_lat (numeric): latitude of the centroid of each scene in decimal degrees, used for finding the satellite overpass time
center_lon (numeric): longitude of the centroid of each scene in decimal degrees, used for finding the satellite overpass time
startdate (YYYY-MM-DD): first date to download
enddate (YYYY-MM-DD): end date for the date range (exclusive, i.e., the last date downloaded is the day before enddate)
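Because enddate is exclusive, a row covers one day fewer than a quick reading might suggest. The sketch below checks a row for the required columns and lists the dates it would cover. The function name download_dates and the sample row values are illustrative assumptions, not code taken from flow_generator.py:

```python
from datetime import date, timedelta

# Required columns as listed in this README.
REQUIRED_COLUMNS = ("location", "center_lat", "center_lon", "startdate", "enddate")


def download_dates(row):
    """List the dates covered by one parameter set; enddate is exclusive."""
    missing = [col for col in REQUIRED_COLUMNS if col not in row]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    start = date.fromisoformat(row["startdate"])
    end = date.fromisoformat(row["enddate"])
    # range() stops before (end - start).days, so enddate itself is excluded.
    return [start + timedelta(days=i) for i in range((end - start).days)]


# Illustrative row; the values are made up for this example.
row = {
    "location": "example_site",
    "center_lat": "75.0",
    "center_lon": "-20.0",
    "startdate": "2022-09-14",
    "enddate": "2022-09-16",
}
print([d.isoformat() for d in download_dates(row)])
```

For this row the list contains 2022-09-14 and 2022-09-15, but not 2022-09-16.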
The bounding box can be specified either using latitude and longitude (crs=wgs84) or north polar stereographic (crs=epsg3413).
For wgs84 (lat/lon), use:
top_left_lat
top_left_lon
lower_right_lat
lower_right_lon
For epsg3413 (polar stereographic), use:
left_x
right_x
lower_y
top_y
Note: bounding box format = top_left_x top_left_y bottom_right_x bottom_right_y (x = lat(wgs84) or easting(epsg3413), y = lon(wgs84) or northing(epsg3413))
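The bounding box columns map onto the -b argument of fetchdata.sh, which joins four values with @ in the order given by the fetchdata.sh usage string earlier in this README. A minimal sketch, assuming a parameter set is available as a dictionary keyed by column name (the function name bbox_argument and the sample values are hypothetical):

```python
# Column order for each crs, following the -b usage string of fetchdata.sh:
#   wgs84:    top_left_lat@top_left_lon@lower_right_lat@lower_right_lon
#   epsg3413: left_x@top_y@right_x@lower_y
BBOX_COLUMNS = {
    "wgs84": ("top_left_lat", "top_left_lon", "lower_right_lat", "lower_right_lon"),
    "epsg3413": ("left_x", "top_y", "right_x", "lower_y"),
}


def bbox_argument(row, crs):
    """Build the '@'-separated bounding box string for the given crs."""
    if crs not in BBOX_COLUMNS:
        raise ValueError(f"crs must be one of {sorted(BBOX_COLUMNS)}, got {crs!r}")
    return "@".join(str(row[col]) for col in BBOX_COLUMNS[crs])


# Illustrative values only.
row = {"top_left_lat": 81, "top_left_lon": -22, "lower_right_lat": 79, "lower_right_lon": -12}
print(bbox_argument(row, "wgs84"))
```

Keeping the column order in one lookup table makes it harder to mix lat/lon values with polar stereographic coordinates by accident.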
Build a virtual environment and install Cylc
cd <your-project-path>/ice-floe-tracker-pipeline
conda env create -f ./config/ift-env.yaml
conda activate ift-env
Make sure you have registered for an account with space-track.org and exported your SOIT credentials as environment variables on Oscar, as outlined in the SOIT integration section.
Prepare the runtime environment
Cylc will use software dependencies inside a Singularity container to fetch images and satellite times from external APIs.
[ ] It is a good idea to reset the Singularity cache directory to a location on scratch. Images take up a lot of space, and scratch gets cleaned regularly.
[ ] First, populate the flow.cylc file by running:
python workflow/scripts/flow_generator.py \
--csvfile "./config/<site_locations_file.csv>" \
--template "flow_template_hpc.j2" \
--template_dir "./config/cylc_hpc" \
--crs "<crs>" \
--minfloearea <value> \
--maxfloearea <value>
(Replace <site_locations_file.csv> with the name of your CSV file.)
Run python workflow/scripts/flow_generator.py --help for a list of options.
[ ] Then, build the workflow, run it, and open the Terminal-based User Interface (TUI) to monitor the progress of each task.
cylc install -n <workflow-name> ./config/cylc_hpc
cylc validate <workflow-name>
cylc play <workflow-name>
cylc tui <workflow-name>
You can also get an image of the task dependency graph with cylc graph <workflow-name>; you have to click the generated link to open it in VS Code.
If you need to change parameters and re-run a workflow, first do:
cylc stop --now <workflow-name>
cylc clean <workflow-name>
Note: Error logs are available for each task:
cat ./cylc-run/<workflow-name>/<run#>/log/job/1/<task-name>/01/job.err
The entire cylc-run workflow generated by Cylc is also symlinked to ~/ice-floe-tracker-pipeline/workflow/cylc-run/.
Julia logging is also available at ~/ice-floe-tracker-pipeline/workflow/report/
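When many tasks run, it can be tedious to open each job.err by hand. The sketch below collects the non-empty error logs for a workflow, assuming the directory layout shown in the note above (<run#>/log/job/1/<task-name>/01/job.err). The function name find_job_errors is hypothetical and not part of this repository:

```python
import sys
from pathlib import Path


def find_job_errors(workflow_dir):
    """Return non-empty job.err files under a Cylc workflow directory."""
    # Layout assumed: <run#>/log/job/<cycle>/<task-name>/<submit#>/job.err
    pattern = "*/log/job/*/*/*/job.err"
    root = Path(workflow_dir).expanduser()
    return sorted(p for p in root.glob(pattern) if p.stat().st_size > 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the workflow directory, e.g. the folder for your workflow under cylc-run.
    for err_file in find_job_errors(sys.argv[1]):
        print(err_file)
```

Empty job.err files are skipped, so the output lists only tasks that actually wrote to standard error.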
Failed tasks will retry automatically. If all retries fail, there are likely too many clouds in the study area for the given dates. Try using the NASA Worldview web app to find better dates with fewer clouds. For large bounding boxes, you may need to increase the memory or cpus-per-task flags in the cylc_hpc/flow.cylc file for tasks that are failing.
Julia: When running locally, make sure you have at least Julia 1.9.0 installed with the correct architecture for your local machine. (https://julialang.org/downloads/)
Docker Desktop: Also make sure Docker Desktop client is running in the background to use the Cylc pipeline locally. (https://www.docker.com/products/docker-desktop/)
cd <your-project-path>/ice-floe-tracker-pipeline
conda env create -f ./config/ift-env.yaml
conda activate ift-env
Note: If you get errors running your first Cylc workflow, you may need to set auto_activate_base: false in your .condarc file, depending on your existing Conda config.
Make sure you have registered for an account with space-track.org and exported your SOIT credentials as environment variables on your local computer, as outlined in the SOIT integration section.
Install your workflow, run it, and monitor with the Terminal User Interface (TUI)
First, populate the flow.cylc file by running:
python workflow/scripts/flow_generator.py \
--csvfile "./config/<site_locations_file.csv>" \
--template "flow_template_local.j2" \
--template_dir "./config/cylc_local" \
--crs "<crs>" \
--minfloearea <value> \
--maxfloearea <value>
(Replace <site_locations_file.csv> with the name of your CSV file.)
Run python workflow/scripts/flow_generator.py --help for a list of options.
cylc install -n <workflow-name> ./config/cylc_local
cylc graph <workflow-name> # requires graphviz to be installed locally
cylc validate <workflow-name>
cylc play <workflow-name>
cylc tui <workflow-name>
The Terminal-based User Interface provides a simple way to watch the status of each task called in the flow.cylc workflow. Use the arrow keys to investigate each task.
If you need to change parameters and re-run a workflow, first do:
cylc stop --now <workflow-name>
cylc clean <workflow-name>
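To stop, clean, reinstall, and rerun in a single command (substitute ./config/cylc_local when running locally):
cylc stop --now <workflow-name> && \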
cylc clean <workflow-name> && \
cylc install -n <workflow-name> ./config/cylc_hpc && \
cylc validate <workflow-name> && \
cylc play <workflow-name> && \
cylc tui <workflow-name>
Note: Error logs are available for each task:
cat ~/cylc-run/<workflow-name>/<run#>/log/job/1/<task-name>/01/job.err
When working locally, we have found VSCode to be a good interface for development.
When working locally and using the Cylc local pipeline, double check that the Docker client is running and clean the Docker cache to make sure you are using the latest images.
[ ] delete any existing images from the Docker Dashboard
[ ] from a terminal, run:
docker rm $(docker ps -aq)
[ ] from a terminal, run:
docker image prune
When running the CLI locally, make sure you have at least Julia 1.9.0 installed with the correct architecture for your local machine. (https://julialang.org/downloads/)
[ ] cd <your-project-path>/ice-floe-tracker-pipeline
Open a Julia REPL and build the package
julia
Enter Pkg mode and precompile
]
activate .
build
Use the backspace to go back to the Julia REPL and start running Julia code!
Note: Use --help with the wrapper script to learn about the options available for each wrapper function.
For example, from a bash prompt:
julia --project=. ./workflow/scripts/ice-floe-tracker.jl extractfeatures --help
Here is the list of wrapper functions used in the CLI:
commands:
  landmask         Generate land mask images
  preprocess       Preprocess truecolor/falsecolor images
  extractfeatures  Extract ice floe features from segmented floe image
  makeh5files      Make HDF5 files from extracted floe features
  track            Pair ice floes in day k with ice floes in day k+1

optional arguments:
  -h, --help       show this help message and exit
Julia stores multidimensional arrays in column-major order, whereas Python (NumPy) uses row-major order, so arrays read from the HDF5 files in Python may need to be transposed, as in the examples below.
import h5py
import pandas as pd
from os.path import join
dataloc = "results/hdf5-files"
filename = "20220914T1244.aqua.labeled_image.250m.h5"
ift_data = h5py.File(join(dataloc, filename))
df = pd.DataFrame(
    data=ift_data["floe_properties"]["properties"][:].T,  # note the transposition
    columns=ift_data["floe_properties"]["column_names"][:].astype(str),
)
df.head()
import numpy as np
import matplotlib.pyplot as plt
ift_segmented = ift_data['floe_properties']['labeled_image'][:,:].T # note the transposition
ift_segmented = np.ma.masked_array(ift_segmented, mask=ift_segmented==0)
fig, ax = plt.subplots()
ax.imshow(ift_segmented, cmap='prism')
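The transpositions in the snippets above follow from memory layout: Julia stores arrays in column-major order, while NumPy reads data in row-major order by default, so an array written from Julia shows up in Python with its dimensions reversed. The sketch below imitates that round trip with NumPy alone. No HDF5 file is involved, and the 2x3 array is an arbitrary example:

```python
import numpy as np

# A 2x3 array as it would look from the Julia side.
julia_view = np.array([[1, 2, 3],
                       [4, 5, 6]])

# Column-major (Julia-style) memory order: columns are laid out one after another.
buffer = julia_view.flatten(order="F")

# A row-major reader sees the same buffer with the dimensions reversed.
as_read = buffer.reshape((3, 2))

# Transposing restores the original orientation.
recovered = as_read.T

print(as_read.shape, recovered.shape)
print(np.array_equal(recovered, julia_view))
```

The same reasoning applies to the properties table and the labeled image: the data are intact, and only the axis order needs to be swapped.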