This codebase was developed as part of the COTS Control Innovation Program, project R-02. Its overall purpose is to clean, wrangle and perform geospatial analysis on control program data from GBRMPA to produce a standardised output for use in research or decision support tools.

`source.R` defines a large set of functions, each of which generally serves one of the following purposes:

- Data Transformation
- Error Checking & Processing
- Site Assignment to control data, if applicable
- Aggregation and Export

Functions defined in `source.R` are then utilised to produce several application-specific reusable workflows:

- `process_control_data_research_output.R` (see Section 3.1)
- `ingest_control_data_export_to_app.R` (see Section 3.2)

This section defines several terms utilised throughout the documentation to ensure clarity.
- Legacy (or legacy data): data previously processed by this workflow, or by previous versions of this workflow, and stored in the "legacy format".
- New (or new data): data newly exported from GBRMPA that is being processed by this workflow for the first time.
- Match: a row identified in both the legacy data and the new data.
- Discrepancy: a match with minor changes, caused by QA or by mistakes, between the legacy and new data.

This codebase is designed to be automated with Azure and is not intended to be run locally.
Docker containers have been produced during the development of this codebase to ensure that the client environment remains consistent with the dev environment; see Section 2.1 for instructions. See Sections 2.2 and 2.3 for details of all packages installed in the dev environment. There is no guarantee that the Docker images are up to date.
This requires Docker version 24.0.6: https://www.docker.com/products/docker-desktop/
Although not recommended, the scripts can be executed locally after running the setup scripts. On Windows, execute the setup and dependencies batch files; on Linux, run the setup and dependencies shell scripts.
Package | Version |
---|---|
tools | 4.2.1 |
installr | 0.23.4 |
readxl | 1.4.1 |
sets | 1.0-21 |
XML | 3.99-0.13 |
methods | 4.2.1 |
xml2 | 1.3.3 |
rio | 0.5.29 |
dplyr | 1.0.10 |
stringr | 1.4.1 |
fastmatch | 1.1-3 |
lubridate | 1.8.0 |
rlang | 1.1.0 |
inline | 0.3.19 |
purrr | 0.3.4 |
jsonlite | 1.8.7 |
sf | 1.0-14 |
leaflet | 2.1.2 |
raster | 3.6-23 |
terra | 1.7-39 |
units | 0.8-0 |
tidyverse | 1.3.2 |
tidyr | 1.2.0 |
lwgeom | 0.2-13 |
stars | 0.6-4 |
furrr | 0.3.1 |
foreach | 1.5.2 |
doParallel | 1.0.17 |
DBI | 1.1.3 |
This R code defines a data processing pipeline that imports, formats, and verifies control data for research purposes, and creates a metadata report to document pipeline outcomes. The `main()`
function is the entry point of the pipeline. It takes as inputs the paths to the legacy data, new data, KML data, and JSON configuration file. The control program data can then be assigned to the nearest cull sites.
While an ideal scenario would involve a fully dynamic system capable of automatically determining mapping transformations from one version of a dataset to the next, this proved unattainable because the new GBRMPA database reuses names from the old dataset in a different context. To address this challenge, a compromise between modularity and robustness was sought. Instead of hard-coding numerous transformations, JSON configuration files are used to specify transformations, which are then checked against the input with NLP techniques and dynamically adjusted so that semantic differences can still be mapped effectively. This approach allows flexibility in handling future datasets: any dataset can supply a configuration file and then utilise the workflow to ensure consistent data output.
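For illustration only, a configuration in this style might map exported column names onto standardised ones and declare new columns with defaults. The field and column names below are hypothetical and are not taken from the actual configuration files:

```json
{
  "mappings": [
    { "from": "Cohort 1 Count", "to": "cohort_1_count", "type": "integer" },
    { "from": "Reef Name",      "to": "reef_label",     "type": "character" }
  ],
  "new_fields": [
    { "name": "error_flag", "default": false, "type": "logical" }
  ]
}
```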
Error checking is independent of discrepancy detection. These functions interpret the data and flag rows as errors if they are likely to be inappropriate for use in analysis, based on advice from Dr Cameron Fletcher. No data is ever removed.
Discrepancy detection provides the opportunity to identify changes in a specific row of data. It is not possible to know whether a change is a mistake or QA, so if a change alters an error-free data point into one containing an error, the original row is retained. In all other situations the new row is used.
Rows are flagged with an error when any of the following conditions are met:

- Latitude or Longitude exceeds the allowable range.
- COT Scars is not one of the agreed-upon categorical options.
- Tow Date is missing and cannot be estimated from other entries of the same voyage.
- Macroalgae is not one of the agreed-upon categorical options.
- Bleach Severity is not one of the agreed-upon categorical options.
- Descriptive Bleach Severity is not one of the agreed-upon categorical options.
- Percentages are not numeric values between 0 and 100.
- NA or Null values are present in non-exempt columns. Non-exempt columns are those required to be created by the workflow process (they do not exist in the GBRMPA database) and the ID column.
- Integers are not positive.
- Coral Cover is not one of the agreed-upon categorical options, or a value close enough to be mapped to the correct option.
- Reef Label / Reef ID is not in an accepted format. The latitude bounds currently set with regex are between 10 and 29 degrees south.
- Voyage Dates is missing and cannot be estimated from other entries of the same voyage.
- Duplicates of any row with more than two instances. It is plausible for two genuinely distinct identical rows to exist, so those are not flagged.

The workflow does not use the ID column to determine discrepancies. Instead, a distance between rows is established by comparing rows and counting the number of differing columns. Throughout development, and historically, the IDs in the database exports have been seen to change frequently. It would be a large source of error to treat the IDs as authoritative when it cannot be guaranteed that they are. The functionality to do so has been programmed for a time when the IDs can be considered authoritative.

The "distance" is the maximum number of columns in a given row that can change between the legacy and new data while still being considered a discrepancy. This was set to three; anything greater and it is assumed that the rows are not related. There is no correct choice of distance. Three was able to capture all rows from the 6,000-row legacy data set in the 115,000-row new data set. It was determined that being conservative is more beneficial, as there is no significant consequence to interpreting a discrepancy as a new row: it will still be utilised provided no errors are flagged, in accordance with the above.

A set of columns is excluded from row comparisons. This includes the ID column and any columns that are created by the workflow. ID is excluded because its frequent changes may make a row appear closer to or further from another. New columns created by the workflow have two cases:
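The row-distance idea can be sketched in base R. This is a simplified illustration, not the actual `matrix_close_matches_vectorised` implementation; the names here are illustrative:

```r
# Count differing columns between one new row and every legacy row,
# excluding the ID column, then keep candidates within the allowed
# distance (three by default in the workflow).
row_distance <- function(new_row, legacy_df, exclude_cols = c("ID")) {
  keep <- setdiff(names(legacy_df), exclude_cols)
  diffs <- sapply(keep, function(col) legacy_df[[col]] != new_row[[col]])
  rowSums(matrix(diffs, nrow = nrow(legacy_df)))
}

legacy <- data.frame(ID = 1:2, Reef = c("A", "B"), Count = c(5, 9))
new_row <- list(ID = 99, Reef = "A", Count = 6)  # QA changed Count; ID changed too
d <- row_distance(new_row, legacy)
which(d <= 3)  # candidate legacy rows within the allowed distance
```

Because the ID column is excluded, the changed ID does not inflate the distance; legacy row 1 differs in only one compared column.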
Assigning sites to raster pixels is only performed for new or altered reefs, where possible. The workflow does not attempt to maintain cull sites, and their assignment, for reefs that are removed by GBRMPA. The overarching philosophy of this workflow asserts that the deliberate decisions made by GBRMPA set the standard and should be regarded as authoritative.

The method traditionally employed for assigning control data observations to specific geographical regions was valuable for understanding ecological patterns across various reef environments. However, its initial implementation relied on a Mathematica script, which introduced accessibility challenges due to the proprietary nature of the Mathematica software. This limitation not only hindered wider adoption of the technique but also raised concerns about long-term sustainability and data processing bottlenecks. To overcome these hurdles and enhance the method's usability, the approach was reconstructed in the open-source programming language R. This transformation renders the method more accessible, enabling researchers to employ it without the constraints posed by proprietary software. The R implementation closely follows the original approach, allowing observations to be processed efficiently and alleviating potential bottlenecks associated with external dependencies, ensuring a more streamlined data analysis workflow. Of the methods tested, the R implementation of Dr Cameron Fletcher's site assignment was the most accurate for site assignment.
Steps were then taken to reduce the computational complexity of the calculations through simplification of the intricate polygonal shapes. The process implemented the Ramer-Douglas-Peucker algorithm to obtain an adaptive approximation of complex polygons while maintaining their essential characteristics, based on a predetermined threshold of $10^{-5}$.
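The core of this simplification can be sketched in base R. This is a simplified recursive version in the spirit of the `rdp` and `perpendicularDistance` functions documented below; the actual implementation may differ:

```r
# Perpendicular distance from point p to the line through A and B.
perp_dist <- function(p, A, B) {
  abs((B[1] - A[1]) * (A[2] - p[2]) - (A[1] - p[1]) * (B[2] - A[2])) /
    sqrt((B[1] - A[1])^2 + (B[2] - A[2])^2)
}

# Ramer-Douglas-Peucker: recursively drop interior points that lie
# within epsilon of the chord between the segment endpoints.
rdp_sketch <- function(points, epsilon = 1e-5) {
  n <- nrow(points)
  if (n < 3) return(points)
  d <- sapply(2:(n - 1), function(i) perp_dist(points[i, ], points[1, ], points[n, ]))
  i <- which.max(d) + 1  # index of the farthest interior point
  if (d[i - 1] > epsilon) {
    left  <- rdp_sketch(points[1:i, , drop = FALSE], epsilon)
    right <- rdp_sketch(points[i:n, , drop = FALSE], epsilon)
    rbind(left[-nrow(left), , drop = FALSE], right)  # avoid duplicating the split point
  } else {
    points[c(1, n), , drop = FALSE]  # all interior points within tolerance
  }
}

pts <- rbind(c(0, 0), c(1, 0.000001), c(2, 0), c(3, 1))
rdp_sketch(pts)  # the near-collinear interior point is dropped
```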
The bounding boxes of each reef layer are extended by 0.003 degrees, roughly equivalent to 300 meters. The initial objective is to ensure that the bounding boxes encompass the entirety of the reef polygons, incorporating a buffer zone of suitable dimensions. This buffer serves the purpose of accommodating the meandering trajectory of manta tows, which tend to fluctuate in proximity to the reef margins. Achieving a delicate equilibrium, the buffer must be substantial enough to avoid overlap between reefs and to capture most manta tows, while avoiding computational overload. The approach also seeks to align with the practices of GBRMPA (Great Barrier Reef Marine Park Authority), wherein manta tows are assigned to sites based on proximity conditions. To maintain fidelity with the GBRMPA framework, the buffer is set at 0.003 degrees, a value that ensures consistency in proximity while retaining computational efficiency.
The expansion of the bounding boxes is coupled with an iterative process of rasterization, resulting in a raster for every reef layer. These rasters can be used for subsequent spatial analyses if desired.
To calculate the distance between a point and a polygon, the function `st_distance` from the `sf` package was utilised; it can perform the calculation with either Euclidean or great-circle distance. Euclidean distance is used for comparison, but accuracy could be improved in future implementations by using great-circle distance. Nothing in the code or documentation indicated that the assignment of a pixel depended on the assignment of any other pixel. The assigned rasters undergo a transformation, yielding a set of rasters, each corresponding to a distinct reef.

Manta tow centroids are transformed into point representations. Iterating through the set of rasters, the tow points are filtered based on the reef name of the raster. The value of the raster at each centroid point is extracted and the results are merged with the manta tow data input.
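To illustrate why great-circle distance could improve accuracy, a base-R haversine sketch (not the `sf::st_distance` call the workflow uses) shows that a degree of longitude near the reef latitudes is shorter than a degree of latitude, which planar Euclidean distance on raw degrees ignores:

```r
# Haversine great-circle distance in metres between two lon/lat points.
haversine_m <- function(lon1, lat1, lon2, lat2, R = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

# At roughly 17 degrees south, one degree of longitude is ~106 km,
# while one degree of latitude is ~111 km.
haversine_m(146, -17, 147, -17)
haversine_m(146, -17, 146, -16)
```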
Output locations are defined in the configuration files and will be created if they do not already exist. Any output is saved with a naming convention composed of a keyword, a `%Y%m%d` date, a `%H%M%S` time, and the file extension. Do NOT remove data outputs; simply take a copy. Previous outputs are utilised to reduce processing and reduce errors.
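A minimal sketch of generating a name in this convention with base R. The underscore separators are an assumption for illustration; the documented convention only lists the keyword, `%Y%m%d`, `%H%M%S` and extension components:

```r
# Build an output file name from a keyword, a timestamp and an extension.
make_output_name <- function(keyword, ext, when = Sys.time()) {
  sprintf("%s_%s.%s", keyword, format(when, "%Y%m%d_%H%M%S"), ext)
}

make_output_name("control_data", "csv", as.POSIXct("2023-09-30 14:05:09"))
# "control_data_20230930_140509.csv"
```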
This R code defines a data processing pipeline that ingests JSON exports from GBRMPA-owned PWAs, then formats, verifies and exports the data for utilisation in the COTS Control Centre Decision Support Tool. The `main()` function is the entry point of the pipeline and requires a list of JSON files to ingest, a path to the config file, and a connection string to connect to the database. This workflow was produced so that previous…

Configuration files should not be altered; instead, new alternative configuration files should be produced. Config files exist for both workflows and specify expected column transformations, new columns required, their default values and their data types. Other config files map database column names to research output column names so that aspects of the codebase can be reused.
`main(new_path, configuration_path, kml_path, leg_path)`

- `leg_path`: path to the legacy data file
- `new_path`: path to the new control data file
- `configuration_path`: path to the control data specific configuration file
- `kml_path`: path to the KML file containing all cull sites on the reef

`import_data(data, index=1)`

- `data`: file path to the file containing the data desired in dataframe format
- `index`: index of the Excel sheet to import

`get_datetime_parse_order()`

- takes no arguments

`contribute_to_metadata_report(data, key="Warning")`

- `data`: matrix or dataframe containing strings that describe the location and the warning/error that occurred
- `key`: optional parameter specifying the node to be inserted under. "Warning" by default; any string is valid.

`get_vessel_short_name(string)`

- `string`: a string or vector of strings that are vessel names

`get_file_keyword(string)`

- `string`: a string

`append_to_table_unique(con, table_name, data_df)`

- `con`: a database connection object from the DBI package
- `table_name`: name of the table to append data to, as a string. Case sensitive.
- `data_df`: dataframe to append

`get_id_by_cell(con, table_name, search_column, search_term)`

- `con`: a database connection object from the DBI package
- `table_name`: name of the table, as a string. Case sensitive.
- `search_column`: name of the column in the database to check for the value, as a string
- `search_term`: value to search for in the database

`get_id_by_row(con, table_name, data_df)`

- `con`: a database connection object from the DBI package
- `table_name`: name of the table to extract IDs from. Case sensitive.
- `data_df`: dataframe to perform a left join with

`get_voyage_dates_strings(strings)`

- `strings`: a vector of strings

`get_app_data_database(con, control_data_type)`

- `con`: a database connection object from the DBI package
- `control_data_type`: a string keyword indicating the type of control data

`separate_date_time(date_time)`

- `date_time`: a vector of datetimes

`get_reef_label(names)`

- `names`: a vector of strings

`get_start_and_end_coords_research(start_lat, stop_lat, start_long, stop_long)`

- `start_lat`: a vector of initial latitude values of size n
- `stop_lat`: a vector of final latitude values of size n
- `start_long`: a vector of initial longitude values of size n
- `stop_long`: a vector of final longitude values of size n

`get_start_and_end_coords_app(start_lat, stop_lat, start_long, stop_long)`

- `start_lat`: a vector of initial latitude values of size n
- `stop_lat`: a vector of final latitude values of size n
- `start_long`: a vector of initial longitude values of size n
- `stop_long`: a vector of final longitude values of size n

`get_start_and_end_coords_base(start_lat, stop_lat, start_long, stop_long)`

- `start_lat`: a vector of initial latitude values of size n
- `stop_lat`: a vector of final latitude values of size n
- `start_long`: a vector of initial longitude values of size n
- `stop_long`: a vector of final longitude values of size n

`get_feeding_scar_from_description(names)`

- `names`: a vector of strings

`get_worst_case_feeding_scar(scars)`

- `scars`: a vector of strings

`get_coral_cover(coral)`

- `coral`: a vector of strings

`get_median_coral_cover(coral)`

- `coral`: a vector of strings

`missing_reef_information(data, columns, test_value = NA)`

- `data`: dataframe to check
- `columns`: vector of columns to check for missing information
- `test_value`: a vector of undesirable values indicating missing information that are not NULL or NA

`assign_missing_site_and_reef(transformed_data_df, serialised_spatial_path, control_data_type)`

- `transformed_data_df`: dataframe of control data
- `serialised_spatial_path`: path to the RDS file containing regions assigned to sites
- `control_data_type`: keyword, as a string, indicating the type of control data

`site_numbers_to_names(numbers, reef_names)`

- `numbers`: a vector of numeric or string types indicating the site
- `reef_names`: a vector of reef names, as strings, corresponding to the numbers provided

`aggregate_culls_site_resolution_research(data_df)`

- `data_df`: dataframe of control data

`aggregate_culls_site_resolution_app(data_df)`

- `data_df`: dataframe of control data

`aggregate_manta_tows_site_resolution_app(data_df)`

- `data_df`: dataframe of control data

`aggregate_manta_tows_site_resolution_research(data_df)`

- `data_df`: dataframe of control data

`separate_control_dataframe(new_data_df, legacy_data_df)`

- `new_data_df`: new control data exported from GBRMPA
- `legacy_data_df`: control data that most recently passed through the workflow, in the legacy format

Description: uses `matrix_close_matches_vectorised` and `vectorised_separate_close_matches` to determine the number of variations in a row of the original legacy output compared with the new input. The most likely matches are then determined based on the number of variations.
Given that it is not possible to definitively know whether a change or discrepancy was intentional, both new and changed entries pass through the same validation checks; if the checks pass, the entry is accepted as usable. If the checks fail, the data is flagged. Discrepancies flagged with errors are returned to their original state from the legacy dataset, provided the original state is not itself flagged as an error.

`separate_new_control_app_data(new_data_df, legacy_data_df)`

- `new_data_df`: new control data exported from GBRMPA
- `legacy_data_df`: control data that most recently passed through the workflow, in the legacy format

Description: uses `matrix_close_matches_vectorised` and `vectorised_separate_close_matches` to determine the number of variations in a row of the original legacy output compared with the new input. The most likely matches are then determined based on the number of variations. Only new entries are returned.

`flag_duplicates(new_data_df)`

- `new_data_df`: dataframe of new control data

Outputs:

- `new_data_df`: dataframe with updated column "error_flag"

`compare_discrepancies(new_data_df, legacy_data_df, discrepancies)`

- `new_data_df`: new control data exported from GBRMPA
- `legacy_data_df`: control data that most recently passed through the workflow, in the legacy format
- `discrepancies`: mapped indices indicating likely matches between `legacy_data_df` and `new_data_df` with variations in a number of columns

Outputs:

- `output_df`: dataframe which contains original rows from `legacy_data_df` in place of any likely errors in `new_data_df`

`map_column_names(column_names)`

- `column_names`: column names to map

`set_data_type(data_df, mapping)`

- `data_df`: a data frame to be updated
- `mapping`: dataframe that specifies a column's required data type

Outputs:

- `data_df`: the updated data frame

`matrix_close_matches_vectorised(x, y, distance)`

- `x`: data frame to search in
- `y`: data frame to search against
- `distance`: maximum distance from a perfect match for a match to be considered

Outputs:

- `X_index`: the index of the row in `x` that matched
- `Y_index`: the index of the row in `y` that matched
- `Distance`: the distance between the matched rows

Description: returns a matrix `match_indices` containing the indices of the rows in `x` and `y` that have non-perfect matches within the specified distance. The function first pre-allocates memory for the `match_indices` matrix assuming the worst-case scenario of `y_rows * x_rows` possible matches. If this allocation fails due to insufficient memory, it tries again with a smaller allocation of 10,000,000 rows. The function then iterates through each row in `x` and compares it to every row in `y` in a vectorised manner. For each value in the row of `x`, the function evaluates whether it matches the corresponding value in the row of `y`; these logical values are appended to the `matches` matrix. After iterating over every column, there is a matrix of size (`y_rows`, `x_cols`), where each row represents a row in `y` and each column represents a column in `x`. A perfectly matching row in `y` will have a corresponding row in `matches` exclusively containing TRUE. Given that TRUE is equivalent to 1, the `rowSums` function is used to determine the number of non-perfect matches. If this number is less than or equal to the specified distance, the row index, column index and distance from a perfect match are stored in `match_indices`. The function uses a custom vectorised function `store_index_vec` to append matches to `match_indices`. Finally, `na.omit` is used to remove rows with NA values from `match_indices` before it is returned.

`%fin%`

- `x`: size n*m vector
- `y`: size a*b vector

`vectorised_separate_close_matches(close_match_rows)`

- `close_match_rows`: a data frame containing the close matches between two data sets. It has three columns: the first and second contain the row indices from the two data sets, and the third contains the distances between these rows.

Description: the `vectorised_separate_close_matches()` function separates the close matching rows between two data sets. The rows are separated in a vectorised process, using logical checks on vectors or matrices so that Boolean operations can be used to separate the rows. This reduces processing time by two to three orders of magnitude, a worthy trade-off for the reduced readability. The function handles close matching rows in the following order:

This order matters, and the process is not definitive. The order therefore needs to maximise the probability that a row from the new data set is matched with one from the previous data set. One-to-one matches and many-to-many perfect matches are the most likely to be correct and are therefore removed first. The next matches handled must be one-to-many, to ensure a match is found for the "one", as its most likely match is the one of the many with the smallest distance. Any rows with those indices can then be removed to prevent double handling. Once many-to-many rows have been handled, one-to-one or one-to-many relationships may have formed, so these can be handled repeatedly until all matches have been found or no more can be found.
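The first step, extracting one-to-one matches, can be illustrated with a toy base-R example. The column names and data here are illustrative, not taken from the actual implementation:

```r
# Toy close-match table in the shape produced by
# matrix_close_matches_vectorised: x row index, y row index, distance.
cm <- data.frame(
  x_index  = c(1, 2, 2, 3),
  y_index  = c(10, 11, 12, 12),
  distance = c(0, 1, 2, 0)
)

# One-to-one matches: both x_index and y_index appear exactly once,
# so the pairing is unambiguous and can be removed first.
unique_x <- !(duplicated(cm$x_index) | duplicated(cm$x_index, fromLast = TRUE))
unique_y <- !(duplicated(cm$y_index) | duplicated(cm$y_index, fromLast = TRUE))
one_to_one <- cm[unique_x & unique_y, ]
one_to_one  # only the (1, 10) pair survives; the rest need further handling
```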
`rec_group(stack, m2m_split, groups, group)`

- `stack`: a vector containing the data to be grouped
- `m2m_split`: a list containing the index of the matching rows
- `groups`: a list of matrices containing grouped data
- `group`: an integer indicating the group number

`verify_RHISS(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`verify_voyage_dates(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

`verify_percentages(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`verify_na_null(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: checks whether values in `data_df` are NA or NULL, and flags those rows as invalid by adding a TRUE value to the "error_flag" column.

`verify_integers_positive(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`remove_leading_spaces(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: modified dataframe

`verify_coral_cover(data_df)`

Inputs:

- `data_df`: a data frame containing control data

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`verify_cots_scars(data_df)`

Inputs:

- `data_df`: a data frame

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`verify_cohort_count(data_df)`

Inputs:

- `data_df`: a data frame

Outputs:

- `data_df`: dataframe with updated column "error_flag"

Description: returns `data_df` after altering the column "error_flag".

`find_one_to_one_matches(close_match_rows)`

- `close_match_rows`: a data frame or matrix with columns "x_index", "y_index" and "difference". This is the output of the function `matrix_close_matches_vectorised`.
`verify_entries(data_df, configuration)`

- `data_df`: data frame containing control data to be verified
- `configuration`: a configuration object containing metadata required for verification, including `ID_col` and `control_data_type`

Outputs:

- `data_df`: updated data frame with error flags added

`verify_lat_lng(data_df, max_val, min_val, columns, ID_col)`

- `data_df`: data frame containing control data
- `max_val`: maximum value for latitude or longitude
- `min_val`: minimum value for latitude or longitude
- `columns`: columns to verify (e.g. `c("Longitude", "Start Lng", "End Lng")`)
- `ID_col`: identifier column

Outputs:

- `data_df`: updated data frame with error flags added

`verify_scar(data_df)`

- `data_df`: data frame containing control data

Outputs:

- `data_df`: updated data frame with error flags added

`verify_tow_date(data_df)`

- `data_df`: data frame containing control data

Outputs:

- `data_df`: updated data frame with error flags added

`get_new_field_default_values(data_df, new_fields)`

- `data_df`: data frame containing control data
- `new_fields`: JSON object containing information about the new fields and their default values

Outputs:

- `data_df`: updated dataframe with default values in the specified columns

`transform_data_structure(data_df, mappings, new_fields)`

- `data_df`: data frame to be transformed
- `mappings`: data frame specifying mappings for existing columns
- `new_fields`: data frame specifying new fields to be added

Outputs:

- `transformed_df`: transformed data frame

`assign_nearest_site_method_c`

- `data_df`: data frame containing observations
- `kml_path`: path to the KML file containing reef polygons
- `keyword`: keyword used in the file naming convention
- `kml_path_previous`: path to the KML file containing reef polygons used in the last iteration of the workflow (optional)
- `serialised_raster_path`: path to a serialised raster with pixels assigned to sites from the previous iteration of the workflow (optional)
- `spatial_output_path`: path for the serialised raster produced as output (optional)
- `spatial_path`: path to the serialised spatial data file (optional)
- `raster_size`: size of the raster cells. A value less than 1 specifies the resolution; larger values specify the pixel length of the raster extent.
- `x_closest`: assign the nth closest site to each point. Typically, in production, only the closest site is required; during development, however, multiple closest sites can be beneficial for analysis.
- `is_standardised`: flag indicating whether to standardise extents to the largest one in the provided data
- `save_spatial_as_raster`: flag indicating whether to save the generated spatial data as individual raster files for analysis with a traditional geospatial program such as ArcGIS or QGIS (optional)

Outputs:

- `updated_pts`: updated data frame with the nearest site information added

`get_centroids(data_df, crs, precision=0)`

- `data_df`: data frame containing control data
- `crs`: coordinate reference system
- `precision`: decimal places for rounding coordinates

Outputs:

- `pts`: spatial points representing the centroids of manta tows

`find_recent_file(directory_path, keyword, file_extension)`

- `directory_path`: path to the directory where the files are located
- `keyword`: keyword used in the file naming convention
- `file_extension`: file extension of the desired files

Outputs: the most recent matching file, or `NULL` if no matching file is found.

`save_spatial_as_raster(output_path, serialized_spatial_path)`
output_path
: Path where raster files will be saved.serialized_spatial_path
: Path to the serialized spatial data file.get_spatial_differences(kml_data, previous_kml_data)
kml_data
: List of spatial data for the current version.previous_kml_data
: List of spatial data for the previous version.compute_checksum(data)
data
: Data for which the checksum needs to be computed.digest
function.assign_raster_pixel_to_sites_parallel(kml_data, layer_names_vec, crs, raster_size, x_closest=1, is_standardised=0)
kml_data
: List of spatial data.layer_names_vec
: Vector of layer names.crs
: Coordinate reference system.raster_size
: ster cells. Can specify resolution with value less than 1 or can specify the pixel length of the raster extent. (default is 0.0005)x_closest
: Assign nth closest site to point (default is 1).is_standardised
: Flag indicating whether to standardize extents (default is 0).assign_raster_pixel_to_sites_single(raster, site_poly, crs, x_closest)
raster
: Raster object.site_poly
: Spatial data for sites.crs
: Coordinate reference system.x_closest
: Assign nth closest site to point.assign_raster_pixel_to_sites_non_parallel(kml_data, layer_names_vec, crs, raster_size, x_closest=1, is_standardised=0)
kml_data
: List of spatial data.layer_names_vec
: Vector of layer names.crs
: Coordinate reference system.raster_size
: Size of the raster cells. Can specify resolution with value less than 1 or can specify the pixel length of the raster extent. (default is 0.0005)x_closest
: Assign nth closest site to point (default is 1).is_standardised
: Flag indicating whether to standardize extents (default is 0).assign_raster_pixel_to_sites(kml_data, layer_names_vec, crs, raster_size, x_closest=1, is_standardised=0)
kml_data
: List of spatial data.layer_names_vec
: Vector of layer names.crs
: Coordinate Reference System.raster_size
: Size of the raster cells. Can specify resolution with value less than 1 or can specify the pixel length of the raster extent.x_closest
: Assign nth closest site to point. Typically in production only the closest site is required, however for research purposes during development it was beneficial for analysis. is_standardised
: Flag indicating whether to standardize extents to the largest one in data provided.site_regions
: List of rasters with assigned nearest site values.site_names_to_numbers(site_names)
- `site_names`: vector of site names

`simplify_reef_polyogns_rdp(kml_data)`

- `kml_data`: list of sf data frames (one per reef), read from a KML file

Outputs:

- `simplified_kml_data`: list of sf data frames (one per reef), read from a KML file, with fewer boundary points forming the individual polygons

`polygon_rdp(polygon_points, epsilon=0.00001)`

- `polygon_points`: matrix of polygon points
- `epsilon`: tolerance parameter for simplification

`rdp(points, epsilon=0.00001)`

- `points`: matrix of points
- `epsilon`: tolerance parameter for simplification

`perpendicularDistance(p, A, B)`

- `p`: point coordinates
- `A`: start point of a line segment
- `B`: end point of a line segment

Outputs:

- `result`: perpendicular distance

`simplify_kml_polyogns_rdp(kml_data)`

- `kml_data`: KML data as a list

`simplify_shp_polyogns_rdp(shapefile)`

- `shapefile`: shapefile data

`find_largest_extent(kml_data)`

- `kml_data`: list of sf data frames (one per reef), read from a KML file

Outputs:

- `result`: largest extent

`standardise_extents(kml_data)`

- `kml_data`: list of sf data frames (one per reef), read from a KML file

Outputs:

- `result`: list of sf data frames (one per reef) with standardised extents

`create_raster_templates(extents, layer_names_vec, crs, raster_size=150)`

- `extents`: list of extents to create rasters from
- `layer_names_vec`: vector of reef names
- `crs`: coordinate reference system
- `raster_size`: size of the raster cells. A value less than 1 specifies the resolution; larger values specify the pixel length of the raster extent.

Outputs:

- `result`: list of valueless rasters for the specified CRS, extent and resolution

`rasterise_sites(kml_data, is_standardised=1, raster_size=150)`

- `kml_data`: list of sf data frames (one per reef), read from a KML file
- `is_standardised`: flag indicating whether to standardise extents
- `raster_size`: size of the raster cells. A value less than 1 specifies the resolution; larger values specify the pixel length of the raster extent.

`rasterise_sites_reef_encoded(kml_data, layer_names_vec, is_standardised=1, raster_size=150)`

- `kml_data`: list of sf data frames (one per reef), read from a KML file
- `layer_names_vec`: vector of reef names
- `is_standardised`: flag indicating whether to standardise extents
- `raster_size`: size of the raster cells. A value less than 1 specifies the resolution; larger values specify the pixel length of the raster extent.

`xth_smallest(x, x_values)`

- `x`: vector of values to sort
- `x_values`: data frame showing the key-value pairs (nth smallest value : value)

`contribute_to_metadata_report(key, data, parent_key=NULL, report_path=NULL)`
- `key`: the key representing the node to be inserted
- `data`: data to be added to the metadata report
- `parent_key`: optional parent key under which the data should be inserted
- `report_path`: optional path to the metadata report file. If not provided, the function will attempt to find the most recent report file.

`update_config_file(data_df, config_path)`

- `data_df`: data frame containing observations
- `config_path`: path to the JSON config file

`map_new_fields(data_df, new_fields)`

- `data_df`: data frame containing observations
- `new_fields`: data frame containing information about the new fields to be added

Outputs: the data frame updated with the fields specified in the `new_fields` data frame.

`map_all_fields(data_df, transformed_df, mappings)`

- `data_df`: data frame containing observations
- `transformed_df`: data frame to which the fields will be mapped
- `mappings`: data frame containing mapping information

`map_data_structure(data_df, mappings, new_fields)`

- `data_df`: data frame containing observations
- `mappings`: data frame containing mapping information
- `new_fields`: data frame containing information about the new fields to be added

Outputs:

- `transformed_df`: transformed data frame with mapped fields

`extract_dates(input)`

- `input`: vector of character strings representing file or directory names

Outputs:

- `date_objects`: vector of date-time objects extracted from the input