-
## Context
We are hoping to automatically ingest our datasets directly from their sources (when possible and appropriate). This task is to perform data quality validation to identify existing issues, and handle an…
-
**Goal**: develop a series of scripts to:
1. fetch the data from FLHSMV (received in the form of CDs)
2. filter for events within Leon County
3. remove rows with missing lat/lon (and send that sub-…
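Steps 2 and 3 could look something like the sketch below. It is a minimal example, not the project's implementation. It assumes the CD contents have already been parsed into one dict per event. It also assumes three hypothetical fields: `county` (holding the county name as text), `lat`, and `lon`. The real FLHSMV export may use different column names or numeric county codes. Rows without coordinates are returned separately rather than discarded, so they can be routed elsewhere.

```python
from typing import Iterable

# Hypothetical field names: the actual FLHSMV export columns may differ,
# and the county may be stored as a numeric code rather than a name.
COUNTY_FIELD = "county"
LAT_FIELD = "lat"
LON_FIELD = "lon"


def _is_missing(value) -> bool:
    """Treat None, empty strings and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def split_leon_events(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Keep Leon County events and separate the ones lacking coordinates.

    Returns (geocoded, missing_coords); rows outside Leon County are dropped.
    """
    geocoded, missing_coords = [], []
    for row in rows:
        county = str(row.get(COUNTY_FIELD, "")).strip().upper()
        if county != "LEON":
            continue  # step 2: filter to Leon County
        if _is_missing(row.get(LAT_FIELD)) or _is_missing(row.get(LON_FIELD)):
            missing_coords.append(row)  # step 3: set aside for follow-up
        else:
            geocoded.append(row)
    return geocoded, missing_coords
```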
-
Create a basic data preprocessing pipeline for a specific bioinformatics dataset to prepare it for LLM training. The pipeline should include steps for data cleaning, tokenization, and formatting.
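The sketch below shows how the three steps could fit together. The dataset is not named, so it assumes nucleotide sequences supplied as `(record_id, sequence)` pairs, for example as parsed from FASTA. Each stage makes a deliberate choice of its own:

- **Cleaning** strips whitespace, uppercases the sequence, and rejects symbols outside `ACGTN`.
- **Tokenization** uses non-overlapping k-mers. This is a common simple choice, but other schemes such as BPE are also widely used.
- **Formatting** writes JSON Lines with `id` and `text` fields.

All of the function names, the alphabet, and the default `min_len` are illustrative assumptions.

```python
import json
import re
from typing import Iterable, Optional, Tuple

# Assumption: records are (record_id, nucleotide_sequence) pairs, e.g. parsed
# from FASTA. Swap this alphabet for RNA or protein data.
ALLOWED = re.compile(r"[ACGTN]+")


def clean_sequence(raw: str) -> Optional[str]:
    """Remove whitespace, uppercase, and reject sequences with unexpected symbols."""
    seq = re.sub(r"\s+", "", raw).upper()
    return seq if ALLOWED.fullmatch(seq) else None


def kmer_tokenize(seq: str, k: int = 3) -> list[str]:
    """Split into non-overlapping k-mers; a trailing partial k-mer is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def preprocess_to_jsonl(
    records: Iterable[Tuple[str, str]], k: int = 3, min_len: int = 30
) -> str:
    """Clean, tokenize and format records as JSON Lines ({"id", "text"} per line)."""
    lines = []
    for record_id, raw in records:
        seq = clean_sequence(raw)
        if seq is None or len(seq) < min_len:
            continue  # drop invalid or too-short sequences
        tokens = kmer_tokenize(seq, k)
        lines.append(json.dumps({"id": record_id, "text": " ".join(tokens)}))
    return "\n".join(lines)
```

Emitting space-separated k-mers as plain text keeps the output usable with a standard LLM tokenizer. Alternatively, the k-mers could be mapped directly to a custom vocabulary.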
-
- Incoming organisation names should be normalised (e.g. all words capitalised except for a set of known stop words)
- lookups in a canonical list of known name variants, and swit…
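A minimal sketch of the two bullets above, assuming a hypothetical stop-word set and a hypothetical canonical-variants dictionary. Both would need to be replaced with the project's own lists. The name is normalised first: the first word is always capitalised, stop words elsewhere are lowercased, and every other word is capitalised. The normalised form is then looked up in the canonical list using a case- and whitespace-insensitive key.

```python
# Hypothetical stop-word list; replace with the project's own.
STOP_WORDS = {"and", "of", "the", "for", "in", "on", "at", "to"}

# Hypothetical canonical list: lookup key of a known variant -> canonical name.
CANONICAL_VARIANTS = {
    "oxford university": "University of Oxford",
    "univ of oxford": "University of Oxford",
}


def normalise_name(name: str) -> str:
    """Capitalise every word except stop words (the first word is always capitalised)."""
    out = []
    for i, word in enumerate(name.split()):
        lower = word.lower()
        if i > 0 and lower in STOP_WORDS:
            out.append(lower)
        else:
            out.append(lower[:1].upper() + lower[1:])
    return " ".join(out)


def _lookup_key(name: str) -> str:
    """Case- and whitespace-insensitive key for the canonical lookup."""
    return " ".join(name.lower().split())


def canonicalise(name: str) -> str:
    """Normalise, then switch to the canonical name if the variant is known."""
    normalised = normalise_name(name)
    return CANONICAL_VARIANTS.get(_lookup_key(normalised), normalised)
```

One limitation of this simple rule is that it lowercases acronyms (e.g. "NHS" becomes "Nhs"), so a real implementation would probably need an acronym allow-list as well.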
-
Hi there,
Just stumbled upon this project: looks super-useful for reproducible science! I'm currently a collaborator on [PyPREP](https://github.com/sappelhoff/pyprep) (an MNE-Python reimplementatio…
-
Hi,
I am getting an error in the test colab notebook (v0.1.153): https://colab.research.google.com/github/Cocoon-Data-Transformation/cocoon/blob/main/demo/Cocoon_Stage_Demo.ipynb
I did the dat…
-
### Overview
We need to create a data cleaning pipeline that takes in raw input data from the Socrata API and updates the Google Cloud Platform database with the correctly formatted geospatial data.
…
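One piece of "correctly formatted geospatial data" could be converting each record's location into Well-Known Text. Both BigQuery `GEOGRAPHY` columns and PostGIS can ingest WKT. The sketch below handles two value shapes: a GeoJSON-style point (`{"type": "Point", "coordinates": [lon, lat]}`) and a legacy location object with separate `latitude`/`longitude` fields. The function name is hypothetical. Treating `(0, 0)` as a missing-location placeholder is also an assumption, not something the issue states.

```python
from typing import Any, Optional


def to_wkt_point(value: Any) -> Optional[str]:
    """Convert a Socrata point/location value to a WKT 'POINT(lon lat)' string.

    Returns None when coordinates are missing, unparsable, out of range,
    or equal to the (0, 0) placeholder.
    """
    if not isinstance(value, dict):
        return None
    try:
        if value.get("type") == "Point":
            # GeoJSON-style point: coordinates are [longitude, latitude]
            lon, lat = (float(c) for c in value["coordinates"][:2])
        else:
            # Legacy location-style value with separate (often string) fields
            lat, lon = float(value["latitude"]), float(value["longitude"])
    except (KeyError, TypeError, ValueError):
        return None
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return None  # also rejects NaN, since comparisons with NaN are False
    if lon == 0.0 and lat == 0.0:
        return None  # assumed placeholder for "no location"
    return f"POINT({lon} {lat})"
```

Note that WKT puts longitude first, which is a common source of swapped coordinates when loading into a spatial database.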
-
### Overview
We need to create a data cleaning pipeline that takes in raw input data from the Socrata API and updates the AWS database with the correctly formatted geospatial data.
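For the extraction step, a large Socrata dataset usually has to be fetched in pages. The sketch below only builds the page URLs. It assumes the standard SODA endpoint shape `https://<domain>/resource/<dataset_id>.json` and the SoQL paging parameters `$limit`, `$offset` and `$order`. Ordering by the `:id` system field keeps pages stable between requests. The domain, dataset ID and `total_rows` are placeholders. The row count could come from a `$select=count(*)` query, or the loop could instead stop at the first empty page.

```python
from urllib.parse import urlencode


def soda_page_urls(
    domain: str,
    dataset_id: str,
    total_rows: int,
    page_size: int = 1000,
    order: str = ":id",
) -> list[str]:
    """Build one SODA request URL per page of `page_size` rows."""
    base = f"https://{domain}/resource/{dataset_id}.json"
    urls = []
    for offset in range(0, total_rows, page_size):
        # A stable $order is needed so $offset paging neither skips nor repeats rows.
        params = {"$limit": page_size, "$offset": offset, "$order": order}
        urls.append(f"{base}?{urlencode(params, safe='$:')}")
    return urls
```

Keeping URL construction separate from the HTTP calls makes the paging logic easy to test without network access.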
### Action items…
-
## **Pipeline**
1. Download
   1. Randolph Glacier Inventory (RGI) 5.0 Complete
   2. MERRA-2 Aerosol Raster Modeled Data
   3. CALIOP Aerosol Raster observation Version 3 Aerosol Profile Data
   4. …
-
## Motivations
Data library is a great starting point for the "extract" portion of dcpy, but it falls short of our needs in several ways. Our main area of focus is data quality, both on the p…