csv processing - Githubissues

DATA PREPROCESSING

This stage covers acquiring, cleaning, formatting, and analyzing the data so that it is usable for our downstream machine learning tasks.

1. Data Acquisition Step

The data acquisition process for this project involves extracting meaningful metadata from filenames and the corresponding audio files that capture bird migration calls. Here’s how we handle it:

Extract Metadata from Filenames:

Filenames contain encoded information, such as location, frequency ranges, species identifiers, and date/time data. We parse these filenames programmatically to extract the relevant metadata.

Example Filename Format: 2459626.192622_Tautenburg___6589-9171kHz___10-10.9s___b.wav

Example Parsed Fields:

julian_date | location | low_freq | high_freq | start | end | species
2459626.192622 | Tautenburg | 6589 | 9171 | 10 | 10.9 | b
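A minimal sketch of how the example filename above could be split into these fields. The regular expression is an assumption inferred from that single example (one underscore after the date, three underscores between the remaining fields, a `kHz` and an `s` suffix), and `parse_filename` is a hypothetical helper name, not code from the project.

```python
import re
from typing import Optional

# Assumed layout, inferred only from the one example filename:
# <julian_date>_<location>___<low>-<high>kHz___<start>-<end>s___<species>.wav
NUMBER = r"\d+(?:\.\d+)?"
FILENAME_RE = re.compile(
    rf"^(?P<julian_date>{NUMBER})_"
    rf"(?P<location>[^_]+)___"
    rf"(?P<low_freq>{NUMBER})-(?P<high_freq>{NUMBER})kHz___"
    rf"(?P<start>{NUMBER})-(?P<end>{NUMBER})s___"
    rf"(?P<species>[^_.]+)\.wav$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the parsed fields, or None if the name does not fit the assumed layout."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; location and species stay as strings.
    for key in ("julian_date", "low_freq", "high_freq", "start", "end"):
        fields[key] = float(fields[key])
    return fields


if __name__ == "__main__":
    example = "2459626.192622_Tautenburg___6589-9171kHz___10-10.9s___b.wav"
    print(parse_filename(example))
```

Returning `None` for names that do not match keeps the parser strict, so malformed filenames can be collected and handled separately rather than silently producing wrong values.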

GOAL: Automate the extraction of this metadata to store it in a structured format (CSV or database), which will be used for further analysis and model training.
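For the CSV option, one possible sketch uses Python's standard `csv` module. The column names follow the parsed fields listed above (with the date column written as `julian_date`); `write_metadata_csv` is a hypothetical helper, and the records are assumed to be dictionaries keyed by those column names.

```python
import csv
import io
from typing import Iterable, TextIO

# Column order taken from the parsed-fields list above.
FIELDS = ["julian_date", "location", "low_freq", "high_freq", "start", "end", "species"]


def write_metadata_csv(records: Iterable[dict], out: TextIO) -> int:
    """Write a header plus one row per record; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for record in records:
        # Missing keys are written as empty cells (DictWriter's default restval).
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    # Demo writes to an in-memory buffer; a real run would pass
    # open("metadata.csv", "w", newline="") instead (the file name is an assumption).
    buffer = io.StringIO()
    write_metadata_csv(
        [{"julian_date": "2459626.192622", "location": "Tautenburg",
          "low_freq": 6589, "high_freq": 9171, "start": 10, "end": 10.9,
          "species": "b"}],
        buffer,
    )
    print(buffer.getvalue())
```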

2. Data Cleaning

The collected data has several problems that must be fixed before it can be analyzed: faulty or missing information (in particular, errors in the frequency data), misformatted filenames, and missing metadata.

Handle File Parsing Errors:

Inconsistent filename separators (e.g., - instead of _) are handled by programmatically replacing incorrect characters.
Missing or malformed parts of the filename (such as missing frequency data or species codes) are flagged for manual correction or filled with default values.

GOAL: Ensure all filenames adhere to a consistent structure and that all necessary metadata is extracted and corrected.
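The two handling steps above could be sketched as follows. The rule used here is an assumption, not the project's confirmed logic: a hyphen between two digits is taken to be a range (as in 6589-9171), and any other hyphen is treated as a mistyped underscore. `normalize_separators` and `find_problems` are hypothetical helper names.

```python
import re
from typing import List

# Assumption: a hyphen is legitimate only between two digits (a range).
_STRAY_HYPHEN = re.compile(r"(?<!\d)-|-(?!\d)")

_NUMBER = r"\d+(?:\.\d+)?"
_JULIAN = re.compile(_NUMBER)
_FREQ = re.compile(rf"{_NUMBER}-{_NUMBER}kHz")
_TIME = re.compile(rf"{_NUMBER}-{_NUMBER}s")


def normalize_separators(name: str) -> str:
    """Replace hyphens that are not part of a numeric range with underscores."""
    return _STRAY_HYPHEN.sub("_", name)


def find_problems(name: str) -> List[str]:
    """Return the names of fields that look missing or malformed (empty if none)."""
    problems = []
    has_wav = name.lower().endswith(".wav")
    if not has_wav:
        problems.append("extension")
    stem = name[:-4] if has_wav else name
    tokens = [t for t in re.split(r"_+", normalize_separators(stem)) if t]

    for label, pattern in (("julian_date", _JULIAN), ("frequency", _FREQ), ("time", _TIME)):
        if not any(pattern.fullmatch(t) for t in tokens):
            problems.append(label)

    # Tokens that match none of the numeric patterns should be exactly
    # location and species; a malformed numeric token also ends up here.
    text_tokens = [
        t for t in tokens
        if not any(p.fullmatch(t) for p in (_JULIAN, _FREQ, _TIME))
    ]
    if len(text_tokens) != 2:
        problems.append("location_or_species")
    return problems


if __name__ == "__main__":
    bad = "2459626.192622-Tautenburg---6589-9171kHz---10-10.9s---b.wav"
    print(normalize_separators(bad))
    print(find_problems("2459626.192622_Tautenburg___10-10.9s___b.wav"))
```

Files with a non-empty problem list can then be queued for manual correction or given default values, depending on which field is affected.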

SR-71-group / birdanalysis

csv processing #2
