calacademy-research / COMB

Apache License 2.0


Combined Occupancy Models for Birds

To sift through sound to identify birds and predict their occupancy!

Repository Structure (draft):

Considerations on Mixed Computation for Collaboration

Goal: Collaboration with groups used to different computational tools and approaches.

Issues for development and production:

This way we can adopt GitHub for code and use Google Drive / Sheets for data (the Academy has effectively unlimited Google storage).

Proposed structure

| File or Directory | Description |
| --- | --- |
| (this file) | This markdown document: start here for orientation to the project space |
| /chunk_x/ | coding each discrete 'chunk or step' in the /chunk_x/ directories as follows: |
| ../src | all code: R, Python, Awk, etc. scripts for reading in, cleaning, exploring, and analyzing data |
| ../input | all raw data retrieved from Google Drive (large files) or Google Sheets ('by hand' metadata) |
| ../output | all digested data (for next steps) as well as figures, tables, etc. for reports, manuscripts, presentations |
| ../note | all notebook analyses (Jupyter, R notebooks, Markdown docs, metadata summaries) |
| ../hand | all by-hand step descriptions of how to reproduce (readme_by_hand.txt, with links to tools) |
| ../models | copies of .txt files of occupancy models (used in JAGS); note this can be moved to \manuscript\ eventually |
| /chunk_y/ | ... |
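
As a rough illustration of the layout proposed above, a new chunk directory could be scaffolded with a few lines of Python. This is only a sketch: the function name `scaffold_chunk` and the constant `SUBDIRS` are hypothetical and not part of the repository, and the subdirectory names are simply copied from the proposal.

```python
from pathlib import Path

# Subdirectory names copied from the proposed structure (assumption: every
# chunk gets the same six subdirectories).
SUBDIRS = ("src", "input", "output", "note", "hand", "models")


def scaffold_chunk(root, chunk_name):
    """Create <root>/<chunk_name>/ with the proposed subdirectories.

    Safe to call repeatedly: existing directories are left untouched.
    Returns the path of the chunk directory.
    """
    chunk_dir = Path(root) / chunk_name
    for sub in SUBDIRS:
        (chunk_dir / sub).mkdir(parents=True, exist_ok=True)
    return chunk_dir
```

Calling `scaffold_chunk(".", "chunk_x")` from the repository root would create `chunk_x/src`, `chunk_x/input`, and so on; running it again changes nothing.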

This proposal borrows shamelessly from the Human Rights Data Analysis Group's Patrick Ball. If you have time, you can watch his YouTube video on 'Principled Data Processing', where he explains his rationale on how to organize computational work for ‘self-documenting’ reproducible science.

Whatever we decide on, it would be great if our pipeline self-documents: 'stepwise' inputs, code, and outputs live in the directory hierarchy, intermediate files are saved and automatically archived, and work can be restarted from any step.
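
One possible way to get the "restart from any step" behavior is to treat each step's saved output file as a checkpoint. The sketch below is an assumption about how that could work, not an existing part of this project: `run_steps` and its `(name, output_path, func)` step format are hypothetical, and archiving of intermediate files is left out.

```python
from pathlib import Path


def run_steps(steps, start_from=None):
    """Run (name, output_path, func) steps in order, skipping finished ones.

    A step is skipped when its output file already exists. Once any step
    runs (because its output is missing, or because it is `start_from`),
    every later step is re-run too, since its input may have changed.
    Returns the names of the steps that actually ran.
    """
    names = [name for name, _, _ in steps]
    if start_from is not None and start_from not in names:
        raise ValueError(f"unknown step: {start_from}")

    rerun = False  # becomes True from the first step that has to run
    ran = []
    for name, output_path, func in steps:
        output_path = Path(output_path)
        if name == start_from or not output_path.exists():
            rerun = True
        if not rerun:
            continue  # checkpoint is present and nothing upstream changed
        output_path.parent.mkdir(parents=True, exist_ok=True)
        func(output_path)  # the step writes its result to output_path
        ran.append(name)
    return ran
```

With steps whose outputs live in `chunk_x/output/` and `chunk_y/output/`, a first call runs everything, a second call runs nothing, and `run_steps(steps, start_from="chunk_y")` redoes only `chunk_y` and whatever follows it.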

Eventually we may combine these in a shared computing environment that everyone understands, perhaps using JupyterLab for notebooks that can collate our results. If we do that, we might want to create a stable set of tools for a mixed R/Python environment using Conda so we are all on the same page.

Thoughts welcome. Meanwhile, we will start to populate the repository with some of our code that points to data and processes it.