BIDS / field-guide-to-computational-science

Outline for the guide #1

Open choldgraf opened 7 years ago

stefanv commented 7 years ago

https://www.dropbox.com/s/oku7bhoucgvdpic/Field%20Guide%20to%20Data%20Science.mm?dl=0

This mindmap can be opened with freemind (brew install freemind)

choldgraf commented 7 years ago

want to store this in the software WG google drive folder so that it syncs automatically? Or use an online service rather than freemind?

stefanv commented 7 years ago

Do you know of a good one?  I looked around quickly, but didn't see anything obvious.

choldgraf commented 7 years ago

https://chrome.google.com/webstore/detail/coggle-collaborative-mind/hbcapocoafbfccjgdgammadkndakcfoi?hl=en-GB

?

choldgraf commented 7 years ago

@stefanv where is that text outline that we put together? Can you push that to the repo so that we can start iterating on that as a TOC?

stefanv commented 7 years ago

The mindmap is the hierarchical version of that. I will re-map into text format.

choldgraf commented 7 years ago

ah yep, just saw that...I moved it into a google drive that I put in the BIDS fellows folder. That folder is here:

https://drive.google.com/drive/folders/0B8VZ4vaOYWZ3bGg5QzlBZWJZZVU?usp=sharing

stefanv commented 7 years ago

Original:

*** Field Guide to DS
***** A quick guide to organizing computational biology projects
***** Structure
      * Data organization (data formats & where to store, scripts vs
        interactive, scripts go with data, versioning data,
        intermediate data, etc.)
      * Reproducibility (software, papers)
      * Revision control
      * Continuous integration
      * Software documentation
      * Truthful visualization (colormaps, elements of graphics,
        misleading plots, etc.)
      * Managing large datasets
      * Choosing a language
      * Communication
      * Organizing a lab
      * Managing computational science projects (for broader use)
      * Data scaling challenges (in-memory, out-of-memory,
        parallelization, clusters, etc.)
      * Pre-publication
      * Exploratory analysis (keeping track of what you try, learning
        focused exploration, breaking up exploration into chunks, etc.)
      * Open vs closed publishing
      * Licensing
      * Scoping a project (realistic expectation + time estimates)
      * How to collaborate on GitHub / contribute to existing packages
        (perhaps section on getting your feet wet)
      * Resources for finding answers to questions; how long do you
        keep trying before asking / looking elsewhere
      * Resources: storing data, code, doing computations, public
        clusters, etc.
      * Sharing your work in public: figshare, open publication,
        how to publish a dataset, how to publish software (licensing)
      * How to re-use other people's work (licensing, forking,
        contributing back, etc.)

      * Scientific workflows
      * Managing a project and working with people
      * Data curation
      * Software development
        * Virtual environments

******* Broader principles
******* Reproducibility
******* Software

choldgraf commented 7 years ago

I think the mindmap is a good start for now actually...I think the next question is where are we missing information, and what will we prioritize for inclusion in a "v1" version of this.

choldgraf commented 7 years ago

Maybe we can take a week to add topics to this mindmap (the one in the gdrive) as we see fit, and then meet next week to make a "first cut" of topics to include?

choldgraf commented 7 years ago

ps I just set up a gitter for this project, tho we could use slack instead if you prefer. WDYT?

stefanv commented 7 years ago

Shall we put the following in the doc, and then just iterate on it there?

  - Overarching themes:
    - Reproducibility
      - Software
      - Papers
      - Provenance Tracking
  - Topics
    - Data Management & Organization
      - Data Versioning
      - Data backup & replication
      - Data access
        - Databases
        - Online storage
        - S3
      - Data/computation scaling
        - In / out-of-memory
        - Parallelization
        - Clusters
    - Curation
    - Cleaning
  - Software
    - Revision control
    - Continuous integration
    - Choosing the right language
    - Licensing, re-use, and attribution
    - Contributing to existing projects (see also GitHub collaboration)
    - Virtual Environments
  - Representation
    - Visualization
      - Elements of Graphics
      - Misleading plots
      - Colormaps
      - Effective plots
  - Communication
    - Organizing a lab
    - Online communications
    - Managing computational science projects
    - GitHub collaboration
    - Working with people
  - Publication
    - Pre-publishing
    - Open access
    - Indexing, identification, DOI, ORCID, etc.
    - Figshare and other sharing platforms
  - Experimentation / research planning
    - Tracking
    - Learning-focused exploration
    - Chunking work
    - Scoping an entire project
    - Finding help
      - Where to find help
      - How long to keep trying before asking
    - Research Workflows

stefanv commented 7 years ago

Re: slack/gitter, I am on both, although Slack notifications are more visible on Android.

stefanv commented 7 years ago

Advantage to Gitter: others can join us from outside.