AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0

TopoStats RoadMap #636

Open ns-rse opened 1 year ago

ns-rse commented 1 year ago

Central place for meeting 2023-08-04 09:30-12:40

What

This meeting is to order, organise and re-prioritise projects, features and/or issues within TopoStats so we are all clear on what to do if new grants change the focus of the software or if we have new joiners who want to get stuck in.

Why?

While we have Milestones and the Issues project board, our priorities are flexible and change in ways that are not reflected in our current management tools.

How

The new idea is to make more use of the tags and align these to objectives within grants and collaborations etc and to have a 3-binarised (trinarised?) system of "alignment" to help prioritise these.

Sylvia also has a big list of long-term TopoStats improvements / goals to add.

Agenda

The morning will probably be split into:

  1. The tools we want to use
  2. How the prioritisation is going to work
  3. Writing the targets down
  4. How we add / maintain these (as I feel this could be where we failed before)

ns-rse commented 1 year ago

My thoughts on Tools and prioritisation (based on having completed some training on using Agile for software project management).

Tools

Two options I'm aware of..

Other alternatives (mostly paid solutions)...

There are lots out there. I wouldn't want to spend masses of time searching for the perfect tool (no such thing exists; everyone has different experiences and therefore expectations of software) and would rather spend a little time learning how to use whichever we choose.

Prioritisation

Priority is a tricky one, there are a few angles as I see it...

How to prioritise?

In Agile there are several different methods. The list of outstanding tasks and feature requests forms what is known as a Backlog, and tasks in the backlog are prioritised based on their value and the effort required to complete them. Large tasks which will take many weeks are broken down into smaller tasks and addressed in sprints, which are typically two-week periods for completing work (this may be extended given most of those involved are not working solely on TopoStats and have other tasks and responsibilities to complete).
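A minimal sketch of ordering a backlog by value and effort. The `BacklogItem` class, the 1-10 value scale and the simple value/effort ratio are all assumptions made for illustration; Agile does not prescribe a single formula, and the example tasks are placeholders rather than real TopoStats issues.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """A task or feature request waiting in the backlog (hypothetical structure)."""

    title: str
    value: int   # estimated benefit, higher is better (assumed 1-10 scale)
    effort: int  # estimated effort, e.g. story points; must be greater than zero


def prioritise(backlog: list[BacklogItem]) -> list[BacklogItem]:
    """Return the backlog ordered so high-value, low-effort items come first."""
    # Assumption: value divided by effort is a good enough ranking score.
    return sorted(backlog, key=lambda item: item.value / item.effort, reverse=True)


backlog = [
    BacklogItem("Example task A", value=8, effort=13),
    BacklogItem("Example task B", value=5, effort=2),
    BacklogItem("Example task C", value=3, effort=3),
]

for item in prioritise(backlog):
    print(f"{item.title}: value={item.value}, effort={item.effort}")
```

A ratio like this favours quick wins, so a large but important task may still need to be moved up by hand or broken into smaller items first.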

How to estimate effort depends on having a feel for how long it takes to do things, which can be tricky. If project management of issues has been done well you can get an estimate from Burn-up charts (Projects includes some tools for this, but they're not great and I've not sussed them out yet).

There are a few techniques for effort estimation though; here are some notes I have from my Agile course.

Planning Poker

Useful if the backlog is small (<10). Everyone has cards with the Fibonacci sequence on, plus a question mark. As each backlog item is discussed, team members lay a card face down on the table for their estimate of effort; at the end the cards are revealed and, if they are far apart, discussed to form a consensus.
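A small sketch of the "reveal and discuss if far apart" step. The deck values, the `needs_discussion` name and the rule that cards more than one deck position apart count as "far apart" are assumptions for illustration, not part of any formal Planning Poker definition.

```python
# Assumed deck: the start of the Fibonacci sequence plus a question-mark card.
FIBONACCI_CARDS = (1, 2, 3, 5, 8, 13, 21)
UNSURE = "?"


def needs_discussion(cards: list, max_gap: int = 1) -> bool:
    """Return True if the revealed cards should be discussed before agreeing.

    Discussion is needed when anyone played the "?" card, or when the highest
    and lowest cards are more than ``max_gap`` positions apart in the deck.
    Raises ValueError if a card is not in the deck.
    """
    if UNSURE in cards:
        return True
    positions = [FIBONACCI_CARDS.index(card) for card in cards]
    return max(positions) - min(positions) > max_gap


print(needs_discussion([3, 5, 5]))   # adjacent cards
print(needs_discussion([2, 13, 5]))  # widely spread cards
```

Comparing deck positions rather than raw numbers means 8 versus 13 is treated the same as 1 versus 2, which matches the idea that larger estimates are inherently less precise.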

Dot Voting

The Bucket System

Large/Uncertain/Small

Ordering Method

Affinity Mapping

T-Shirt sizes

Story Points

Characteristics of effective estimation

Best Practices

How we add / maintain these

Somewhat contingent on the tool that is used, but it should work with GitHub regardless, as that is where we track our issues and pull requests.

If a clear Milestone is defined and agreed upon with timelines and Sprints to complete the work then typically new feature requests should go to the back of the backlog unless they directly impact on a given feature that is under development. This helps the Milestone be reached in the expected timeframe.

Suggestions from Neil

Modularisation

@SylviaWhittle has done an amazing job setting up this framework but I think we need to shift over to it completely so that each step in processing can be run independently. This requires a few additional things.
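To illustrate what "each step can be run independently" might look like, here is a minimal sketch. The step functions, the nested-list image type and `run_pipeline` are hypothetical and are not the TopoStats API; the point is only that every step takes an image and returns an image, so it can be called on its own or chained.

```python
from typing import Callable

# Assumption: an image is a nested list of heights; real code would use arrays.
Image = list[list[float]]
Step = Callable[[Image], Image]


def subtract_minimum(image: Image) -> Image:
    """Shift heights so that the lowest pixel sits at zero."""
    lowest = min(min(row) for row in image)
    return [[height - lowest for height in row] for row in image]


def threshold(image: Image, level: float = 1.0) -> Image:
    """Zero every pixel below ``level`` (a stand-in for a masking step)."""
    return [[height if height >= level else 0.0 for height in row] for row in image]


def run_pipeline(image: Image, steps: list[Step]) -> Image:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        image = step(image)
    return image


scan = [[2.0, 2.5], [4.0, 3.0]]
flattened = subtract_minimum(scan)                              # one step alone
processed = run_pipeline(scan, [subtract_minimum, threshold])  # or chained
print(flattened)
print(processed)
```

Because no step depends on hidden state from another, intermediate results can be saved and a later step re-run without repeating the earlier ones.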

Napari Plugin

Napari is "a fast, interactive, multi-dimensional image viewer for Python. It's designed for browsing, annotating, and analyzing large multi-dimensional images. It's built on top of Qt (for the GUI), vispy (for performant GPU-based rendering), and the scientific Python stack (numpy, scipy)."

I think making output from TopoStats viewable in Napari would be really useful.

Maintain Good Test coverage

It's important that the code that goes into the main branch does what we think it should, and the way to ensure this is to have good test coverage. Good coverage a) confirms that the code behaves as intended; and b) protects our future selves by catching breaking changes.

Unit tests should ideally be based on abstraction and not rely heavily on any one single sample.

When we have bug reports these should form the basis of new tests, so that we can be sure the bug has been captured and does not come back.
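A minimal sketch of turning a bug report into a regression test, written in the plain `test_*` style that pytest collects. The `mean_height` function and the bug it guards against (an empty input raising `ZeroDivisionError`) are hypothetical and are not taken from the TopoStats code base; the tests use small synthetic inputs rather than a single real sample.

```python
import math


def mean_height(heights: list[float]) -> float:
    """Mean of the non-NaN heights; 0.0 if there are none (hypothetical helper)."""
    valid = [h for h in heights if not math.isnan(h)]
    if not valid:
        # Guard added in response to the (hypothetical) bug report.
        return 0.0
    return sum(valid) / len(valid)


def test_mean_height_typical() -> None:
    assert mean_height([1.0, 2.0, 3.0]) == 2.0


def test_mean_height_empty_input_regression() -> None:
    """Regression test: empty input used to raise ZeroDivisionError."""
    assert mean_height([]) == 0.0


def test_mean_height_ignores_nan() -> None:
    assert mean_height([1.0, float("nan"), 3.0]) == 2.0
```

Naming the test after the reported behaviour (and linking the issue in the docstring, where one exists) makes it clear why the test is there if it ever fails.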

Maintaining Documentation

As usage and features change we should ensure that documentation is kept up-to-date. This could be done by including a template of checklist items for Pull Requests that includes updating documentation as well as docstrings for functions/classes/methods and other minimums that we require when merging to main (e.g. passes all tests).

Notebooks are a nice medium for demonstrating how to use the tools interactively but do require a fair bit of maintenance and also require people to be familiar with running them locally.

We might want to consider using a WebAssembly framework to provide interactive Notebooks via JupyterLite/Pyodide. These can also serve as the basis for a WebUI for processing.

Improving Git Workflow

I think we can make our lives a bit easier if we try and improve our Git workflow. This ties in with having clear Milestones and breaking larger tasks/issues (termed Epics) down into smaller tasks that can be achieved in sprints.

Git commits should be atomic and capture a single task. These may themselves be the result of several commits, but these can be squashed via either git commit --amend or git rebase --interactive to combine multiple commits into a single commit once all the steps of work have been completed. This makes the history easier to read and makes Pull Requests easier to review. Useful resources on this are Git Legit and related material such as How to write a Git Commit Message.

How to Handle Support Requests

We're starting to see questions from users, both within the group and outside. My preference would be for all support requests to go through GitHub Discussions. Slack history is ultimately lost on the free tier, whilst Discussions are more permanent. They also become a repository of information that others can search first. Further, over time, as questions get repeated and answers/solutions accrue, it can be used as a seeding ground for documentation.

Long Term View - Laboratory Information Management System (LIMS)

I really liked @alicepyne's proposal to standardise metadata about the experiments conducted. This rang some bells in my head at the time about the possibility of developing a Laboratory Information Management System to organise the data that is generated by users and labs. A major advantage is that this would make it much easier for people who manage large teams to find and use data across studies. TopoStats would, I hope, fit into this by automatically processing scans so that users get clean images without manual intervention. This would likely require some tinkering to find useful default values for different sample types but would potentially be very useful.

Having a WebUI for users to log into, view the results, tweak processing parameters and have the results regenerated, with all data being stored and managed in a back-end database, would be very useful I think.