TopoStats RoadMap - Githubissues

My thoughts on Tools and prioritisation (based on having completed some training on using Agile for software project management).

Tools

Two options I'm aware of..

GitHub Projects, we have one already in place for TopoStats, see here. This has very tight integration with GitHub and the attendant Issues/Pull Requests and works nicely in my opinion. We can create Milestones and add issues to these, the table view can be filtered on different criteria, and there is a Kanban-style view onto the project if required.
Asana is another tool for doing this and it will integrate with GitHub. It costs beyond the 30-day free trial though and I'm mindful of people having to learn yet another framework/set of tools just to get things done.

Other alternatives (mostly paid solutions)...

Project Manager
ClickUp fairly well featured free tier.
Trello free tier, good for Kanban boards.
Nuclino purports to be simple.

There are lots out there. I wouldn't want to spend masses of time searching for the perfect tool (no such thing exists; everyone has different experiences and therefore expectations of software) and would rather spend a little time learning how to use one.

Prioritisation

Priority is a tricky one, there are a few angles as I see it...

Bug reports - problems that are genuine bugs where there are inaccurate or unexpected results, these typically need fixing quickly.
Feature Requests - these are typically user-requested features, many are small and subtle (e.g. #615 which asks that users can specify scaling too).
New Features - these are different from feature requests in that they are likely tied to an individual's research or specific grant funding.
Maintainability - As the software grows we have to be mindful of avoiding things growing into a monolithic architecture where everything is inter-dependent. This requires careful consideration of the software architecture and how things fit together and should be constantly reviewed.

How to prioritise?

In Agile there are several different methods. The list of outstanding tasks and feature requests forms what is known as a Backlog, and tasks in the backlog are prioritised based on their value and the effort required to complete them. Large tasks which will take many weeks are broken down into smaller tasks and should be addressed in sprints, which are typically two-week periods for completing work (this may be extended given that most of those involved are not working solely on TopoStats and have other tasks and responsibilities to complete).
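A minimal sketch of ranking a backlog by value against effort, assuming each item has been given relative scores for both. The BacklogItem class, the scores and the item titles are all made up for illustration, and value divided by effort is only one of several possible ranking rules.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """A single backlog entry; all fields here are illustrative."""

    title: str
    value: int  # relative benefit, higher is better
    effort: int  # relative cost, e.g. story points


def prioritise(items: list[BacklogItem]) -> list[BacklogItem]:
    """Order items by value-to-effort ratio, highest ratio first."""
    return sorted(items, key=lambda item: item.value / item.effort, reverse=True)


# Hypothetical backlog with made-up scores.
backlog = [
    BacklogItem("Refactor into modules", value=13, effort=13),
    BacklogItem("Add scaling option", value=3, effort=1),
    BacklogItem("Fix incorrect result", value=8, effort=2),
]
ranked = [item.title for item in prioritise(backlog)]
print(ranked)
```

Small, valuable items float to the top, and large items with a poor ratio are natural candidates for breaking down further.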

Estimating effort depends on having a feel for how long things take, which can be tricky. If project management of issues has been done well you can get an estimate from Burn-up charts (Projects includes some tools for this but ~~they're not great~~ I've not sussed them out yet).

There are a few techniques for effort estimation though, here are some notes I have from my Agile course.

Planning Poker

Useful if the backlog is small (<10). Everyone has cards with the Fibonacci sequence on and a question mark. As each backlog item is discussed, team members lay a card face down on the table for their estimation of effort. At the end the cards are revealed and, if they are far apart, discussed to form a consensus.
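The reveal step could be sketched as below. This assumes a deck capped at 21, models the question mark card as None, and treats votes more than one card position apart as "far apart". The threshold, the needs_discussion() helper and the voter names are my own assumptions and not part of any standard.

```python
from typing import Dict, Optional

CARDS = [1, 2, 3, 5, 8, 13, 21]  # Fibonacci-style deck; "?" is modelled as None


def needs_discussion(votes: Dict[str, Optional[int]], max_gap: int = 1) -> bool:
    """Return True if the revealed cards are far apart or anyone played "?".

    The gap is measured in card positions within the deck, not raw values.
    """
    if any(card is None for card in votes.values()):
        return True
    positions = [CARDS.index(card) for card in votes.values()]
    return max(positions) - min(positions) > max_gap


# Hypothetical rounds of voting on two backlog items.
close_votes = {"dev_a": 3, "dev_b": 5, "dev_c": 5}
spread_votes = {"dev_a": 3, "dev_b": 5, "dev_c": 21}
print(needs_discussion(close_votes), needs_discussion(spread_votes))
```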

Dot Voting

Also useful for sprints with a low number of backlog items.
Each dot represents a relative estimate of the effort required and team members add their choice of dot against each item.

The Bucket System

Useful when there is a large backlog, even up to several hundred.
A central line of numbers representing effort is constructed and each item is placed on a story card.
Each person draws a story card at random then places it somewhere along the numbered effort list. No need for discussion. If someone doesn't understand a story they replace it. Spend no more than 120 seconds on each item.
If you find an item that you don't think fits where it has been placed, that is when a discussion is had.

Large/Uncertain/Small

Useful when the product backlog has several similar or comparable items.
Similar to the Bucket System but only uses three categories to bin tasks.

Ordering Method

Ideal for small teams with large Product Backlog.
A scale is prepared and items placed randomly from low to high.
Each team member then takes a turn to move any item one spot lower or higher on the scale or passes their turn.
Repeat until team members all pass and no one wants to move anything.

Affinity Mapping

Useful for >20 items on the product backlog.
Place sticky notes on a wall/whiteboard/table with unique user stories or items on them.
Rearrange notes one at a time discussing if notes group together or form their own group.
Should aim to form between 3 and 10 groups which are then named (i.e. epics) and prioritised.

T-Shirt sizes

Agree on the chosen scale and metrics to be used (XS, S, M, L, XL, XXL, XXXL).
Identify one anchor backlog item and assign it a t-shirt size, sometimes two are chosen at the top and bottom of the scale.
Sort the remaining items relative to these and assign them sizes.

Story Points

Use the Fibonacci sequence, often capped (e.g. at 21) or with some values skipped (e.g. 21 to 100); it's a team decision.
Identify at least one anchor backlog item and assign it a points value, sometimes two are chosen at the top and bottom.
Sort the backlog relative to these.
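As a rough illustration of a capped scale, the sketch below builds the sequence up to a cap and rounds a relative effort up to the next value on it. Rounding up rather than to the nearest value is my assumption, and fibonacci_scale() and to_story_points() are hypothetical helper names.

```python
import bisect


def fibonacci_scale(cap: int = 21) -> list[int]:
    """Return the Fibonacci-style point scale 1, 2, 3, 5, ... up to `cap`."""
    scale, current, following = [], 1, 2
    while current <= cap:
        scale.append(current)
        current, following = following, current + following
    return scale


def to_story_points(relative_effort: float, scale: list[int]) -> int:
    """Round a relative effort up to the next value on the scale.

    Anything larger than the cap is assigned the cap itself.
    """
    index = bisect.bisect_left(scale, relative_effort)
    return scale[min(index, len(scale) - 1)]


scale = fibonacci_scale(cap=21)
# If the anchor item is worth 3 points, an item judged twice as big (6) becomes an 8.
print(scale, to_story_points(6, scale))
```

Items that land on the cap are worth a second look, since they may be better split into smaller tasks.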

Characteristics of effective estimation

Avoids false precision in estimates.
Avoids anchoring bias
Promotes inclusivity
Leads to effort discovery

Best Practices

Ask the Product Owner questions about the user story to ensure there is sufficient information to estimate.
Discuss divergent estimates from different team members.
Agree on the final scale and capture it in the system.
If many items fall into the larger size effort estimates discuss whether it makes sense to break them down into smaller tasks.

How we add / maintain these

Somewhat contingent on the tool that is used, but it should work with GitHub regardless as that is where we track our issues and pull requests.

If a clear Milestone is defined and agreed upon with timelines and Sprints to complete the work then typically new feature requests should go to the back of the backlog unless they directly impact on a given feature that is under development. This helps the Milestone be reached in the expected timeframe.

Suggestions from Neil

Modularisation

@SylviaWhittle has done an amazing job setting up this framework but I think we need to shift over to it completely so that each step in processing can be run independently. This requires a few additional things.

Saving of intermediary data. This will incur an overhead in batch processing as it involves I/O (Input/Output, i.e. writing to disk) and will slow batch processing down. On the flip side, the heavy up-front processing of, say, filtering won't need repeating when subsequently tinkering with grain detection settings.
Documentation will need re-writing to reflect the modularised steps.
Configuration files will need splitting. We should keep a default configuration file for batch processing, but in the topostats config sub-command we should allow users to select a step that they want to run and produce sample YAML for just that stage. This will require each stage to accept both the global config file (pulling out the relevant section) and the section-specific configuration file.
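A rough sketch of how a stage might pull its own section out of the global configuration, assuming the configuration has already been loaded into a nested dictionary. The key names, GLOBAL_KEYS and stage_config() are all made up for illustration and are not TopoStats' actual schema or API.

```python
from copy import deepcopy
from typing import Optional

# Hypothetical global configuration; keys are illustrative only.
GLOBAL_CONFIG = {
    "base_dir": "./",
    "output_dir": "./output",
    "filter": {"run": True, "threshold_method": "std_dev"},
    "grains": {"run": True, "threshold_std_dev": 1.0},
}
# Top-level options assumed to be shared by every stage.
GLOBAL_KEYS = ("base_dir", "output_dir")


def stage_config(config: dict, stage: str, overrides: Optional[dict] = None) -> dict:
    """Return shared options plus one stage's section, with overrides applied.

    `overrides` stands in for a section-specific configuration file.
    """
    if not isinstance(config.get(stage), dict):
        raise KeyError(f"Unknown processing stage: {stage!r}")
    selected = {key: config[key] for key in GLOBAL_KEYS if key in config}
    # Copy the section so the global configuration is never modified.
    selected[stage] = {**deepcopy(config[stage]), **(overrides or {})}
    return selected


grains_only = stage_config(GLOBAL_CONFIG, "grains", {"threshold_std_dev": 2.5})
print(grains_only)
```

The same selected dictionary could be dumped as sample YAML for a single stage.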

Napari Plugin

Napari is "a fast, interactive, multi-dimensional image viewer for Python. It's designed for browsing, annotating, and analyzing large multi-dimensional images. It's built on top of Qt (for the GUI), vispy (for performant GPU-based rendering), and the scientific Python stack (numpy, scipy).".

I think making output from TopoStats viewable in Napari would be really useful.

Maintain Good Test coverage

It's important that the code that goes into the main branch does what we think it should, and the way to ensure this is good test coverage. This a) confirms that the code does what we think it should and b) protects our future selves by catching breaking changes.

Unit tests should ideally be based on abstraction and not rely heavily on any one single sample.

When we have bug reports these should form the basis of new tests so that we can be sure the bug has been captured.
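To illustrate both points, here is a small pytest-style sketch. The subtract_row_median() function is a hypothetical stand-in for a processing step (it is not a TopoStats function), and the constant-row bug is invented purely to show the shape of a regression test.

```python
from statistics import median


def subtract_row_median(image: list[list[float]]) -> list[list[float]]:
    """Hypothetical stand-in for a processing step: remove each row's median."""
    flattened = []
    for row in image:
        row_median = median(row)
        flattened.append([value - row_median for value in row])
    return flattened


def test_row_median_is_zero_after_flattening() -> None:
    """Check a general property on synthetic data rather than one specific scan."""
    image = [[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]]
    flattened = subtract_row_median(image)
    assert all(median(row) == 0 for row in flattened)


def test_constant_row_regression() -> None:
    """Regression test of the kind a bug report might motivate (invented bug)."""
    assert subtract_row_median([[5.0, 5.0, 5.0]]) == [[0.0, 0.0, 0.0]]
```

The first test asserts a property that should hold for any input, and the second pins down one specific reported case so it can't quietly return.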

Maintaining Documentation

As usage and features change we should ensure that documentation is kept up-to-date. This could be done by including a template of checklist items for Pull Requests that includes updating documentation as well as docstrings for functions/classes/methods and other minimums that we require when merging to main (e.g. passes all tests).

Notebooks are a nice medium for demonstrating how to use the tools interactively but do require a fair bit of maintenance and also require people to be familiar with running them locally.

We might want to consider using WebAssembly to provide interactive Notebooks via JupyterLite/Pyodide. These could also serve as the basis for a WebUI for processing.

Improving Git Workflow

I think we can make our lives a bit easier if we try and improve our Git workflow. This ties in with having clear Milestones and breaking larger tasks/issues (termed Epics) down into smaller tasks that can be achieved in sprints.

Git commits should be atomic and capture a single task. A task may itself be the result of several commits, but these can be squashed via either git commit --amend or git rebase --interactive to combine multiple commits into a single commit once all the steps of work have been completed. This makes the history easier to read and Pull Requests easier to review. A useful resource on this is Git Legit and related material such as How to write a Git Commit Message.

How to Handle Support Requests

We're starting to see questions from users, both within the group and outside. My preference would be for all support requests to go through GitHub Discussions. Slack history is ultimately lost on the free tier, whilst Discussions are more permanent. They also become a repository of searchable information that others can check first. Further, over time, as questions get repeated and answers/solutions accrue, it can be used as a seeding ground for documentation.

Long Term View - Laboratory Information Management System (LIMS)

I really liked @alicepyne's proposal to standardise meta-data about experiments conducted. This rang some bells in my head at the time about the possibility of developing a Laboratory Information Management System to organise the data that is generated by users and labs. A major advantage of this is that it would make it much easier for people who manage large teams to find and use data across studies. TopoStats would, I hope, fit into this by automatically processing scans so that users get clean images automatically. This would likely require some tinkering to find useful default values for different sample types but would potentially be very useful.

Having a WebUI for users to log into, view the results, tweak processing parameters and have the results regenerated, with the data stored and managed in a back-end database throughout, would be very useful I think.

AFM-SPM / TopoStats

TopoStats RoadMap #636
