brittanyblouin / ANCRTAdjust

An R package to adjust routine HIV testing data from antenatal care to reduce bias in estimating HIV prevalence trends
MIT License

JOSS paper #14

Open seabbs opened 4 years ago

seabbs commented 4 years ago

Overall this is a good paper but I think it needs a few tweaks to make the messaging clearer.

Statement of need

Data cleaning is very important, and missing data can be a major issue with surveillance data. I think these points come across well in your paper.

I am a little concerned as to the utility of this package given it can only work with a single data format. How widespread is this data format? Is this a toolbox that many users can easily pick up and use? Why not make a more general data cleaning package that could be used with this dataset as an example?

What I am getting at here is: what are the special features of this dataset that mean it needs its own data cleaning package rather than off-the-shelf tooling?

You discuss the fact that having clear and simple data handling guidelines is a good idea in the paper but the package doesn't really document this process at the moment. Instead it looks to me like a series of tools that someone could use to make such a guide.

I think it would make the paper and documentation stronger if you dealt with these points.

State of the field

Agree with @ellessenne here. Really need to see some comparison to other tools.

This needs to be in two forms:

1.) What tools already exist for high-level data cleaning, and why is this standard toolkit required for this data?

2.) Specific package functionality: why is your imputation of missing data, for example, better than other commonly used methods (e.g. MICE)?

This needs to be in the package documentation as well.

Language issues

There are a few language issues in the paper that make it difficult to understand; the language could use some tightening.

Example

"If ANC-RT data on HIV serostatus are to be used for HIV surveillance, several challenges need to be addressed, however (Diaz, De Cock, Brown, Ghys, & Boerma, 2005). First, inferences from routinely collected program data could be biased due to imperfect data completeness, with some health facilities not reporting HIV testing data (WHO, 2013). "

This is for the openjournals/joss-reviews#1740 review.

m-maheu-giroux commented 4 years ago

You raise valid concerns. Here is how I have addressed them.

Statement of need

We have built our functions around the most commonly encountered data format. We have requested several ANC-RT datasets from sub-Saharan African countries and trialed our package on them. This dataset format is very common: the great majority of countries on this continent use this information to assess their HIV epidemic trends, updating each year the UNAIDS-supported Spectrum/EPP software used for monitoring HIV epidemics.

This is also the very reason why cleaning and adjusting this type of data requires its own software: we need to specifically address the idiosyncrasies of ANC-RT data and their potential biases. We have attempted to better reflect the importance of this tool in both the JOSS paper and the package information.

State of the field

We have modified the paper to better reflect the novelty of this tool.

Language issues

We simplified this section of the paper to improve understanding.

seabbs commented 4 years ago

Thanks - paper is much improved.

I still have a few issues with it, however.

I would really like to see some discussion (and I may have missed this) of why standardised cleaning is needed over ad hoc, analyst-by-analyst cleaning (this may seem obvious but can be helpful for potential readers/users). For example, I may prefer to impute my missing data in a different way (e.g. with MICE) from the approach you have adopted; why should users trust yours?

If it were me, I would focus on the fact that you have standardised data cleaning in an open manner. Your approach may not, ultimately, be the optimal one, but having this open source toolkit will allow a community of best practice to form. They (and you) can then easily make changes that are well documented and tested.