carpentries-incubator / proposals

Open an issue in this repository to share Carpentries-style lessons and lesson ideas.

Modern data analysis for veterinary researchers #27

Open · RobHarrand opened this issue 5 years ago

RobHarrand commented 5 years ago

Basic idea: An introduction to modern data analysis concepts for those working in veterinary research.

Detail: I've worked with researchers in veterinary science for a number of years now, and in my experience almost everyone still performs data analysis the traditional way: loading data into Excel, hacking the data around for several weeks, producing plots and tables, and then copying and pasting them into reports and slides.

I have some basic training material introducing R, aimed at such researchers, which I recently created for the company I work for. If I can get permission from the company, I could extend this material and adapt it to the Data Carpentry format.

Despite being labelled 'veterinary', the material is currently very generic, so I think the next step would be to add sections that go beyond the basics of R such as loading, tidying and plotting data. I'm thinking of topics like working out the sensitivity and specificity of a diagnostic test and evaluating the results with a ROC plot (quite common in certain parts of veterinary healthcare). That said, the material would be equally applicable to human-health researchers, as the concepts are the same, so perhaps the lesson should be labelled 'health' or 'healthcare' rather than 'veterinary'?
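As a rough illustration of the diagnostic-test ideas above, here is a minimal sketch (in Python for illustration only; the proposed material itself uses R). It assumes a binary reference standard (1 = diseased, 0 = healthy) and a numeric test score where higher means "more likely diseased"; the function names and the example data are hypothetical. Sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and a ROC curve plots sensitivity against 1 − specificity as the positivity cutoff is swept.

```python
from typing import List, Tuple


def sens_spec(truth: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary test results against a reference standard.

    Assumes both classes are present in `truth` (otherwise a division by zero occurs).
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(truth: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, sweeping each distinct score as a cutoff.

    An animal tests positive when its score >= cutoff. The curve starts at (0, 0),
    i.e. a cutoff above every score, so nothing tests positive.
    """
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        predicted = [1 if s >= cutoff else 0 for s in scores]
        sens, spec = sens_spec(truth, predicted)
        points.append((1 - spec, sens))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule (points ordered by cutoff)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up example: 4 diseased and 4 healthy animals with a hypothetical test score
truth = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

print(sens_spec(truth, [1 if s >= 0.5 else 0 for s in scores]))  # (1.0, 0.75)
print(auc(roc_points(truth, scores)))  # 0.9375
```

Sweeping every distinct score as a cutoff gives the exact empirical ROC curve for this small dataset; in practice an established package (for example scikit-learn in Python, or pROC in R) would be used instead of hand-rolled functions.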

Any thoughts or ideas appreciated. Thanks.

Jongmassey commented 5 years ago

I am also keen to contribute to this curriculum. I think there are some domain-specific issues around data cleaning and management that are worth considering in a veterinary-specific course. Perhaps we could draw on, or contribute to, the efforts of @kerchner and HDRUK in #21.

I share your assessment that many researchers, and even clinicians in practice, are undertaking more and more data-intensive research but often lack the skills and tools to do so most effectively.

RobHarrand commented 5 years ago

Hi @Jongmassey

Great!

What veterinary-specific items do you think would be useful? As I mentioned above, diagnostic testing (sensitivity, specificity, ROC plots, etc.) is something I see in many papers. Hypothesis testing, p-values and confidence intervals are another big area. That said, perhaps these are beyond the scope of a Data Carpentry lesson, as they're arguably more about the statistics than the data (Statistics Carpentry?!). What data-specific issues have you encountered in veterinary research? Something around cleaning and managing clinical notes?

ErinBecker commented 5 years ago

Hi @RobHarrand - thanks for getting this conversation started! I'm just checking in to see if you would like me to create a repository for you to work in in the Incubator. If you'd rather hold off on that and do some preliminary discussions here, that's fine! But if you would like a more formalized place to put these ideas and start drafting material, please answer the questions in the Issue template. I'll keep an eye on this thread and follow up with you once you're ready to move forward on starting the lesson materials. No rush!

RobHarrand commented 5 years ago

Thanks @ErinBecker - I'm reaching out to some vets that I know to get their opinions on content. Once I've done that I'll answer the questions in the Issue template.

Jongmassey commented 5 years ago

Hi @RobHarrand - my thinking was more around general data wrangling rather than statistical methods: data storage best practices, databases, cleaning, linked data, open data sources, processing data in a reproducible way and sharing code with others, data protection law implications etc.

I come from a livestock data background rather than companion animals so my viewpoint is probably skewed by that!

RobHarrand commented 4 years ago

Hi @Jongmassey, OK, that makes sense. I've had a chat with @ErinBecker, and she suggested maybe a 'statistics for data science' lesson. I'll start that off in the lesson ideas area. I think there are aspects of statistics that data scientists would find useful that are also well suited to veterinary researchers. For example, I mentioned sens and spec, with ROC curves: these are probably as useful for interpreting diagnostic tests as they are for evaluating binary classification machine-learning models. I'll have a ponder. I'll also try to carve out some time to flesh out this veterinary lesson idea. Erin has pointed out this lesson, which may overlap, too. Cheers.