MI2DataLab / memr

R package for Multisource Embeddings for Medical Records
https://mi2datalab.github.io/memr/

Add vignette/more documentation on data inputs #3

Closed cmaimone closed 2 years ago

cmaimone commented 4 years ago

For this package to be useful for other researchers and to serve a purpose beyond capturing the method and code used for https://arxiv.org/pdf/1907.04152.pdf, it needs a vignette and more extensive documentation.

After reading the JOSS paper, the readme here, and the documentation, I'm not clear on how a researcher or doctor would start to use this package.

The readme references "medical free-text records written by doctors", but the example data sets are highly distilled and contain just a few terms. Given the description both here and in the arXiv paper, I expected a sample data set that approximates the structure of the "dataset of free-text clinical records" referenced there. I then expected to see documentation and examples showing how a user of the package would transform this raw data (or, more realistically, their own similar data) into the distilled inputs that the package's functions expect.

From https://arxiv.org/pdf/1907.04152.pdf, it seems that memr is not focused on this data processing. If this is correct, I'd suggest 1) editing the description of the package to reflect what type of data it can be used with, and 2) adding documentation on the structure the functions expect their data inputs to have and on what the characteristics of the data should be (e.g. should terms be lowercase? restricted to certain parts of speech?). memr does not necessarily need to include all of the functionality for processing medical free-text records into the format the package needs (although that would be helpful), but potential users need to know what kind of data inputs they have to create. The sample data sets and vectors are insufficient to determine this.
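To make the request concrete, the kind of preprocessing step a vignette could document might look like the sketch below. It uses only base R. The `notes` data frame, the `tokenize` helper, the `vocabulary` vector, and the long one-row-per-term output are all hypothetical and are not taken from memr or its sample data; whether memr actually expects lowercased terms or this layout is exactly what the documentation should state.

```r
# Hypothetical raw input: one free-text note per visit (not a memr data set).
notes <- data.frame(
  visit_id = c(1, 2),
  text = c("Patient reports Cough and fever.",
           "Fever resolved; mild cough persists."),
  stringsAsFactors = FALSE
)

# Assumed normalisation: lowercase, turn every run of non-letters into a
# single space, then split on spaces.
tokenize <- function(x) {
  cleaned <- gsub("[^a-z]+", " ", tolower(x))
  tokens <- strsplit(trimws(cleaned), " +")[[1]]
  tokens[nchar(tokens) > 0]
}

# Hypothetical list of terms to keep; a real pipeline would need a proper
# medical lexicon rather than two hand-picked words.
vocabulary <- c("cough", "fever")

# Build one row per (visit, term) pair, keeping only vocabulary terms.
terms_long <- do.call(rbind, lapply(seq_len(nrow(notes)), function(i) {
  kept <- intersect(tokenize(notes$text[i]), vocabulary)
  data.frame(
    visit_id = rep(notes$visit_id[i], length(kept)),
    term = kept,
    stringsAsFactors = FALSE
  )
}))

print(terms_long)
```

A vignette that walked through a step like this with realistic notes, and then passed the result to the package's functions, would answer most of the questions above.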

Re: https://github.com/openjournals/joss-reviews/issues/2482