TommyJones / tidylda

Implements an algorithm for Latent Dirichlet Allocation using style conventions from the [tidyverse](https://style.tidyverse.org/) and [tidymodels](https://tidymodels.github.io/model-implementation-principles/index.html).

JOSS Review: Paper review comments #75

Closed by hassaniazi 2 months ago

hassaniazi commented 3 months ago

Review for Latent Dirichlet Allocation Using ‘tidyverse’ Conventions

This paper describes an implementation of LDA using the tidyverse ecosystem to improve user-friendliness and give users more control over complex data wrangling. Doing so has enabled new capabilities, e.g., non-uniform initializations or setting flexible priors for LDA. Thanks for providing sample data and guidance on how to run the model and explore its functionality. I was able to execute the tidy functions (print, glance, augment, etc.) and reproduce the results and plots from README.md. Good job!
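For other reviewers following along, the workflow I ran looks roughly like the sketch below. This is a minimal sketch based on my reading of the package README; the object `dtm` is a placeholder for a sparse document-term matrix (e.g., built with `tidytext::cast_sparse()`), and argument names may differ across package versions.

```r
library(tidylda)

# Assumption: `dtm` is a sparse document-term matrix prepared beforehand
# (rows = documents, columns = terms). Hyperparameter values are illustrative.
set.seed(123)

lda <- tidylda(
  data = dtm,        # document-term matrix
  k = 10,            # number of topics
  iterations = 200,  # Gibbs sampling iterations
  burnin = 175       # iterations to average over after burn-in
)

print(lda)                  # concise summary of the fitted model
glance(lda)                 # one-row, model-level summary
tidy(lda, matrix = "beta")  # tidy tibble of topic-word probabilities
tidy(lda, matrix = "theta") # tidy tibble of document-topic probabilities
augment(lda, data = dtm)    # token-level topic assignments
```

The tidy/glance/augment verbs follow the broom conventions, which is what makes the results easy to pipe into further tidyverse wrangling and plotting.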

General comments:

I have two overall comments and then some specific comments:

  1. **Analytical contributions.** It is unclear from the paper whether any analytical adjustments to the LDA formulation have been made, or whether the "novel features" are mostly in the code base to improve the workflow. Initialization from a prior Dirichlet distribution and tLDA seem to qualify as analytical modifications that warrant further highlighting (e.g., explained in the text and supported by an equation) for clarity and visibility.

  2. **Paper writing.** More work needs to go into giving the paper a more cogent and incisive shape. Part of the summary and some other bits read like a book or dissertation chapter. Condensing the background theory and focusing more on the contributions of this work could be one way to respond. For instance, taking the second paragraph of the summary as an example, I'd recommend adapting the summary to highlight your contributions rather than summarizing the tidyverse, which is well established and isn't the main contribution of this work. The reference to "described in Chapter 5" in line 71 also needs to be removed. To be clear, I am not referring to the very useful Latent Dirichlet Allocation and Notation and Transfer LDA (tLDA) sections, and I am not recommending a full rewrite of the paper; minor but intentional adjustments would suffice.

Specific comments:

Review repo: https://github.com/openjournals/joss-reviews/issues/6800

TommyJones commented 3 months ago

Hello. Thank you again for the review. I'll structure my response and actions based on your comments above, and I've included a checklist for myself at the bottom to take care of these issues. However, I do need more guidance on general comments 1 and 2, below.

General Comments

  1. I'm confused. Lines 49–60 of the paper cover the mathematical modification of LDA, and lines 61–115 cover the computational aspects. I also cited the dissertation upon which the package is based (Jones, 2023). So can I ask for specific recommendations on what needs clarification here?
  2. I will remove any accidental references to my dissertation (e.g., "Chapter 5"). I'm not sure what you mean by giving the paper "a more cogent and incisive shape." I'm again confused, as it seems like I have conflicting guidance here. You say "Transforming the background theory and focusing more on the contributions of this work could be one way to respond," but then also say "I am not referring to the very useful Latent Dirichlet Allocation and Notation and Transfer LDA (tLDA) sections." But implementing tLDA is one of the fundamental intellectual contributions here. (I am working on getting a standalone tLDA paper published in a more appropriate journal, as it's outside the scope of JOSS.) Forgive me; I am not trying to be difficult. I genuinely don't know what I'm being asked to do.

Specific Comments

Checklist

TommyJones commented 3 months ago

Side note: I know Julia and Dave. I feel pretty bad about relegating Dave to an "et al." 😅 Sorry, Dave.

TommyJones commented 3 months ago

All issues except clarifying the contribution are taken care of over several commits, ending with d04f62177528b165910b01fe4ad8535b92857e7c.

hassaniazi commented 2 months ago

Thanks for resolving most of my comments.

  1. I wasn't sure if the modifications were novel to this work or had been presented before. In my understanding, they are novel to this work. I believe including a phrase like the following should be sufficient: "In this work, we introduce transfer LDA (tLDA), which modifies LDA in the following way:"

  2. This comment of mine has also been addressed by your changes throughout the manuscript and by creating a new section on topic modeling in R. I was mainly looking for a clear distinction between the background and your contributions, which the new paper separates well.

Congratulations on this work.

TommyJones commented 2 months ago

Awesome. Thank you for taking the time and effort to go through the paper and code. Onward!