ReScience / call-for-replication

The Blessings of Multiple Causes #5

Open timothyb0912 opened 4 years ago

timothyb0912 commented 4 years ago

Work to Replicate

Wang, Yixin, and David M. Blei. "The Blessings of Multiple Causes." Journal of the American Statistical Association 114.528 (2019): 1574-1596. DOI: 10.1080/01621459.2019.1686987

Motivation

Addressing unobserved confounding in observational datasets is critically important when making causal inferences. The Blessings of Multiple Causes promotes one method for doing so: the deconfounder.
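At a high level, our reading of the paper is that the deconfounder (1) fits a probabilistic factor model to the assigned causes, (2) checks that model and takes the inferred per-unit latent variables as a substitute confounder, and (3) adjusts for that substitute in the outcome model. Below is a minimal, self-contained sketch of this pipeline on simulated data; scikit-learn's FactorAnalysis stands in for the paper's richer factor models, and all variable names are ours, not the authors'.

```python
# Minimal sketch (our reading) of the deconfounder pipeline of Wang & Blei.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, m, k = 1000, 10, 2  # units, causes, latent dimension

# Simulated data: the causes A share an unobserved confounder U that also
# drives the outcome y, so naively regressing y on A gives biased effects.
U = rng.normal(size=(n, k))
A = U @ rng.normal(size=(k, m)) + rng.normal(scale=0.5, size=(n, m))
true_beta = rng.normal(size=m)
y = A @ true_beta + U.sum(axis=1) + rng.normal(scale=0.1, size=n)

# Step 1: fit a factor model to the assigned causes only (y is unused).
factor_model = FactorAnalysis(n_components=k, random_state=0).fit(A)

# Step 2: the substitute confounder is the posterior mean of the latents
# given the causes. (The paper also requires a predictive check on
# held-out cause entries before this substitute may be trusted.)
Z_hat = factor_model.transform(A)

# Step 3: estimate effects by regressing the outcome on the causes plus
# the substitute confounder. Note that Z_hat is a linear function of A
# here, so the design matrix is rank-deficient and scikit-learn silently
# returns a minimum-norm solution.
adjusted = LinearRegression().fit(np.hstack([A, Z_hat]), y)
naive = LinearRegression().fit(A, y)
print("true:    ", np.round(true_beta, 2))
print("naive:   ", np.round(naive.coef_, 2))
print("adjusted:", np.round(adjusted.coef_[:m], 2))
```

Even in this toy linear-Gaussian case, step 3 is not cleanly identified, which hints at the kind of difficulty we describe next.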

In the experiments that my colleagues and I have performed, the technique has proven simultaneously harder to use than described and ultimately ineffective in simulations where we know the ground truth.

These observations persist in our initial experiments with the authors' original data.

My colleagues (@bouzaghrane and @hassanobeid1994) and I believe that the original presentation of Wang and Blei's work glosses over the issues that contribute to these hardships. Researchers attempting to make causal inferences with their own datasets may waste undue time trying to use this method if they are not aware of these problems. Accordingly, we think a replication is a good idea: in the context of the original paper, we can show new details that allow users to make informed choices about how to use this method and whether the method is worth trying at all.

Beyond the points above, the authors' example code is in TensorFlow, and it is quite difficult to read and understand (in our opinion). Once we've replicated Wang and Blei's work in TensorFlow, we would like to rewrite their code in PyTorch / Pyro. We expect this to be both easier to understand and easier to edit for others who want to use or build upon their work; a first sketch of the style we're aiming for follows below.
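As a taste of what we mean, here is a minimal, hypothetical Pyro sketch of a PPCA-style factor model over the causes, the kind of building block from which the substitute confounder is computed. None of this is the authors' code; the names, priors, and hyperparameters are all illustrative.

```python
# Hypothetical Pyro sketch: a PPCA-style factor model over the causes A.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam


def ppca_model(A, k=2):
    n, m = A.shape
    # Global loadings mapping the k latent factors to the m causes.
    W = pyro.sample("W", dist.Normal(torch.zeros(k, m), 1.0).to_event(2))
    with pyro.plate("units", n):
        # Per-unit latent factors: the candidate substitute confounder.
        z = pyro.sample("z", dist.Normal(torch.zeros(k), 1.0).to_event(1))
        pyro.sample("obs", dist.Normal(z @ W, 0.5).to_event(1), obs=A)


A = torch.randn(1000, 10)  # placeholder; the real cause matrix goes here
guide = AutoNormal(ppca_model)
svi = SVI(ppca_model, guide, Adam({"lr": 0.01}), Trace_ELBO())
for step in range(2000):
    svi.step(A)

# Posterior medians of z serve as the substitute-confounder estimate.
z_hat = guide.median()["z"]
```

The plate / sample notation keeps the generative story visible on the page, which is exactly the readability gain we are hoping for.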

Challenges

We expect the replication to take 2-3 months, due to factors such as the ongoing pandemic and the fact that the collaborators on this project have other day jobs.

We expect mild, easily surmounted difficulties, such as the fact that the authors' example code is in TensorFlow, a framework with which we have only basic familiarity. However, the authors have posted most of the code needed to replicate their paper. Additionally, Wang and Blei's article describes their algorithms clearly, and the data used in their study is available.

Lastly, the complete original code for the paper has not been publicly released, but we expect the authors to be reachable via email or via the GitHub repo that provides tutorial / example code for the paper.

Questions

  1. Given that the original paper performs a simulation study, and we don't directly have access to the exact random seeds used in the article, are the editors of ReScience C okay with a replication that leads to the same qualitative conclusions even if the results are not identical?
  2. Beyond the pure replication of the article's results, are the editors of ReScience C okay with our providing further analysis and commentary on the original data and methods? All such analysis and commentary will be reproducible and open source, with data and code available via GitHub; a sketch of the seed handling we intend follows below.
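On the reproducibility point in question 2, every replication script we publish will pin and report its own seeds. Here is a minimal, hypothetical sketch of the header we have in mind; the seed value itself is arbitrary.

```python
# Hypothetical seed-handling header for our replication scripts: the seed
# is arbitrary, but it is fixed and reported alongside every result.
import random

import numpy as np
import tensorflow as tf

SEED = 12345
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)  # TensorFlow 2.x API
```

This will not recover the original paper's exact draws, of course, but it makes our own numbers checkable bit for bit.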
rougier commented 4 years ago

Hi @timothyb0912,

Sorry for the late answer. To answer your questions:

  1. This is the case for most of the replications we have published so far, because authors generally do not save the random seed. So it's perfectly acceptable (it's the norm, actually) to do a replication and provide new results that cannot be bit-for-bit the same as the original. However, you'll need to explain why they are similar or equivalent (that's generally domain dependent).

  2. Once you've replicated the original results, you can of course provide further analysis in the same paper. If you think this further analysis is an original work, you might also consider writing a new article on this part alone, submitting it to another journal, and publishing only the replication in ReScience.

I'm cc'ing the other editors-in-chief in case they want to add anything: @khinsen @benoit-girard @oliviaguest

oliviaguest commented 4 years ago

Thank you for tagging me, Nicolas! So my two cents to add to what Nicolas said above:

  1. Given that the original paper performs a simulation study, and we don't directly have access to the exact random seeds used in the article, are the editors of ReScience C okay with a replication that leads to the same qualitative conclusions even if the results are not identical?

You can also (try to) contact the original authors and obtain their random seeds, if useful. Either way, of course, is totally fine.

  2. Beyond the pure replication of the article's results, are the editors of ReScience C okay with our providing further analysis and commentary on the original data and methods? All such analysis and commentary will be reproducible and open source, with data and code available via GitHub.

I would suggest that it is not salami-slicing to separate these out, as Nicolas also said above. And if it were me, I would indeed separate them, since they are different contributions to science.

Hope these answers help and please ask follow up questions to fully clarify things, if needed! ☺️

timothyb0912 commented 4 years ago

@rougier and @oliviaguest,

Awesome, and thanks for the quick replies! We'll get started on the replication and follow up with an issue in your submissions repo afterwards.

Thanks for doing all the hard work of maintaining this journal!

oliviaguest commented 4 years ago

@timothyb0912 excited to see your work and thank you too! 😊