dalejn / cleanBib

Probabilistically assign gender and race proportions of first/last author pairs in bibliography entries
MIT License
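As a rough illustration of what "proportions of first/last author pairs" means: if each author's category probabilities are known (e.g. from a name-based predictor), one simple approach is to multiply the first- and last-author probabilities for each reference and then average over the bibliography. This sketch assumes the two authors' categories are independent, and it uses hypothetical "W"/"M" labels and made-up probabilities. It is not necessarily how cleanBib computes its numbers.

```python
from itertools import product


# Assumption: the first and last author's categories are independent,
# so each joint probability is the product of the two marginals.
def pair_proportions(first: dict[str, float], last: dict[str, float]) -> dict[str, float]:
    """Joint probability of each first/last category pair for one reference."""
    return {
        f + l: p_f * p_l
        for (f, p_f), (l, p_l) in product(first.items(), last.items())
    }


def bibliography_proportions(
    refs: list[tuple[dict[str, float], dict[str, float]]],
) -> dict[str, float]:
    """Average the per-reference pair probabilities over the whole bibliography."""
    if not refs:
        return {}
    totals: dict[str, float] = {}
    for first, last in refs:
        for pair, p in pair_proportions(first, last).items():
            totals[pair] = totals.get(pair, 0.0) + p
    return {pair: total / len(refs) for pair, total in totals.items()}


# Made-up probabilities for two references (labels are illustrative only)
refs = [
    ({"W": 0.8, "M": 0.2}, {"W": 0.3, "M": 0.7}),
    ({"W": 0.0, "M": 1.0}, {"W": 1.0, "M": 0.0}),
]
print(bibliography_proportions(refs))
```

Because each reference's pair probabilities sum to 1, the averaged proportions also sum to 1, so they can be read directly as the expected share of each first/last pairing.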

Simplifying/streamlining #7

dalejn opened this issue 4 years ago (status: open)

dalejn commented 4 years ago

To help streamline and simplify the Binder, note that the code uses both Python and R. The most straightforward way to modify the code is within the Jupyter Notebook launched through Binder (see the instructions at https://github.com/dalejn/cleanBib#instructions). After making your changes, you can export the notebook by saving it as an .ipynb file on your computer. Please test the changes in your own GitHub branch and/or attach your code to a reply here, along with a general description of the problem being addressed and the change to the code.

j6k4m8 commented 3 years ago

I have adapted the code to a fully Python-based implementation here, which runs quite fast (~1 minute on my laptop with my janky internet connection).

Main differences:

The Good:

The Bad:

The I-Don't-Know-if-It's-Good-Or-Bad:


Just dropping this here in case it's helpful to you or if you are able to repurpose any of the code. I definitely want to respect your emphasis on reproducibility.

(This is also explicit permission to use that code if you want any of it)

dalejn commented 3 years ago

Thanks for working on this and for the write-up. It looks great! I particularly appreciate your emphasis on simplicity and usability, and caching results and trying a different parser are great ideas. I've been planning to rewrite the cleanBib implementation with functions and to clean up some bloat, so thank you also for this material. Healing broken references and automatically dealing with flagged self-citations without burdening the user too much is still very much a work in progress (I think we struck a decent trade-off prior to adding in the race code, and I'll be trying to get us back there). If you end up going in this direction and in a way that doesn't feel at odds with usability, I'd love to follow up!
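On the caching idea: a minimal sketch of a JSON-backed cache around a name lookup, so each name costs at most one API call across runs. `NameCache` and the `fetch` callable are hypothetical; `fetch` stands in for whatever gender/race lookup the code actually performs.

```python
import json
from pathlib import Path
from typing import Callable


class NameCache:
    """Hypothetical JSON-file cache around a name-lookup function."""

    def __init__(self, path: str, fetch: Callable[[str], dict]):
        self.path = Path(path)
        self.fetch = fetch  # stands in for the real (credit-consuming) API call
        # Load results saved by earlier runs, if the cache file exists
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, name: str) -> dict:
        key = name.strip().lower()  # "Ada" and "ada " share one cache entry
        if key not in self.data:
            self.data[key] = self.fetch(name)  # only cache misses call the API
            self.path.write_text(json.dumps(self.data))  # persist right away
        return self.data[key]
```

Writing the file after every miss is slower than writing once at the end, but it means results already paid for survive if a long run is interrupted.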

j6k4m8 commented 3 years ago

Amazing! If you're interested, I can spend a little more time on getting the ref-healing code to work here. Or I can stop bothering you with GitHub notifications 🤣

j6k4m8 commented 3 years ago

Some improvements:

https://gist.github.com/j6k4m8/3b86b0a78c7966e9257be2677feff781

Would love the opportunity to run this alongside some known results using the current implementation to see how the numbers compare; I'd run it myself but I'm quickly running out of API credits!
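One lightweight way to do that comparison, assuming each implementation reports per-category proportions as a dict: list the categories where the two outputs disagree by more than a tolerance. `compare_proportions` and its 0.01 default tolerance are illustrative choices, not part of either implementation.

```python
def compare_proportions(
    reference: dict[str, float],
    candidate: dict[str, float],
    tol: float = 0.01,
) -> dict[str, float]:
    """Return {category: absolute difference} for categories differing by more than tol.

    A category missing from one output is treated as a proportion of 0.
    """
    categories = sorted(reference.keys() | candidate.keys())
    diffs = {c: abs(reference.get(c, 0.0) - candidate.get(c, 0.0)) for c in categories}
    return {c: d for c, d in diffs.items() if d > tol}
```

Treating a missing category as 0 means a category that only one implementation reports shows up as a disagreement instead of being silently skipped.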