JJ / science-data-science

Science should be agile. Data science too.
MIT License

Open to collaborations? #6

Open jmrr opened 3 years ago

jmrr commented 3 years ago

Hi @JJ, congrats on putting together this excellent manifesto. It really resonated with my own story and with how I see the future of science. I'm wondering if you'd be open to contributions. Let me first give you a bit of my background and of how I saw first-hand that Agile could help science.

I started my PhD in 2012 (it was indeed in the ML area). Around that time my research group was using SVN (this was considered advanced even in Engineering faculties) and collaborating on papers by keeping the folders with the LaTeX source in Dropbox. A few of us who were familiar with the growth of the Agile mindset started suggesting git and kanban boards, and by the time I finished my PhD in 2015 we were using Docker to ship the code that ran our experiments and even thinking about publishing not just the code and the datasets, but the Docker images themselves. We never did, because the baby steps in the community, as you mentioned with NeurIPS (NIPS back then), were just to provide a link to the repo and to open source your dataset if you had a new one.

Once some of the conditions of the experiment are reproducible (the code with the method, and the environment), what about the data? This was a massive headache for us, as sometimes the results are generated using specific train, validation and test splits and, even worse, sometimes the data evolves with time (e.g. weather, astronomical observations, etc.). In my case, because many of my experiments were in computer vision, providing a static version of the train+validation+test sets was enough for data reproducibility by the time I finished my PhD.
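As an illustration only, and not the tooling described above, here is a minimal sketch of how a static train/validation/test split could be pinned down and fingerprinted so that others can check they are using the same one. It assumes every sample has a stable identifier such as a file name; the function names, the 80/10/10 proportions and the `img_*.png` names are all hypothetical.

```python
import hashlib
import json


def assign_split(sample_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a stable sample ID to 'train', 'val' or 'test' deterministically."""
    # SHA-256 is stable across machines and runs, unlike the built-in hash().
    digest = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def build_manifest(sample_ids):
    """Return the split assignment plus a fingerprint of it that can be cited."""
    splits = {"train": [], "val": [], "test": []}
    # Sorting makes the manifest independent of the order the IDs arrive in.
    for sample_id in sorted(sample_ids):
        splits[assign_split(sample_id)].append(sample_id)
    canonical = json.dumps(splits, sort_keys=True).encode("utf-8")
    return {"splits": splits, "sha256": hashlib.sha256(canonical).hexdigest()}


if __name__ == "__main__":
    ids = [f"img_{i:05d}.png" for i in range(1000)]  # hypothetical file names
    manifest = build_manifest(ids)
    print({name: len(members) for name, members in manifest["splits"].items()})
    print(manifest["sha256"])
```

Because the assignment depends only on each sample's ID, adding new samples later never moves an existing one to a different split, and the fingerprint changes whenever the data does. For data that evolves with time, each snapshot would simply get its own manifest.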

But then I jumped to industry, where the explosion of tooling and the benefits of the Agile mindset in data science teams were evident. Some of the practices (like sprints) were still hard to adopt, as there's still a research aspect that is almost impossible to timebox. But extensive testing, automation, frequent standups to untangle blockers, letting the MVPs drive the pace, etc. are all practices that academic science was lacking and that could have a massive impact on it.

Finally, I think adopting a more open and agile mindset in academia is crucial to bringing in other sources of funding for fundamental research. If academia stays paper-centred (worse if it's in a closed-door journal with 1.5 years of decision turnaround), aspiring and junior researchers will see that the real innovation is happening outside universities and research centres, in an iterative, fail-fast way in industry, as has traditionally happened in Medicine.

Apologies for the long post, but I wanted to give you a bit of background and ask if I could contribute a few humble comments and fixes for typos I've noticed. I really hope this can get some traction via arXiv or some other tool.

JJ commented 3 years ago

Please change or add all you want. And thanks!