UKDataServiceOpen / Data-FAQs

The Frequently Asked Questions of data, stats, careers and more.

R or Python #2

Open joseph-allen opened 3 years ago

joseph-allen commented 3 years ago

What is your FAQ?

For general Data Science, should I learn R or Python, or something else?

Who could be qualified to answer this?

Anybody in industry or academia with career experience.

Other Comments

We usually get asked "R or Python?". Is it worth switching from one to the other? Is it worth knowing both? Is there a third competitor on the horizon? I have heard good things about Julia.

Any answer below with a high number of thumbs up should be merged as an answer.

JKasmire commented 3 years ago

I find R to be great for doing statistical analysis on large data sets. There are R packages for all kinds of tasks, so there is a very good chance that you will find something ready-made that lets you do exactly what you want to do. There is also lots of support and how-to documentation online, although some of it could be improved with more worked examples. R is free and easy to install, and R scripts are easy to comment, all of which makes R good for collaborative work, clear documentation and reproducibility. R is particularly strong at making good visualisations easily, including online and interactive graphs. For example, I used R to help me plot 6000 events recorded during an agent-based model, to add broken regression lines to that plot, and to calculate the difference in slopes between each pair of consecutive regression line segments.
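To make the "difference in slopes" step concrete, here is a minimal sketch, written in Python rather than R and not taken from the original analysis. It assumes the breakpoints are already known and that each segment gets its own independent least-squares line, which is only one way to read "broken regression lines". The function names `ols_slope`, `segment_slopes` and `slope_differences` are hypothetical, as is the toy data.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x for one segment."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def segment_slopes(xs, ys, breakpoints):
    """Fit a separate line to each segment between known breakpoints.

    Assumption: breakpoints are supplied by the caller, and every segment
    contains at least two distinct x values.
    """
    edges = [min(xs)] + sorted(breakpoints) + [max(xs)]
    slopes = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        is_last = i == len(edges) - 2
        # Each segment is [lo, hi); the last one also includes its right edge.
        points = [
            (x, y)
            for x, y in zip(xs, ys)
            if lo <= x < hi or (is_last and x == hi)
        ]
        seg_x, seg_y = zip(*points)
        slopes.append(ols_slope(seg_x, seg_y))
    return slopes


def slope_differences(slopes):
    """Difference in slope between each pair of consecutive segments."""
    return [later - earlier for earlier, later in zip(slopes, slopes[1:])]


# Toy data: slope 1 before x = 5, slope 3 from x = 5 onwards.
xs = list(range(10))
ys = [x if x < 5 else 5 + 3 * (x - 5) for x in xs]

slopes = segment_slopes(xs, ys, breakpoints=[5])
diffs = slope_differences(slopes)
```

In R, a package such as `segmented` can also estimate the breakpoints themselves, which this sketch does not attempt.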

I find Python to be a much broader tool, so it is good for accessing, storing, and analysing data in a wide variety of formats. As Python does so much more than just analysis, you may or may not find something ready-made that does exactly what you want. There is a lot of good support and how-to documentation online, although you may struggle to find good documentation if you are doing something very unusual or unprecedented. Python is free but may not always be easy to install. Python code is easy to comment. Among established Python users, Python code is very good for collaborative work, documentation and reproducibility, but collaborators with limited Python experience may need extra support to contribute. For example, I used Python to download thousands of .pgn files (recorded chess games), split each game into its own file, transform the notation within that file into a different format, feed the file with revised notation into an agent-based model and then record the output of that model into a new file.
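As an illustration of the "split each game into its own file" step, here is a minimal sketch using only the standard library. It is not the original script: it assumes every game in a .pgn file starts with an `[Event "..."]` tag at the beginning of a line, and the names `split_pgn_games` and `write_games`, along with the sample games, are hypothetical.

```python
from pathlib import Path


def split_pgn_games(pgn_text):
    """Split multi-game PGN text into a list of single-game strings.

    Assumption: each game begins with a line starting with '[Event '.
    """
    games = []
    current = []
    for line in pgn_text.splitlines():
        # A new [Event tag closes the game collected so far.
        if line.startswith("[Event ") and current:
            games.append(current)
            current = []
        current.append(line)
    if current:
        games.append(current)
    joined = ("\n".join(game).strip() for game in games)
    return [game for game in joined if game]


def write_games(games, out_dir):
    """Write each game to its own numbered .pgn file inside out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, game in enumerate(games, start=1):
        path = out_path / f"game_{index:05d}.pgn"
        path.write_text(game + "\n", encoding="utf-8")
        paths.append(path)
    return paths


sample = """[Event "Game A"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 1-0

[Event "Game B"]
[Result "0-1"]

1. d4 d5 0-1
"""

games = split_pgn_games(sample)
```

Calling `write_games(games, "games_out")` would then create one file per game; converting the notation inside each file is a separate step that depends on the target format.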

lbrierley commented 3 years ago

Julia's answer above about the nuances of each language is excellent! I would add that this isn't a decision to agonise over and it won't make or break your career; specialising in one of these doesn't lock you out of learning the other! Many academic and industrial data scientists move between them, knowing enough of each to do what they need to do, in varying ratios (I would say I am 90:10 R:Python myself).