MaxHalford / maxhalford.github.io

Personal website
https://maxhalford.github.io
MIT License

blog/pandas-tricks/ #9

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

A few intermediate pandas tricks - Max Halford

I want to use this post to share some pandas snippets that I find useful. I use them from time to time, in particular when I’m doing time series competitions on platforms such as Kaggle. Like any data scientist, I perform similar data processing steps on different datasets. Usually, I put repetitive patterns in xam, which is my personal data science toolbox. However, I think that the following snippets are too small and too specific to be added to a library.

https://maxhalford.github.io/blog/pandas-tricks/

xutianyu540 commented 3 years ago

Why do we need to use shift(1)?

MaxHalford commented 3 years ago

@xutianyu540: our goal is to perform target encoding on a time series. When you do that, you want to calculate the average over the past values. You need to shift(1) in order to skip the current value. The current value will be included in your average if you don't shift the series, which is nothing more than target leakage.
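To make the difference concrete, here is a minimal sketch, not the exact snippet from the post. The toy data and the column names (`store`, `day`, `sales`, `leaky_mean`, `past_mean`) are assumptions chosen for illustration. It computes an expanding mean of the target per group, once without the shift and once with it.

```python
import pandas as pd

# Hypothetical toy data: one row per (store, day).
df = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b", "b"],
    "day": [1, 2, 3, 1, 2, 3],
    "sales": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
})

# The rows must be in chronological order within each group.
df = df.sort_values(["store", "day"]).reset_index(drop=True)

# Without the shift: each row's average includes its own target value (leakage).
df["leaky_mean"] = df.groupby("store")["sales"].transform(
    lambda s: s.expanding().mean()
)

# With shift(1): each row's average only uses strictly earlier values.
# The first row of each group has no history, so it is NaN.
df["past_mean"] = df.groupby("store")["sales"].transform(
    lambda s: s.shift(1).expanding().mean()
)

print(df)
```

For store `a`, the unshifted column is 10, 15, 20, so every row already "knows" its own sales. The shifted column is NaN, 10, 15, which only reflects what was observable before each row.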

xutianyu540 commented 3 years ago

@MaxHalford: thanks!