donnemartin / data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Other
27.1k stars 7.83k forks

Data preprocessing #95

Open amira-yahlali opened 1 year ago

amira-yahlali commented 1 year ago

I'm trying to clean my data and do some preprocessing, but I don't understand the columns well enough to tell whether the zeros in them are normal values or missing values. I'm using the cic-collection dataset on Kaggle. If any expert could help, I'd be very thankful.

algopy commented 1 year ago

OK, what's your objective?

amira-yahlali commented 1 year ago

I just need to understand what the columns represent, and whether a null value in each column is a normal value or a missing one. I'm trying to preprocess my data and, ideally, reduce it.

algopy commented 1 year ago

I need to see your data to identify these points.

amira-yahlali commented 1 year ago

My data is the cic-ids-collection on Kaggle. I'm using the class label as the target, dropping the label column, and treating the rest as features. I'd love to send you my notebook directly to make it easier for you.

AnmolArora15 commented 7 months ago

Hi, Is this issue still open? I am looking forward to working on it. Thanks, Anmol Arora

HeerakKashyap commented 4 weeks ago

See, brother, if you want to remove the columns in which all values are null/missing, you can use data.drop(columns=[' ', ' '], inplace=True), with the names of those columns filled in, to remove them.
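As a rough sketch of that suggestion: the toy DataFrame and its column names below are made up for illustration and are not taken from the cic-collection dataset. It shows the explicit `drop` call alongside `dropna(axis=1, how="all")`, which removes all-missing columns without naming them.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; column names are hypothetical.
data = pd.DataFrame({
    "flow_duration": [10, 0, 35],
    "empty_col": [np.nan, np.nan, np.nan],
    "label": ["benign", "attack", "benign"],
})

# Drop columns by name (the keyword is `columns`, and Python spells it `True`).
by_name = data.drop(columns=["empty_col"])

# Or drop every column whose values are all missing, without listing names.
all_null_dropped = data.dropna(axis=1, how="all")

print(list(by_name.columns))
print(list(all_null_dropped.columns))
```

Both calls return a new DataFrame here; `inplace=True` would modify `data` directly instead.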

If you want to check the number of non-null values in each column, you can use data.info() to get a precise picture of the data.
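One caveat for the original question: `data.info()` counts zeros as non-null, so it will not show how many zeros a column holds. A minimal sketch for counting both per column might look like this (the toy data and column names are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are hypothetical, not from the real dataset.
data = pd.DataFrame({
    "fwd_packets": [3, 0, 7, 0],
    "bwd_bytes": [0.0, np.nan, 12.0, 5.0],
})

# Per-column counts of true missing values (NaN) versus literal zeros.
summary = pd.DataFrame({
    "n_missing": data.isna().sum(),
    "n_zero": (data == 0).sum(),
    "zero_share": (data == 0).mean(),
})

print(summary)
```

A column with a very high share of zeros may still be perfectly valid; whether a zero means "nothing happened" or "not recorded" usually has to come from the dataset's documentation rather than from the counts alone.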

To check for outliers in the data, you can use the seaborn library's pairplot function, i.e. seaborn.pairplot, to get a graph depicting the outliers.
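A pair plot gets slow and hard to read on wide datasets, so a numeric check can complement it. The sketch below uses the common 1.5 × IQR rule, which is a swapped-in technique and not what the comment above describes; the toy data and column names are made up for illustration.

```python
import pandas as pd

# Toy frame; the column names are hypothetical, not from the real dataset.
data = pd.DataFrame({
    "flow_duration": [10, 12, 11, 13, 12, 500],
    "packet_count": [5, 6, 5, 7, 6, 6],
})

numeric = data.select_dtypes(include="number")
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag values lying more than 1.5 * IQR below Q1 or above Q3, per column.
is_outlier = numeric.lt(lower, axis="columns") | numeric.gt(upper, axis="columns")
outlier_counts = is_outlier.sum()

print(outlier_counts)
```

Columns with many flagged values are good candidates for a closer look with `seaborn.pairplot` on just that subset of columns.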

Regards

amira-yahlali commented 3 weeks ago

Hello, thanks for reaching out. The problem has been fixed. Thank you for your consideration. Best regards
