abjer / sds

Social Data Science - a summer school course
https://abjer.github.io/sds

What is the size of int and float? #4

Closed KarlTjensvoll closed 6 years ago

KarlTjensvoll commented 6 years ago

I understand that the default for an int is int64, but you can specify int8, which is less precise but would, I imagine, take less memory and storage space.

I was unable to find a list of the exact storage sizes for the different types. Does someone have a URL or a list?

abjer commented 6 years ago

There is a list of numpy datatypes here that provides a good overview.

KarlTjensvoll commented 6 years ago

@abjer Thank you, I had a look at that site earlier, but I could not really figure it out. Reading it again, I see that it does say "Those with numbers in their name indicate the bitsize of the type", so int8 requires 8 bits and int16 requires 16 bits?

kristianolesenlarsen commented 6 years ago

@KarlTjensvoll In theory yes, but in practice no. The reason is that Python and numpy have to add some overhead to whatever they store, to keep track of what it is. (This is one reason low-memory applications are written in C or Fortran and not Python.)

Numpy data has an .nbytes attribute, which gives you the number you expect: int8().nbytes is 1 and int64().nbytes is 8. The benefit of numpy is that the overhead doesn't grow with the size of your array. Using the getsizeof function from sys, which gives a better picture of the actual memory usage, we can see:

from sys import getsizeof
from numpy import int8, array

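# No big difference for a single integer
# (exact byte counts vary with the Python and numpy version)
getsizeof(int8())  # 25 bytes (numpy)
getsizeof(int())   # 24 bytes (base)

# The gap shows up once there are many values
getsizeof(array(range(10**6)))  # 8000096 bytes (numpy)
getsizeof(list(range(10**6)))   # 9000112 bytes (base)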
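On the original question about a list of sizes: a rough sketch of how to print them yourself, assuming numpy is imported as np. It uses np.dtype(...).itemsize for the bytes per element and np.iinfo for the integer ranges, then checks that the fixed per-array overhead (getsizeof minus nbytes) is the same for a small and a large array. The variable names are just for illustration.

```python
import sys
import numpy as np

# Bytes per element and representable range for the fixed-width integer types.
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    name = np.dtype(dtype).name
    size = np.dtype(dtype).itemsize
    print(f"{name}: {size} byte(s), range {info.min} to {info.max}")

# Floats follow the same naming rule: the number in the name is the bit size.
for dtype in (np.float16, np.float32, np.float64):
    print(f"{np.dtype(dtype).name}: {np.dtype(dtype).itemsize} byte(s)")

# Per-array overhead is what getsizeof reports on top of the raw data (nbytes).
# Both arrays own their data, so getsizeof includes the data buffer.
small = np.zeros(10, dtype=np.int8)
large = np.zeros(10**6, dtype=np.int8)
overhead_small = sys.getsizeof(small) - small.nbytes
overhead_large = sys.getsizeof(large) - large.nbytes
print(f"overhead for 10 elements:    {overhead_small} bytes")
print(f"overhead for 10**6 elements: {overhead_large} bytes")
```

The overhead value itself depends on the numpy version, so it is printed rather than hard-coded.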

I'm not really an expert in this, but perhaps this gives more info: https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html
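One caveat on the list comparison, as a hedged sketch rather than a precise measurement: getsizeof on a list only counts the list object and its table of pointers, not the int objects those pointers refer to. Adding the elements by hand gives a rough upper estimate (small ints are cached and shared by CPython, so the sum slightly overcounts). The names below are made up for the example.

```python
import sys
import numpy as np

n = 10**5
values = list(range(n))

# getsizeof(list) covers the list header and its pointer table only.
container_bytes = sys.getsizeof(values)

# Each element is a separate Python int object with its own header.
element_bytes = sum(sys.getsizeof(v) for v in values)
list_total = container_bytes + element_bytes

# A numpy array stores the raw 8-byte values in one contiguous buffer.
arr = np.arange(n, dtype=np.int64)
array_total = sys.getsizeof(arr)

print(f"list container only: {container_bytes} bytes")
print(f"list plus its ints:  {list_total} bytes")
print(f"numpy int64 array:   {array_total} bytes")
```

So the real gap between a list and an array is larger than the two getsizeof numbers alone suggest.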