McDermott-Group / servers

Public repo that stores LabRAD servers and other non-measurement related code

Separate dataChest logs by week or check the size of the log files #55

Closed: nmGit closed this issue 8 years ago

nmGit commented 8 years ago

Also allow for any sample rate.

amopremcak commented 8 years ago

Why separate the logs by week, @patzinak? The sizes of these files should be rather small, ~ a few million 64-bit floats.

patzinak commented 8 years ago

Because we saw every single variable data log file getting to ~12 MB in size after barely running the GUI for two (accumulated) days. At least, that is my understanding of things.


amopremcak commented 8 years ago

How many variables are in the files Ivan is talking about, @nmGit? I should verify that this is true, as that seems rather large unless the files have, say, 7-8 variables, all of type float64 (assuming a sampling rate of 1 Sample/sec, (60*60*24*2)*8*8/1e6 ≈ 11 MB). Regardless, breaking them up still doesn't save space, since the same number of bytes just gets distributed across more files. I'd like to know what a typical sampling rate is for this GUI so I can test and verify.
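
For reference, that estimate as a short script (a sketch; the 1 Sample/sec rate and the variable counts are the assumptions stated above, not measured values):

# Back-of-the-envelope size estimate for a fixed-rate float64 log.
# Assumed: 1 Sample/sec for two days; 8 bytes per float64 value.
samples = 60 * 60 * 24 * 2            # 172800 samples over two days
bytes_per_value = 8                   # float64
for n_vars in (1, 4, 8):
    size_mb = samples * n_vars * bytes_per_value / 1e6
    print("%d variable(s) -> %.2f MB" % (n_vars, size_mb))
# 1 variable(s) -> 1.38 MB
# 4 variable(s) -> 5.53 MB
# 8 variable(s) -> 11.06 MB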

amopremcak commented 8 years ago

For a single-variable log file, I ran this script to mimic it:

from dataChest import *
import numpy as np
import time
from datetime import datetime

d = dataChest(["SomeFolder", "Some Sub Folder"])

utcdatestamp = datetime.utcnow().isoformat()  # timestamp for reference (unused below)

# 1D Arbitrary Data (Option 1 Style): random number generator
print "1D Arbitrary Data (Option 1 Style):"  # most inefficient style by far
numPoints = 60*60*24*2  # two days of data at 1 Sample/sec
print "\tNumber of Points =", numPoints
mu, sigma = 1, 0.1
#gaussian = mu + sigma*np.random.randn(numPoints)
d.createDataset("Histogram1by1",
                [("indepName1", [1], "float64", "Seconds")],
                [("depName1", [1], "float64", "Volts")])
d.addParameter("X Label", "Time")
d.addParameter("Y Label", "Digitizer Noise")
d.addParameter("Plot Title", "Random Number Generator")
# build all rows in memory, then time a single bulk write
net = []
for ii in range(0, int(numPoints)):
    net.append([float(ii)*1e-4, np.random.rand()*0.5])
t0 = time.time()
d.addData(net)
tf = time.time()
print "\tTotal Write Time =", tf - t0
t0 = time.time()
d.getData()  # read back all rows
tf = time.time()
print "\tTotal Read Time =", tf - t0

I get a file size of 2.9 MB. With no overhead, the file should be 2.7648 MB.
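
A quick check of that overhead (using the row layout from the script above: one indep plus one dep column, both float64):

numPoints = 60 * 60 * 24 * 2          # 172800 rows, as in the script above
raw_mb = numPoints * 2 * 8 / 1e6      # one indep + one dep column, 8 bytes each
print("raw data = %.4f MB" % raw_mb)                          # 2.7648 MB
print("overhead = %.1f%%" % ((2.9 - raw_mb) / raw_mb * 100))  # ~4.9%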

nmGit commented 8 years ago

Hmm, that's odd. The dataset we were looking at has 4 variables, and in two days it grew to 13 MB, so I'm not sure what's going on. I'm 95% sure I'm not storing redundant data.

EDIT: DatachestWrapper.py is what I'm using to store data, if you're curious.
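
A quick sanity check on those numbers (a sketch; it assumes the file holds only 4 float64 columns with no per-row overhead, which may not match DatachestWrapper.py's actual layout):

file_mb = 13.0                        # observed file size in MB
bytes_per_row = 4 * 8                 # assumed: 4 float64 columns, no row overhead
rows = file_mb * 1e6 / bytes_per_row  # ~406,250 rows
rate = rows / (60 * 60 * 24 * 2)      # implied samples/sec over two days
print("implied rate = %.2f Samples/sec" % rate)  # ~2.35, above the assumed 1 Sample/sec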

patzinak commented 8 years ago

I restarted the GUI and saw that new data sets had been created in the folder .../2016/7/6. This seems rather strange, because today is not the first day of a new week, nor is it the 6th week of the month. @nmGit
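
For an unambiguous week-based folder scheme, one option (a sketch; the path format here is hypothetical, not dataChest's current behavior) is Python's ISO calendar, which assigns every date a well-defined year and week number:

from datetime import date

d = date(2016, 7, 22)                          # the date of this thread
iso_year, iso_week, iso_day = d.isocalendar()  # (2016, 29, 5): week 29, Friday
folder = "%d/week%02d" % (iso_year, iso_week)  # hypothetical path layout
print(folder)                                  # "2016/week29"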