Closed: masonmcelvain closed this 3 years ago
Not sure if this is what you had in mind @chidiewenike, let me know 😁
Just need one more reviewer for this PR.
Thanks for the feedback, Jason! Log file size is definitely worth considering. We can write to multiple CSV files if we think it's worth doing.
After poking around the internet, it seems that the `csv` module simply iterates over the lines of the file with a pointer, and does not load the entire file into memory. While the docs don't explicitly say this for `csv.writer`, they do say it for `csv.reader`, so I'm assuming they are implemented similarly (but I could totally be wrong). From the docs:
> `csv.reader()`: Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its `next()` method is called
You could also check out the C code used to implement the `csv` module if you are really curious.
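To illustrate that iterator-protocol point, here's a small sketch (not from the PR): `csv.reader` happily consumes a generator that yields one line per `next()` call, so it never needs the whole file in memory.

```python
import csv

# csv.reader only needs an object that yields one string per next() call,
# so a generator works just as well as an open file -- nothing is read ahead.
def line_source():
    yield "time,temp\n"
    yield "09:00,21.5\n"
    yield "09:01,21.7\n"

rows = list(csv.reader(line_source()))
print(rows)
# [['time', 'temp'], ['09:00', '21.5'], ['09:01', '21.7']]
```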
To see how large `temp_log.csv` might get, I generated a CSV of 1000 entries (one day's logs, perhaps) and one of 7000 entries (a week's worth), and measured the size with `os.path.getsize`. The 1000-row CSV occupies 67 KB on my machine, and the 7000-row one occupies 469 KB. Not sure if the size would be different on a Raspberry Pi, I don't know enough about them.
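For anyone who wants to repeat the sizing trial, a rough sketch is below. The column names and row contents are made up for illustration, so the byte counts will differ from the numbers above:

```python
import csv
import os
import tempfile

# Hypothetical re-creation of the sizing trial: write n_rows dummy log
# entries to a temp file and report the resulting size on disk.
def measure_csv_size(n_rows: int) -> int:
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "user_input", "response"])
            for i in range(n_rows):
                writer.writerow([f"2021-06-01T12:00:{i % 60:02d}", "hello bot", "hi there"])
        return os.path.getsize(path)  # size in bytes
    finally:
        os.remove(path)

print(measure_csv_size(1000))
print(measure_csv_size(7000))
```

Since each row is roughly the same width, the size should grow close to linearly with the row count.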
Thanks for taking the effort to dig deeper into this and running that trial. Based on your experiment and what the docs say about `csv.reader`, I think it's safe to assume that the file size won't drive up memory usage when using `csv.writer`. And the file size itself shouldn't be too much of a problem either, as based on your size estimates we're looking at roughly 15 thousand lines of logs per megabyte...
Agreed :) At that pace, if memory becomes an issue, our chatbots must be a wild success at Swanton 😂
Summary

Wrote a function `log_to_csv()` in `log_input.py` that stores a dictionary to a CSV.

Details

The function will create a new temporary CSV (called `temp_log.csv`) if it doesn't exist, write a header row based on the keys of the dictionary (in the same order), and append the values in a new row. If the file already exists, only the values are appended to the CSV.

Testing
I tested this function by making a dummy dictionary and visually checking that the output CSV looked right. Sample code to do that below (paste into `log_input.py`):

Expected output to `temp_log.csv` if it does not yet exist:

If the file already exists, it will append just the contents of the dictionary and not create headers. For example, a second run of this function would produce in `temp_log.csv`:
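As a hypothetical stand-in for the sample snippet, here's a sketch. The dummy keys (`timestamp`, `input`, `response`) are invented for illustration, and this `log_to_csv()` only mirrors the behavior described in the summary; the real implementation in `log_input.py` may differ:

```python
import csv
import os
import tempfile

# Sketch of the described behavior: write a header row only when the file
# is new, then append the dict's values as one row per call.
def log_to_csv(row: dict, path: str) -> None:
    is_new = not os.path.isfile(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(row.keys())   # header from the dict's keys, in order
        writer.writerow(row.values())     # values appended as a fresh row

path = os.path.join(tempfile.mkdtemp(), "temp_log.csv")
dummy = {"timestamp": "2021-06-01 12:00:00", "input": "hello", "response": "hi"}
log_to_csv(dummy, path)  # first run: creates the file with a header + one row
log_to_csv(dummy, path)  # second run: appends only the values, no new header

with open(path, newline="") as f:
    rows = list(csv.reader(f))
for r in rows:
    print(r)
# ['timestamp', 'input', 'response']
# ['2021-06-01 12:00:00', 'hello', 'hi']
# ['2021-06-01 12:00:00', 'hello', 'hi']
```

Reading the file back with `csv.reader` makes it easy to eyeball that the header appears exactly once across repeated runs.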