calpoly-csai / swanton

Swanton Pacific Ranch chatbot with a knowledge graph
MIT License

Log input to local CSV #8

Closed masonmcelvain closed 3 years ago

masonmcelvain commented 3 years ago

Summary

Wrote a function log_to_csv() in log_input.py that stores a dictionary to a CSV

Details

The function will create a new temporary CSV (called temp_log.csv) if it doesn't exist, write a header row based on the keys of the dictionary (in the same order), and append the values in a new row. If the file already exists, only the values are appended to the CSV.
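For reference, a minimal sketch of how a function behaving as described might look, using csv.DictWriter (this is my reading of the description, not necessarily the actual implementation in log_input.py):

```python
import csv
import os


def log_to_csv(row: dict, path: str = "temp_log.csv") -> None:
    """Append `row` to a CSV log, writing a header row only when the file is new."""
    file_exists = os.path.isfile(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if not file_exists:
            writer.writeheader()  # header row from the dict's keys, in order
        writer.writerow(row)  # values appended as a new row
```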

Testing

I tested this function by building a dummy dictionary and visually checking that the output CSV looked right. Sample code to do that is below (paste into log_input.py):

def main() -> None:  # optional test driver
    import datetime

    entry = {  # `entry` rather than `input`, so the builtin isn't shadowed
        "question": "the question",
        "answer": "the answer predicted",
        "time": datetime.datetime.now(),
        "score": 42.9,
    }
    log_to_csv(entry)

if __name__ == "__main__":
    main()

Expected output to temp_log.csv if it does not yet exist:

question,answer,time,score
the question,the answer predicted,2020-08-25 15:58:51.965562,42.9

If the file already exists, the function appends just the contents of the dictionary and does not create headers. For example, a second run would produce the following in temp_log.csv:

question,answer,time,score
the question,the answer predicted,2020-08-25 15:58:51.965562,42.9
the question,the answer predicted,2020-08-25 15:58:51.965562,42.9
masonmcelvain commented 3 years ago

Not sure if this is what you had in mind @chidiewenike, let me know 😁

chidiewenike commented 3 years ago

Just need one more reviewer for this PR.

masonmcelvain commented 3 years ago

Thanks for the feedback, Jason! Log file size is definitely worth considering. We can write to multiple CSV files if we think it's worth doing.

After perusing the internet, it seems that the csv module simply iterates over the lines of the file with a pointer and does not load the entire file into memory. While the docs don't explicitly say this for csv.writer, they do say it for csv.reader, so I'm assuming they are implemented similarly (but I could totally be wrong). From the docs:

csv.reader() : Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called

You could also check out the C-code used to implement the csv module if you are really curious.
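As a quick sanity check of that quote, csv.reader happily consumes a plain generator, pulling one line per next() call rather than reading everything up front (a toy example of my own, not from the PR):

```python
import csv


def line_source():
    # A generator stands in for the file: csv.reader pulls one line
    # at a time via the iterator protocol, so nothing is buffered
    # up front.
    yield "question,answer"
    yield "the question,the answer predicted"


rows = list(csv.reader(line_source()))
# rows == [['question', 'answer'], ['the question', 'the answer predicted']]
```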

To see how much memory temp_log.csv might use, I generated a CSV of 1000 entries (one day's logs, perhaps) and one of 7000 entries (a week's worth), and measured the sizes with os.path.getsize. The 1000-row CSV occupies 67 KB on my machine, and the 7000-row CSV occupies 469 KB. I'm not sure whether the sizes would differ on a Raspberry Pi; I don't know enough about them.
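That trial can be reproduced along these lines (my own sketch; the exact byte counts depend on the row contents and platform):

```python
import csv
import os
import tempfile


def log_file_size(n_rows: int) -> int:
    """Write n_rows dummy log entries and return the file size in bytes."""
    path = os.path.join(tempfile.mkdtemp(), "temp_log.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer", "time", "score"])
        for _ in range(n_rows):
            writer.writerow(["the question", "the answer predicted",
                             "2020-08-25 15:58:51.965562", 42.9])
    return os.path.getsize(path)  # measured with os.path.getsize, as above
```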

Jason-Ku commented 3 years ago

Thanks for taking the effort to dig deeper into this and running that trial. Based on your experiment and what the docs say about csv.reader, I think it's safe to assume that file size doesn't meaningfully affect memory usage when using csv.writer. Disk usage shouldn't be much of a problem either: based on your size estimates, we're looking at roughly 15 thousand lines of logs per megabyte...

masonmcelvain commented 3 years ago

Agreed :) At that pace, if memory becomes an issue, our chatbots must be a wild success at Swanton 😂