anonamie / local-log-reader


Feedback from Cribl #1

Open · Lextal opened 1 week ago

Lextal commented 1 week ago

Hi Alison,

Thank you for submitting your solution! You’re on the right track, and I really appreciate the effort you’ve put into it so far. Here are a couple of tips that might help you refine your solution:

  1. You can assume that all log files have events sorted by time, with the newest events at the end.
  2. Think about cases where you need to process extremely large files that don’t fit into memory. However, you can assume the final output will fit in memory.

Looking forward to seeing your next steps — cheering you on!

anonamie commented 1 week ago

Hi Roman -

Gotcha - that assumption is helpful. I was informed that we can read a sample log file within the project, so I started out with a script that randomly generates a sample log file (unordered). Because of that, my LogService has extra logic to sort by timestamp, but it can be removed now that we can assume the files are already sorted.

I have a few follow-up questions:

  1. Can we also assume that each log event will always be separated by a line break?
  2. It seems that the main feedback is on the input file scanning, so I can modify the file service to accommodate this better. Other than that, would you be able to share any other gaps in the current solution keeping it from meeting MVP?

Thanks for your support, and looking forward to learning more from your response!

Lextal commented 1 week ago

That's a great point; with that assumption, sorting becomes unnecessary. It does, however, open up a different question: how to avoid processing data you won't need (i.e., when you know that reading lines at the beginning of the file is pointless, since you are going to drop them later anyway).
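To make that concrete, here is a minimal sketch (Node.js/TypeScript, using `fs/promises`) of one way to do it: walk backwards from the end of the file in fixed-size chunks and stop as soon as enough of the newest lines have been collected, so the beginning of a huge file is never read. The chunk size and the `tailLines` name are illustrative assumptions, not part of the submitted project.

```ts
import { open } from "node:fs/promises";

const CHUNK_SIZE = 64 * 1024; // read backwards in 64 KB chunks (arbitrary choice)

/** Return up to `limit` lines from the end of `path`, newest first. */
async function tailLines(path: string, limit: number): Promise<string[]> {
  const handle = await open(path, "r");
  try {
    const { size } = await handle.stat();
    const lines: string[] = [];
    let carry = "";      // partial line spilling across a chunk boundary
    let position = size; // next read ends here; we move toward offset 0

    // Stop as soon as we have `limit` lines -- anything earlier in the
    // file would be dropped anyway, so we never read it.
    while (position > 0 && lines.length < limit) {
      const length = Math.min(CHUNK_SIZE, position);
      position -= length;
      const buffer = Buffer.alloc(length);
      await handle.read(buffer, 0, length, position);

      // This chunk precedes everything read so far, so the carried
      // fragment is appended after it. (For simplicity this assumes a
      // chunk boundary never splits a multi-byte UTF-8 character.)
      const parts = (buffer.toString("utf8") + carry).split("\n");
      carry = parts.shift() ?? ""; // first piece may continue into the previous chunk
      for (let i = parts.length - 1; i >= 0 && lines.length < limit; i--) {
        if (parts[i] !== "") lines.push(parts[i]); // collect newest-first
      }
    }
    // The very first line of the file has no "\n" before it.
    if (position === 0 && carry !== "" && lines.length < limit) {
      lines.push(carry);
    }
    return lines;
  } finally {
    await handle.close();
  }
}
```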

> Can we also assume that each log event will always be separated by a line break?

Yes

> It seems that the main feedback is on the input file scanning, so I can modify the file service to accommodate this better. Other than that, would you be able to share any other gaps in the current solution keeping it from meeting MVP?

No, everything looks good; this was just the only thing we wanted to double-click on.

anonamie commented 1 week ago

If you pull the latest, the endpoint now includes pagination so that we can request parts of the file in chunks. (script pending)

The current sample project caps the max page/batch size at 100 events for the POC's readability. The sample events are roughly 0.4 KB each, so a 1 MB batch cap works out to about 2,500 events at the theoretical limit; for a larger file with similarly sized events, requesting up to 1-2k events per "page" could make sense while leaving some headroom. Optimizing batch sizes and so on would most likely depend on the rest of the system and on performance/load testing.
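For illustration, here is a rough sketch (Express + TypeScript) of the shape such a paginated endpoint could take. The route path, query-parameter names, and page-size math are assumptions for the example rather than the project's actual API; `tailLines` is the backwards-reading helper sketched earlier in the thread.

```ts
import express from "express";
import { tailLines } from "./tailLines"; // hypothetical module holding the earlier sketch

const app = express();

const AVG_EVENT_BYTES = 0.4 * 1024;  // ~0.4 KB per sample event (from the discussion above)
const BATCH_CAP_BYTES = 1024 * 1024; // keep each response under ~1 MB
const MAX_PAGE_SIZE = Math.floor(BATCH_CAP_BYTES / AVG_EVENT_BYTES); // ~2,560 events

app.get("/logs", async (req, res) => {
  const file = String(req.query.file ?? "");
  const page = Math.max(0, Number(req.query.page) || 0);
  const pageSize = Math.min(MAX_PAGE_SIZE, Math.max(1, Number(req.query.pageSize) || 100));

  try {
    // Read only as many newest-first lines as the requested page needs,
    // then return the slice for that page.
    const lines = await tailLines(file, (page + 1) * pageSize);
    res.json({ page, pageSize, events: lines.slice(page * pageSize) });
  } catch {
    res.status(404).json({ error: `could not read ${file}` });
  }
});

app.listen(3000);
```

Re-reading `(page + 1) * pageSize` lines per request is the simplest version; a cursor that carries the last byte offset back to the client would avoid re-scanning earlier pages on deep reads.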

Thanks and let me know if you have any other comments or feedback!