MrPowers / quinn

pyspark methods to enhance developer productivity 📣 👯 🎉
https://mrpowers.github.io/quinn/
Apache License 2.0

Update search files #215

Closed jeffbrennan closed 4 months ago

jeffbrennan commented 4 months ago

Proposed changes

Adds feature described in #214

Types of changes

What types of changes does your code introduce to quinn? Put an x in the boxes that apply

Further comments

Added a dictionary result of the form {path: {keyword: count}}. I can change the schema if needed.
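To make the shape concrete, here is a minimal sketch of how a caller might consume a {path: {keyword: count}} result. The paths, counts, and the two helper functions are hypothetical and for illustration only; they are not part of quinn.

```python
from collections import Counter

# Hypothetical result in the {path: {keyword: count}} shape.
# The paths and counts below are made up for illustration.
results = {
    "jobs/etl.py": {"rdd": 2, "sc.parallelize": 1},
    "jobs/report.py": {"rdd": 0, "sc.parallelize": 0},
}


def totals_by_keyword(results):
    """Sum each keyword's count across all files (hypothetical helper)."""
    totals = Counter()
    for keyword_counts in results.values():
        totals.update(keyword_counts)  # adds counts key by key
    return dict(totals)


def files_with_matches(results):
    """Return the paths that have at least one non-zero count (hypothetical helper)."""
    return [
        path
        for path, keyword_counts in results.items()
        if any(keyword_counts.values())
    ]


totals = totals_by_keyword(results)
matched = files_with_matches(results)
```

Keeping the outer key as the path means per-file and per-keyword views can both be derived without re-reading any files.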

During testing, I noticed that the keyword counts weren't accurate because of the break statement that ran after the first keyword match on each line. For example, this test returned an "rdd" count of 4 because of the multiple matches on line 8: [image]

I assumed the break was there to prevent printing the same line more than once, so I removed it and added a check that prints a line only on its first match: line_printed starts at False and is set to True after a keyword match.
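A rough sketch of the loop described above, not the exact code in this PR. It assumes a keyword counts once per line that contains it (a plain substring test); scan_lines, the sample lines, and the printed list are hypothetical names added for illustration.

```python
def scan_lines(lines, keywords):
    """Count, per keyword, how many lines contain it; print each matching line once.

    Hypothetical helper: assumes a simple substring test per keyword.
    """
    counts = {keyword: 0 for keyword in keywords}
    printed = []  # line numbers that were printed, kept only for illustration
    for line_number, line in enumerate(lines, 1):
        line_printed = False  # reset for every line
        for keyword in keywords:
            if keyword in line:
                # No break here, so later keywords on the same line still count.
                counts[keyword] += 1
                if not line_printed:
                    print(f"{line_number}: {line.rstrip()}")
                    printed.append(line_number)
                    line_printed = True
    return counts, printed


# Made-up sample input; line 3 contains both keywords.
sample = [
    "df = spark.read.parquet(path)",
    "rows = df.rdd.map(lambda r: r[0])",
    "rdd = sc.parallelize([1, 2])",
]
counts, printed = scan_lines(sample, ["sc.parallelize", "rdd"])
```

With a break after the first match, line 3 would only increment "sc.parallelize" and the "rdd" count would come out one short; the flag keeps the output to one printed line per matching line without dropping counts.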