allenai / wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Apache License 2.0

Add 'search' command for counting occurrences of regex patterns #7

Closed: epwalsh closed this 6 months ago

epwalsh commented 6 months ago

Closes #6.

Example:

wimbd search ./test_fixtures/c4-sample.00000-of-00001.json.gz -p '^The\b'
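The PR body does not say how matches are counted. The following is only a rough illustration of what counting regex occurrences over a gzipped JSON-lines file such as the C4 fixture might look like. It is written in Python and is not wimbd's implementation. It assumes each line is a JSON object whose text sits under a `text` key, and that matches are counted within each document's text. Under that assumption, `^` anchors to the start of each document because no multiline flag is set. The names `count_pattern` and `count_in_file` are hypothetical.

```python
import gzip
import json
import re
import sys
from typing import Iterable


def count_pattern(lines: Iterable[str], pattern: str, field: str = "text") -> int:
    """Count regex matches across the `field` of each JSON-lines document.

    Hypothetical sketch: assumes one JSON object per line with a string
    under `field`. Without re.MULTILINE, `^` matches only at the start
    of each document's text, not at every line break inside it.
    """
    regex = re.compile(pattern)
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Documents missing the field contribute zero matches.
        total += sum(1 for _ in regex.finditer(doc.get(field, "")))
    return total


def count_in_file(path: str, pattern: str) -> int:
    # Open gzip-compressed (.gz) or plain JSON-lines files as text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return count_pattern(f, pattern)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python count_pattern.py FILE PATTERN
    print(count_in_file(sys.argv[1], sys.argv[2]))
```

Saved under a hypothetical name such as `count_pattern.py`, it could be run against the same fixture with `python count_pattern.py ./test_fixtures/c4-sample.00000-of-00001.json.gz '^The\b'`. Counts may differ from `wimbd search` if the tool reads a different field or treats `^` per line.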