I suggest updating this recommendation so that it is more likely to work as expected.
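`cut -c9-` cuts at the 9th character position, but the width of each row can vary: `uniq -c` left-pads the count with spaces, and the count itself may need more digits depending on how large the gramify output is. Once a count is wider than the padding, the n-gram no longer starts at column 9.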
https://github.com/0xVavaldi/gramify/blob/64ef8cfb9edb86419051a4b4b819ed0fa079f107/gramify.py#L792-L794
A simple way to do this in awk would be to use `{print $2}`, but that keeps only the first whitespace-separated field after the count, so any n-gram containing a space would be truncated. So the solution I found is to use a slightly more complex awk command:
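A minimal sketch of this kind of filter, assuming the input is `uniq -c` output (a space-padded count, one separator space, then the n-gram). It deletes only the leading padding, the digits of the count, and that single separator space, so n-grams that contain or even start with spaces come through intact. The function name `strip_counts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove the count column that `uniq -c` prepends.
# sub() deletes leading spaces, the count's digits, and exactly one
# separator space; the rest of the line (the n-gram) is printed unchanged,
# no matter how many digits the count has.
strip_counts() {
    awk '{ sub(/^ *[0-9]+ /, ""); print }'
}

# Example: count n-grams, sort by frequency, then drop the counts.
printf 'hello world\nfoo\nhello world\n' | sort | uniq -c | sort -rn | strip_counts
```

Because only one separator space is removed, an n-gram such as `"  x"` keeps its own leading spaces.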
You can try it out on different variations with this test command:
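A self-contained sketch of such a comparison, using made-up sample rows formatted like `uniq -c` output (count right-aligned to 7 columns, then a space). It runs the same input through `cut -c9-`, `awk '{print $2}'`, and a regex-based awk `sub()` so the differences show side by side.

```shell
#!/bin/sh
# Made-up sample: an n-gram with an inner space, one with a leading space,
# and an 8-digit count that overflows the 7-column padding.
sample=$(printf '%7d %s\n' 3 'pass word' 1 ' admin' 12345678 'qwerty')

echo '--- cut -c9- ---'        # last row keeps a stray leading space
printf '%s\n' "$sample" | cut -c9-

echo "--- awk '{print \$2}' ---" # 'pass word' becomes 'pass'
printf '%s\n' "$sample" | awk '{print $2}'

echo '--- awk sub() ---'        # all three n-grams come through intact
printf '%s\n' "$sample" | awk '{ sub(/^ *[0-9]+ /, ""); print }'
```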