Open bpnx opened 7 years ago
FYI: I have a bigger PR coming on this (probably next week)
Hello, sed is used because that tr command generates blank lines in case of repeated spaces, and normally it's not good; in a single command sed substitutes spaces and tabs, repeated and mixed a gives a clean ouput. Since basic regular expressions are covered on episode 02 and sed is used in this episode, I think that it's useful for people to see that example of sed usage. I don't think it's intimidating, no more than: grep -Eo '\d{4}-\d{4}' 2014-01_JA.tsv (taken fron episode 02)
OK, I suppose this makes sense. I can merge if other people agree. @drjwbaker ?
Agreed. Looks a great change.
The command looks intimidating, but agree that it's not worse than other stuff in this episode. It's also more realistic, and perhaps it's good to stress regexps rather than introducing yet another command (tr
, which I at least never use in practice myself). So +1
Perhaps change "repeated space" to "repeated whitespace"?
Also, is it necessary/correct to escape +
as \+
here?
Hello, I'm changing "repeated space" to "repeated whitespace", thank you. Escaping + is necessary, otherwise sed treats it as a simple character (I'm sure someone knows why :-) )
Ouch, \s
doesn't work on BSD/Mac: https://superuser.com/a/112837/56154
Even if we use the [[:space:]]
group instead, it still fails to insert newlines: https://stackoverflow.com/questions/10748453/replace-comma-with-newline-in-sed
Test:
$ echo "Hello world" | sed 's/[[:space:]]/\n/g'
Hellonworld
Perhaps we need to stick with tr
anyways?
Hello, it's because of different sed versions, the one I've used in example is GNU sed, installed on every linux distribution: it supports extended regular expressions and has GNU extensions (often supported) . Unfortunately I don't have a Mac to test; have you tried with "-E" or "-r" options (if available) ? Or you could install gnu-sed package and it should work. However, to be more portable, the sed part could be rewritten into: sed 's/[[:space:]]+/ /g' | tr ' ' '\n' | sed '/^$/d' First it changes (repeated) whitespaces and tabs into a single space, then tr changes them into newlines and then the final sed removes all blank lines (for rows with spaces/tabs at the beginning or at the end) Or tr ' \t' '\n\n' |sed '/^$/d' simple, it should work anywhere, even with older sed versions, but students don't see the use of sed to remove repeated spaces and tabs.
Forgot to say: thank you for your analysis!
I'm sorry, it's really annoying when you find a nice example command, just to learn that it doesn't work cross platform.. Anyways, -E
is available, but didn't help. Unless we add GNU sed to our install requirements, we probably have to live with this.
Can confirm that this works:
sed -E 's/[[:space:]]+/ /g' | tr ' ' '\n' | sed '/^$/d'
(Had to include -E
in the first sed
)
The last example also, not sure which one is the best to teach..
I forgot to set the "-E" flag, sorry. About the last example: in my environment also tr [:space:] '\n' works (note the single \n), I think in BSD/Mac should work too Please let me know what example you think it's better to teach, so I change the PR Thank you
Hello, I think that 3 examples should be changed: after a "uniq -c" command a "-n" flag on sort command is very useful. About sed command: I think that removing spaces or normalizing them is a very common need and the example using sed could be appreciated, in my opinion