Open vortexing opened 5 years ago
Oooh, fun! I'd love to write something for this, but web links are probably a better/faster option, yes. I'm not a sed expert, but tr and awk are some of my best friends. This should include a link to explainshell as it covers these tools.
Actually, I have been wanting to do a page on coreutils: the answer to any and all shell-based data manipulation issues! (coreutils includes tr, cut, uniq, sort, head, tail, paste, comm, and one of my favorites, tac, among many, many others.)
That being said, I have never found a really good single source of information on these tools... :(
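For a flavor of what a short intro could show, here is a minimal sketch of a few of these tools chained together. The gene names, the two-column tab-separated layout, and the helper names `sample_data` and `top_values` are all made up for illustration; nothing here comes from an existing wiki page.

```shell
# Hypothetical sample data: one "sample<TAB>gene" record per line,
# with inconsistent capitalization of the kind data cleaning has to handle.
sample_data() {
  printf 'S1\tBRCA1\nS2\ttp53\nS3\tbrca1\nS4\tTP53\nS5\tEGFR\n'
}

# Hypothetical helper: read tab-separated lines on stdin and report the
# N most frequent values in column 2, ignoring case.
top_values() {
  cut -f2 |                          # keep only the second tab-separated column
    tr '[:lower:]' '[:upper:]' |     # normalize everything to upper case
    sort |                           # uniq only collapses adjacent duplicates
    uniq -c |                        # prefix each distinct value with its count
    sort -k1,1nr -k2,2 |             # highest count first, ties broken by name
    head -n "$1"                     # keep the first N lines
}

# Show the two most common genes in the sample data.
sample_data | top_values 2
```

With the sample data above, this reports BRCA1 and TP53, each with a count of 2. Each stage does one small job, which is the habit of thought an intro page would probably want to get across before linking out to fuller references.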
If you write it (slowly, in parts), they will read it. And by they, I certainly mean at least me. I know I can't be alone here. Also, I THOUGHT I put in the explainshell.com link.... hmmmm. Where did I put it (maybe in my deleted PR?)?
I'm in full support of a "coreutils for dummies" section in this markdown: https://github.com/FredHutch/wiki/blob/master/_scicomputing/software_linux101.md
Ping me for editing help if it's helpful!
Peteris Krumins's explanations of one-liners on catonmat.net are great. He later published them in book form, too. awk: https://catonmat.net/awk-one-liners-explained-part-one ; sed: https://catonmat.net/sed-one-liners-explained-part-one
On a related note, Data Science at the Command Line (https://www.datascienceatthecommandline.com/) is an excellent resource which has some coverage of sed/awk/tr, etc. Though likely more a Resource Library type thing...
We had a discussion in the wiki-writers session about what this page might look like without duplicating too much that's already available on other sites.
The goal for these docs should be to give someone without experience with these command line text processing tools enough information to search the available documentation and external resources for answers to their specific question. For example, information like:
You get the idea. Then we can link to useful sources after that brief introduction.
Good plan. I've reviewed a number of external resources suggested here and found elsewhere, and while there is a lot of good information out there, there are a lot of concepts taken for granted, such as:
While the link above from @ptvan is great and explains these things, it is not as accessible as a page in our wiki.
Is it even possible to teach these things quickly and concisely? Is it worth teaching more advanced pipelines without understanding these things?
I also think an upside-down (procedure-based) version of what @atombaby suggests:
I agree that one of the wiki's main strengths is good introductions ("Core Utils 101") from which readers could jump off into more advanced resources. Thanks for all your continued work!
Proposed Domain: Scientific computing - linux 101 page
Content Summary: Can someone provide some introductory material, from the web or wherever, about why you should stop and learn how to use these commands for tasks such as manipulating files for data cleaning?
Local Content Expert(s): @atombaby or @k8hertweck, if you happen to have any links handy on this!!!