ccbaumler opened 1 year ago
A continuation, using `awk` to parse a tab-separated dataset:
awk -F'\t' 'NR>1 && NF {print " - " $1}' assembly-test.tsv
Here I have called the `awk` program with:
- `-F'\t'` sets the field separator to a tab.
- `'NR>1` tells the program to act only on rows after row 1 (i.e., skip the header).
- `NF` checks the number of fields on each line. Here it makes sure we do not return empty lines, which have zero fields.
- `&&` requires both conditions to be true before the action runs.
- `{print " - " $1}'` prints a space, a hyphen, and another space, followed by the contents of the first field as split by `-F`.
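For anyone more comfortable reading Python, the filter that `awk` applies above can be sketched as follows. This is an illustrative equivalent, not part of the original workflow; the function name `tsv_to_sample_lines` is hypothetical, and the sketch assumes the whole TSV is already loaded into a string.

```python
def tsv_to_sample_lines(tsv_text):
    r"""Python sketch of: awk -F'\t' 'NR>1 && NF {print " - " $1}' file.tsv"""
    sample_lines = []
    for row_number, line in enumerate(tsv_text.splitlines(), start=1):
        # NR>1: skip the header row
        if row_number == 1:
            continue
        # NF: an empty line has zero fields, so skip it
        if line == "":
            continue
        # $1 with -F'\t': everything before the first tab
        first_field = line.split("\t")[0]
        # print " - " $1: concatenate the prefix and the first field
        sample_lines.append(" - " + first_field)
    return sample_lines
```

As in the `awk` version, a line that contains only tabs is not empty, so it still produces a ` - ` line with an empty first field.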
This `awk` command can then be piped into our `sed` command from the previous comment:
awk -F'\t' 'NR>1 && NF {print " - " $1}' assembly-test.tsv | sed "/samples:/r /dev/stdin" -i config.yml
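A rough Python sketch of what the `sed "/samples:/r /dev/stdin"` step does to `config.yml`: the piped lines end up directly after the line containing `samples:`. The helper name `insert_after_match` is hypothetical. It treats the pattern as a plain substring rather than a regex (equivalent here, since `samples:` has no regex metacharacters), and it inserts only after the first matching line, assuming `config.yml` has a single `samples:` key.

```python
def insert_after_match(config_text, pattern, new_lines):
    """Sketch of: <new lines> | sed "/samples:/r /dev/stdin" config.yml

    Returns config_text with new_lines placed right after the first
    line that contains `pattern` (a plain substring, not a regex).
    """
    output = []
    inserted = False
    for line in config_text.splitlines():
        output.append(line)
        # Only the first matching line receives the new lines
        if not inserted and pattern in line:
            output.extend(new_lines)
            inserted = True
    return "\n".join(output) + "\n"
```

Unlike `sed -i`, this returns a new string instead of editing the file in place; writing the result back to `config.yml` is left to the caller.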
When working with the SRA, a list of accession numbers may be exported. To insert that list directly into a `config.yml` file for use in genome-grist, we can use the `sed` command or base Python to edit the list, read the `config.yml` file, and insert the list into the `samples:` section of `config.yml`.

Using `sed`
My `config.yml` file:

The accession list, exported directly from the SRA Run Browser as a `txt` file:

To format and insert the accession list into the `samples:` section of the `yml` file:
The first `sed` command inserts a space, a `-`, and another space at the beginning of each line in the accession list `txt` file. This formats the list for the `config` file.

The second `sed` command reads the output of the first command and inserts it after the line matching `samples:` in the `config.yml` file. The result is a `config.yml` file in genome-grist's desired format.

Using base Python
With exactly the same structure as above, but using a Python script instead of the `sed` command-line tool, we can achieve the same output.
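A minimal sketch of such a script, assuming the exported accession list has one accession per line and that `config.yml` contains a line that is exactly `samples:` (nothing after the colon). The function name `add_accessions_to_config` and its parameters are hypothetical, not part of genome-grist.

```python
from pathlib import Path


def add_accessions_to_config(accession_path, config_path, key="samples:"):
    """Base-Python stand-in for the two sed commands (illustrative sketch)."""
    # Step 1 (first sed command): prefix each accession with " - ".
    # Blank lines are skipped here, which the sed version does not do.
    items = [
        " - " + line.strip()
        for line in Path(accession_path).read_text().splitlines()
        if line.strip()
    ]

    # Step 2 (second sed command): place the items right after the key line.
    output = []
    inserted = False
    for line in Path(config_path).read_text().splitlines():
        output.append(line)
        if not inserted and line.strip() == key:
            output.extend(items)
            inserted = True
    if not inserted:
        raise ValueError(f"no line equal to {key!r} found in {config_path}")

    # Overwrite the config file in place, like `sed -i`.
    Path(config_path).write_text("\n".join(output) + "\n")
    return items
```

With placeholder file names, a call would look like `add_accessions_to_config("accession_list.txt", "config.yml")`. Matching the whole stripped line against `samples:` is stricter than the `sed` regex, which matches anywhere in a line; the trade-off is that a line such as `samples: []` is not mistaken for the insertion point.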