Closed ggrimes closed 7 months ago
As an example, count the number of sequences in a FASTA file:
zgrep -c "^>" data/yeast/transcriptome/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz
Then count the number of T bases in the FASTA file:
zgrep -v "^>" ./nextflow_rnaseq_training_dataset/data/yeast/transcriptome/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz | grep -o T | wc -l
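One thing to watch when counting bases: `grep -c` reports the number of matching lines, not the number of matches. A minimal sketch on a made-up two-record FASTA file (the file contents and variable names are assumptions for illustration, and plain `grep` stands in for `zgrep` because the toy file is not compressed):

```shell
# Build a tiny FASTA file (made-up data, for illustration only).
fasta=$(mktemp)
printf '>seq1 first\nATTGC\nTTA\n>seq2 second\nGGCAT\n' > "$fasta"

# Number of sequences = number of header lines.
n_seq=$(grep -c '^>' "$fasta")

# grep -c counts LINES that contain a T.
t_lines=$(grep -v '^>' "$fasta" | grep -c T)

# grep -o prints each match on its own line, so wc -l counts individual bases.
t_bases=$(grep -v '^>' "$fasta" | grep -o T | wc -l)

echo "sequences: $((n_seq)), lines with T: $((t_lines)), T bases: $((t_bases))"
```

On this toy file the two T counts differ, because several sequence lines contain more than one T.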
Use a queue channel with A, T, G, C to count all bases:
zgrep -v "^>" data/yeast/transcriptome/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz | grep -o ${base} | wc -l
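Outside Nextflow, what the queue channel does here can be approximated with a plain shell loop, which may help learners see what each task runs. This is only a sketch on made-up data; the loop stands in for the channel, and the `fasta`, `base` and `counts` names are assumptions:

```shell
# Tiny FASTA file (made-up data, for illustration only).
fasta=$(mktemp)
printf '>seq1\nATTGC\nTTA\n>seq2\nGGCAT\n' > "$fasta"

# Run the same counting command once per base, as each task would.
counts=""
for base in A T G C; do
    n=$(grep -v '^>' "$fasta" | grep -o "$base" | wc -l)
    # $((n)) strips any padding that wc adds on some systems.
    counts="$counts$base=$((n)) "
    echo "$base $((n))"
done
```

In the Nextflow version each iteration becomes an independent task, so the four counts can run in parallel.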
For the combining channels episode, this could count the number of A, T, G and C bases within each sequence in the FASTA file:
nextflow.enable.dsl=2

params.fasta = ""

process COUNT {
    input:
    each nt
    path sequence

    script:
    """
    grep -o -E '^>\\w+' ${sequence} | tr -d '>' | tr '\n' '\t'
    printf $nt
    cat ${sequence} | grep -v '^>' | grep -o ${nt} | wc -l
    """
}

ch_seq = Channel
    .fromPath(params.fasta)
    .splitFasta(by: 1, file: true)
    .take(10)
ch_base = Channel.of('A', 'T', 'G', 'C')

workflow {
    COUNT(ch_base, ch_seq)
}
Wouldn't this be considered complex for a novice?
grep -o -E '^>\\w+' ${sequence}| tr -d '>'| tr '\n' '\t'
printf $nt
cat ${sequence}|grep -v '^>' |grep -o ${nt} |wc -l
Yes, it requires more than is covered in the Carpentries introduction to Unix. Maybe there is an easier way to do this.
grep ">" ${sequence} |cut -f1 -d " "|tr -d ">"
For sequence headers, I'll usually use
grep ">" ${sequence} | cut -c2-
but that still leaves everything after the space.
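To make the difference between the two header commands concrete, here is a small sketch on made-up headers (the file contents and variable names are assumptions). It uses only `grep` and `cut`, with no regular expressions beyond the literal `>`:

```shell
# Tiny FASTA file with descriptions after the ID (made-up data).
fasta=$(mktemp)
printf '>seq1 first record\nATTGC\n>seq2 second record\nGGCAT\n' > "$fasta"

# Drop the leading '>' only: the description after the space is kept.
full=$(grep '>' "$fasta" | cut -c2-)

# Additionally keep only the first space-delimited field: just the ID.
ids=$(grep '>' "$fasta" | cut -c2- | cut -f1 -d ' ')

echo "$full"
echo "$ids"
```

The second form stays within two pipes and gives only the sequence IDs.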
On topic though, I still think we should minimize piping and have at most two pipes, with no regular expressions if possible.
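One possible way to stay within two pipes and avoid regular expressions is to delete everything except the base of interest with `tr` and count the remaining characters. This is only a sketch under assumptions: the file is made up, and `tr` and `wc -c` may be unfamiliar to learners who have only taken the shell-novice lesson:

```shell
# Single-record FASTA file (made-up data, for illustration only).
seq_file=$(mktemp)
printf '>seq1 example\nATTGC\nTTA\n' > "$seq_file"

# Remove the header line, delete every character that is not an A
# (newlines included), then count the characters that are left.
a_count=$(grep -v '>' "$seq_file" | tr -cd 'A' | wc -c)
t_count=$(grep -v '>' "$seq_file" | tr -cd 'T' | wc -c)

echo "A: $((a_count)) T: $((t_count))"
```

The trade-off is swapping one unfamiliar tool for another, so whether this is simpler for a novice is a judgment call.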
Do you think using the splitFasta operator (https://www.nextflow.io/docs/latest/operator.html#splitfasta) would be too much?
Channel
    .fromPath('data/yeast/reads/transcriptome/*')
    .splitFasta(record: [id: true, seqString: true])
Depends where one is in the episodes. Once you've covered operators, it should be fine.
Change the process episodes to remove RNA-Seq-specific examples and replace them with more general ones. These new examples should only use basic UNIX commands, such as those covered in the https://swcarpentry.github.io/shell-novice lesson.
Some examples of useful Bash commands for handling FASTA files can be found at https://www.biostars.org/p/17680