Closed MNM-TB closed 3 years ago
For your purpose, you can modify the code to add the filter rules.
query.c/PrimTrim [line 304]
void PrimTrim(fastq_t *fq, query_t *Q, arginfo_t *arg)
{
read_t *read;
char *seq, *qual;
int discard=0;
read = &fq->cache[fq->bufnum++];
seq = fq->read.seq; qual = fq->read.qual;
strcpy(read->name, fq->read.name);
strcpy(read->mark, fq->read.mark);
if (Q->isfind) { // find the primer sequence
strcpy(read->seq, seq+Q->pstart);
strcpy(read->qual, qual+Q->pstart);
if (Q->pend) {
/* read through */
int inlen = Q->pend - Q->pstart +1;
strcpy(read->seq+inlen, "\n");
strcpy(read->qual+inlen, "\n");
}
float q = MeanQuality(read->qual, arg->phred);
if (q < arg->args->minqual) {
Q->badqual = 1;
if (arg->args->keep) {
strcpy(read->seq, fq->read.seq);
strcpy(read->qual, fq->read.qual);
}
else discard = 1;
}
if (strlen(read->seq) < 19 || strlen(read->seq) > 23) discard = 1; // ADD THIS LINE
}
else { // can't find the primer sequence
if (arg->args->keep) {
strcpy(read->seq, fq->read.seq);
strcpy(read->qual, fq->read.qual);
}
else discard = 1;
}
if (discard) {
strcpy(read->seq, "NNNNNNNNNNNNNNNNNNNN\n");
strcpy(read->qual, "!!!!!!!!!!!!!!!!!!!!\n");
}
return ;
}
Then recompile the code!
I have an issue which I don't know how to get around.
I'm trying to utilize the pTrimmer trim 20bp molecules barcodes out from NGS sequenced amplicons.
The amplicon file I use looks like this:
In a very small fraction f our amplicons, we lack an inserted barcode. The alignment of those two "primer" sequences results in a 0-bp read. This is however still included in the output file but then just like this:
All my downstream tools choke on such lines, many just crashes.
I'm wondering how to discard all 0-bp reads. I would also wish to be able to only include those in a length range e.g., 18-22 bases. Can this be possible?
Thanks again for a great tool!