RajneeshSrivastava closed this issue 3 years ago
I think the problem is exactly what the bismark_methylation_extractor tool stated, namely that the QNAME fields (the read IDs) of the two mates are not identical. The difference is in the mate number, e.g. "1:N:0:AGTCAACA" for mate 1 and "2:N:0:AGTCAACA" for mate 2.
Since the QNAMEs are merely copies of the original FASTQ deflines for each read, I can think of two ways to tackle this:
· Re-run AriocE using a QNAME pattern attribute in the
For example,
<dataIn QNAME="*:*:*:(*:*:*:*) " ...
would use only the 4th, 5th, 6th, and 7th fields of the QNAME (lane, tile, x, y) to identify reads.
If you need to preserve the information in the first three fields (instrument ID, run ID, flowcell ID) as well, you could do this:
<dataIn QNAME="(*:*:*:*:*:*:*) " ...
However you choose to do it, the goal is to cause AriocP to emit only the encoded QNAME pattern instead of the entire defline.
Parsing the FASTQ defline for both QNAME and RG is described in the Arioc user guide. You can also check AriocE's output to be sure that you have encoded the desired pattern.
· Filter the QNAMEs in the SAM/BAM output.
This should be straightforward with a bit of Linux awk or sed code. There might also be a software tool somewhere that can accomplish this. (If there isn't, perhaps there ought to be!)
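As a rough illustration of the second option, here is a minimal Python sketch that truncates each QNAME at its first space. It assumes the mate-specific part (e.g. "1:N:0:AGTCAACA") follows the first space in the QNAME, as in a typical Illumina defline, and that the input is SAM text rather than BAM. The function names are hypothetical and not part of Arioc or Bismark.

```python
import sys


def strip_mate_suffix(qname: str) -> str:
    """Keep only the part of a QNAME that precedes the first space.

    Assumption: the mate-specific comment (e.g. "1:N:0:AGTCAACA") is
    separated from the read ID by a single space.
    """
    return qname.split(" ", 1)[0]


def filter_sam_line(line: str) -> str:
    """Return a SAM line with its QNAME truncated.

    Header lines (starting with '@') and lines without a tab are
    returned unchanged.
    """
    if line.startswith("@"):
        return line
    qname, sep, rest = line.partition("\t")
    if not sep:
        return line  # not a tab-delimited alignment record
    return strip_mate_suffix(qname) + sep + rest


if __name__ == "__main__":
    # Read SAM text on stdin and write the filtered SAM text to stdout.
    for sam_line in sys.stdin:
        sys.stdout.write(filter_sam_line(sam_line))
```

Saved under a name of your choosing (say, strip_qname.py), it could be run as `python strip_qname.py < in.sam > out.sam`. BAM input would first need to be converted to SAM text, for example with `samtools view -h`. It is worth checking that both mates of a pair end up with the same QNAME before passing the result to bismark_methylation_extractor.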
My preference is always to specify the QNAME encoding in AriocE.
· rw
Hi Team, I am trying to extract the methylation marks from the SAM file produced by AriocP, using the recommended bismark_methylation_extractor script. It reports an error:
However, following Bismark's recommendation, I have already sorted the files (sort -n), and it still gives the above error.
Below I am pasting the first two reads. I am not sure what the bismark_methylation_extractor script considers incorrect.
Many thanks in advance! Rajneesh