Closed mvolar closed 2 years ago
Hello, If I recall correctly we used to see this type of trend at the beginning and ends of our outputs because the depths in these regions were significantly lower compared to other regions. What are the depth values for each of these positions?
Here are the depth values in both samples for a specific region of interest, but this pattern repeats throughout the exome, with no discernable pattern, except maybe the overall low coverage for an RNA dataset (what I also find suspicious, because according to the .paf/.sam file I have >100 coverage of those regions in both cases but now it has dropped significantly)
unmod
sample 2691 52
sample 2692 53
sample 2693 53
mod
sample 2691 47
sample 2692 47
sample 2693 47
final output table:
sample 2691 A 47
sample 2693 C 51
This is certainly curious.
Can you specific the full command & flags used to run DRUMMER and also specify how you preprocessed the data? Could you also check the number of lines in the bam-readcount output files found in the bam_readcount directory? How does this compare against the length of your reference?
The mapping was done like so
for f in *.fastq ; do
echo "$f"
minimap2 -ax map-ont ref.fasta $f > ./mapped_data/mapped_$f.sam
minimap2 -ax map-ont ref.fasta $f > ./mapped_data/mapped_$f.paf
samtools view -h ./mapped_data/mapped_$f.sam | awk ' ( $5 > 0 || $1 ~ /SQ/ ) ' | awk ' ( $2 == 0 || $1 ~ /SQ/ ) ' | awk ' ( $6 !~ /H/ || $1 ~ /SQ/ ) ' | samtools view -b - > ./mapped_data/mapped_$f.bam
samtools sort ./mapped_data/mapped_$f.bam > ./mapped_data/mapped_$f.s.bam
samtools index ./mapped_data/mapped_$f.s.bam
where is had a $f is a group of 6 pairs of fastq files (6mod 6unmod), and DRUMMER was called using the default way of caaling as specified in the tutorial:
python DRUMMER.py -r rna.fasta -n chr1 -o exome-test -c Ivt.fastq.s.bam -t native.fastq.s.bam -a exome -m True -o ./test_output
I think one issue might be with the alignment step and this is partly the fault of our documentation (I will fix this soon).
As I understand it, you are aligning to a genome (rather than transcriptome) and running in exome mode. However, when running minimap2 you should use -ax splice option rather than -ax map-ont Similarly, you do not need to perform the filtering step as specific for exome mode but rather you can just run the following: samtools view -F2308 -bS -o output.bam input.sam
Can you try this and see if it resolves your issues? Please also check the number of lines in your bamreadcount files and how this compares to the length of your reference genome.
The new splice option works, still some skippage in the begging of the chromosome, but the rest excellent. The coverage also went to the theoretical coverage we presumed from reads, which was a 100. In retrospect I should've used spliced from the beginning as I normally do, but I figured You probably know best since you are stating map-ont (because of the errors and all) so it would be best to add this in the documentation for further uses.
So all in all the issue is resolved, thank you very much for your help. :)
Glad we could resolve it. I will update the documentation to reflect all this.
Hello everyone,
when using the tool in the "exome" mode the complete output given does not have continuous numbering of control and treatment positions/bases on the reference. The "gaps" look like this:
As you see the numbers jump by 2-3 bases at a time, since I am interested in finding some specific sites/motifs I would like to know what is the formal reason behind those jumps since there is no paper in which to refer to regarding the tool in which the outputs would be explained in detail. I have already checked coverage but it is uniform throughout the molecule, and the jumps do not differ in the low/high coverage regions.
Thank you in advance,