Question about the complete output.

DepledgeLab / DRUMMER

DRUMMER: Detection of RNA modifications in nanopore direct RNA Sequencing datasets

GNU General Public License v3.0

21 stars 1 forks source link

Question about the complete output. #16

Closed mvolar closed 2 years ago

mvolar commented 2 years ago

Hello everyone,

when using the tool in the "exome" mode the complete output given does not have continuous numbering of control and treatment positions/bases on the reference. The "gaps" look like this:

As you see the numbers jump by 2-3 bases at a time, since I am interested in finding some specific sites/motifs I would like to know what is the formal reason behind those jumps since there is no paper in which to refer to regarding the tool in which the outputs would be explained in detail. I have already checked coverage but it is uniform throughout the molecule, and the jumps do not differ in the low/high coverage regions.

Thank you in advance,

JonAbebe commented 2 years ago

Hello, If I recall correctly we used to see this type of trend at the beginning and ends of our outputs because the depths in these regions were significantly lower compared to other regions. What are the depth values for each of these positions?

mvolar commented 2 years ago

Here are the depth values in both samples for a specific region of interest, but this pattern repeats throughout the exome, with no discernable pattern, except maybe the overall low coverage for an RNA dataset (what I also find suspicious, because according to the .paf/.sam file I have >100 coverage of those regions in both cases but now it has dropped significantly)

unmod
sample  2691    52
sample  2692    53
sample  2693    53

mod
sample  2691    47
sample  2692    47
sample  2693    47

final output table:

sample  2691    A   47
sample  2693    C   51

DepledgeLab commented 2 years ago

This is certainly curious.

Can you specific the full command & flags used to run DRUMMER and also specify how you preprocessed the data? Could you also check the number of lines in the bam-readcount output files found in the bam_readcount directory? How does this compare against the length of your reference?

mvolar commented 2 years ago

The mapping was done like so

for f in *.fastq ; do
    echo "$f"
    minimap2 -ax map-ont  ref.fasta $f > ./mapped_data/mapped_$f.sam
    minimap2 -ax map-ont  ref.fasta $f > ./mapped_data/mapped_$f.paf
    samtools view -h ./mapped_data/mapped_$f.sam | awk ' ( $5 > 0 || $1 ~ /SQ/ ) ' | awk ' ( $2 == 0 || $1 ~ /SQ/ ) ' | awk ' ( $6 !~ /H/ || $1 ~ /SQ/ ) ' | samtools view -b - > ./mapped_data/mapped_$f.bam
    samtools sort ./mapped_data/mapped_$f.bam > ./mapped_data/mapped_$f.s.bam
    samtools index ./mapped_data/mapped_$f.s.bam

where is had a $f is a group of 6 pairs of fastq files (6mod 6unmod), and DRUMMER was called using the default way of caaling as specified in the tutorial:

python DRUMMER.py -r rna.fasta -n chr1 -o exome-test -c Ivt.fastq.s.bam -t native.fastq.s.bam -a exome -m True -o ./test_output

DepledgeLab commented 2 years ago

I think one issue might be with the alignment step and this is partly the fault of our documentation (I will fix this soon).

As I understand it, you are aligning to a genome (rather than transcriptome) and running in exome mode. However, when running minimap2 you should use -ax splice option rather than -ax map-ont Similarly, you do not need to perform the filtering step as specific for exome mode but rather you can just run the following: samtools view -F2308 -bS -o output.bam input.sam

Can you try this and see if it resolves your issues? Please also check the number of lines in your bamreadcount files and how this compares to the length of your reference genome.

mvolar commented 2 years ago

The new splice option works, still some skippage in the begging of the chromosome, but the rest excellent. The coverage also went to the theoretical coverage we presumed from reads, which was a 100. In retrospect I should've used spliced from the beginning as I normally do, but I figured You probably know best since you are stating map-ont (because of the errors and all) so it would be best to add this in the documentation for further uses.

So all in all the issue is resolved, thank you very much for your help. :)

DepledgeLab commented 2 years ago

Glad we could resolve it. I will update the documentation to reflect all this.