Open YiweiNiu opened 2 years ago
Hi, I know this problem very well - unfortunately there is no way around comparing the sequence directories of BAM files.
Like you suggested, I would replace the sequence header AND remove scaffolds that are NOT part of the new sequence directory.
It seems that by doing so, you replaced the entire header INCLUDING Read Group IDs. You will have to merge the existing Read Group IDs and new sequence header manually...
$ samtools view -H <DNA-BAM> | grep "^@RG" > read_ids_header.sam
$ samtools view -H <RNA-BAM> | grep "^@SQ" > seq_ids_header.sam
...
Hi, thank you for your quick reply.
I did as you said, and JACUSA2 now ran successfully without any errors.
However, the output was empty. I wonder whether it is because I modified the header of the RNA-seq bam file, but I failed to find any error messages in the log file.
Here is the command I used.
input_DNA=/home2/niuyw/project/MEI2/gatkout/${sample}/${sample}.final.bam
input_RNA=$WORKDIR/STAR/$dataset/${sample}.sortedByCoord.q255.rmDup.bam
out_dir=$WORKDIR/JACUSA2/$dataset/$sample
if [ ! -d $out_dir ]; then mkdir -p $out_dir; fi
# reheader input RNA
$path2samtools reheader $WORKDIR/JACUSA2/$dataset/header_new.sam $input_RNA > $out_dir/tmp.bam
$path2samtools index -@ $PPN $out_dir/tmp.bam
# JACUSA2
$path2java -jar $path2JACUSA2 call-2 -p $PPN -a B,D,H:condition=1,I,S,Y -s -f B -q 35 -r $out_dir/${sample}.txt -F1 1024 $input_DNA $out_dir/tmp.bam
gzip $out_dir/${sample}.txt.filtered
Both HG00315.txt
and HG00315.txt.filtered.gz
were empty.
I also tried to call RNA editing sites using RNA-seq data alone (call-1
mode), and the output seems normal.
Any clues on this? Thanks again for your help.
Check Chrs. Please check if all your BAMs files are referring to the same chromosomes, e.g.: 1, 2, 3, ... etc. OR chr1, chr2, ... etc.
Check DNA sample Run call-1 on DNA sample - are there any SNPs called? If not, that is strange!
Try less stringent
Run: $ $path2java -jar $path2JACUSA2 call-2 -p $PPN -a B,D -r $out_dir/${sample}.txt $input_DNA $out_dir/tmp.bam
Hi, thanks a lot for your reply.
- Please check if all your BAMs files are referring to the same chromosomes, e.g.:
Yes, both RNA-seq BAM and DNA-seq BAM used the "chr1, chr2..." names. As I mentioned above, they just differ for some alternative scaffolds and decoy sequences.
- Run call-1 on DNA sample - are there any SNPs called? If not, that is strange!
I tried this with the command line:
# input
input=/home2/niuyw/project/MEI2/gatkout/${sample}/${sample}.final.bam
out_dir=$WORKDIR/JACUSA2/$dataset/$sample/dna
if [ ! -d $out_dir ]; then mkdir -p $out_dir; fi
$path2java -jar $path2JACUSA2 call-1 -p $PPN -a B,D,I,S,Y -s -f B -q 35 -r $out_dir/${sample}.txt $input
The output was empty, both for the HG00315.txt
and HG00315.txt.filtered
. That is strange.
- Try less stringent
I separated the feature-filtered results in another file using the -s
parameter. Since both the output HG00315.txt
and feature-filtered file HG00315.txt.filtered
were empty, I don't think it was because of stringent filtering.
Please run: $path2java -jar $path2JACUSA2 pileup -p $PPN -a B,D,I,S,Y -q 35 -r $out_dir/${sample}.txt $input_DNA
This should generate a "samtools like pileup" - it the result file is empty, something is wrong with your DNA.bam.
What mapper did you use for mapping DNA reads? JACUSA2 filter reads with mapping mapping quality < 20. "You might need to lower the mapping quality filter "-m1".
Hi, this data was from the 1KGP project, and they mapped the reads using BWA-MEM.
How do you suggest setting the m1
parameter?
Do you see reads if you load them up in the IGV?
Hi,
I am using JACUSA2 to detect RNA editing events with paired RNA-seq and DNA-seq data.
However, DNA-seq and RNA-seq data were mapped to different reference genomes. The reference genomes were both hg38 with the same names for major chromosomes, but the reference genome for DNA-seq has some alternative scaffolds and decoy sequences.
Here is the reference genome for DNA-seq: GRCh38_full_analysis_set_plus_decoy_hla.fa, and the reference genome for RNA-seq: Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC92.fasta.
First, I encountered the error: "Sequence Dictionaries of BAM files do not match"
Then I replaced the RNA-seq BAM header using the one from the DNA-seq BAM, but the output was empty. Here is log info:
The RNA-seq BAM I used was HG00315.star.GRCh38.Aligned.sortedByCoord.markduplicates.bam, and the DNA-seq BAM I used was HG00315.final.cram (I used samtools to convert CRAM to BAM).
I wonder if there is any workaround for this situation. Thanks a lot for your help.
Bests, Yiwei