demis001 opened this issue 4 years ago
Hi Dereje,
I think the best approach for dealing with the hybrid genome is to create the "true" diploid genome with the C57BL and 6NJ haplotypes. STAR does not have that capability yet, but you can use, for instance, AlleleSeq to do it. A good discussion about mapping to diploid genomes is here: https://github.com/alexdobin/STAR/issues/968. One of the main complications with this approach is that haplotype coordinates change if indels are included, so the alignments to the two haplotypes are not easy to reconcile.
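To make the indel complication concrete, here is a minimal Python sketch (not part of AlleleSeq or STAR) that maps a 1-based reference position onto a haplotype built by applying VCF-style indels. The function name `ref_to_hap` and the `(pos, ref_len, alt_len)` tuple layout are hypothetical; the sketch assumes non-overlapping, left-anchored indels where either REF or ALT is a single base.

```python
from typing import List, Optional, Tuple

# Hypothetical indel record: (pos, ref_len, alt_len), VCF-style, where pos is the
# 1-based position of the shared anchor base that starts both REF and ALT.
Indel = Tuple[int, int, int]


def ref_to_hap(pos: int, indels: List[Indel]) -> Optional[int]:
    """Map a 1-based reference position to its position on the haplotype.

    Returns None when the reference base was deleted on the haplotype.
    Assumes indels do not overlap and each has ref_len == 1 or alt_len == 1.
    """
    for _, ref_len, alt_len in indels:
        if ref_len != 1 and alt_len != 1:
            raise ValueError("complex indels are not handled in this sketch")
    shift = 0
    for start, ref_len, alt_len in sorted(indels):
        if start >= pos:
            break  # this and later indels start at or after pos: no effect on pos
        last_ref_base = start + ref_len - 1
        if pos <= last_ref_base:
            return None  # pos is one of the bases removed by a deletion
        shift += alt_len - ref_len  # insertions add bases, deletions remove them
    return pos + shift
```

For example, with a 3-base insertion after position 100 and a 2-base deletion of positions 201-202 (`[(100, 1, 4), (200, 3, 1)]`), reference position 203 becomes haplotype position 204, and positions 201-202 return `None` because they have no counterpart on that haplotype.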
Alternatively, you can use the WASP approach: map to one of the haplotypes, and supply the other haplotype's variants as a VCF file, with the genotype column set to 0/1 for all variants. Reads that overlap variants and pass WASP filtering will then have the `vW:i:1` tag. Other values of this tag indicate that the haplotype mapping was not consistent; such reads should typically be discarded.
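The WASP setup above can be sketched in Python as two hypothetical helpers: `force_het_genotypes` rewrites the other strain's VCF so that every record carries the genotype 0/1 in the 10th column, and `star_wasp_args` assembles a STAR command line. The STAR options used (`--varVCFfile`, `--waspOutputMode SAMtag`, and the `vA vG vW` attributes for `--outSAMattributes`) are real STAR parameters; the helper names, placeholder paths, and sorted BAM output are assumptions. The rewrite does not add a `##FORMAT=<ID=GT,...>` header line, so strict VCF validators may complain about its output.

```python
from typing import Iterable, Iterator, List


def force_het_genotypes(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines whose single sample column (10th column) is GT = 0/1."""
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Keep the 8 fixed columns, then FORMAT and one sample column.
            sample = cols[9] if len(cols) > 9 else "SAMPLE"  # hypothetical default name
            yield "\t".join(cols[:8] + ["FORMAT", sample])
            continue
        if len(cols) < 8:
            raise ValueError(f"malformed VCF record: {line!r}")
        # This sketch keeps only the genotype, which --varVCFfile reads from column 10.
        yield "\t".join(cols[:8] + ["GT", "0/1"])


def star_wasp_args(genome_dir: str, reads: List[str], vcf_path: str,
                   threads: int = 8) -> List[str]:
    """Build a STAR argument list for WASP mode (all paths are placeholders)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,        # index built from the C57BL reference
        "--readFilesIn", *reads,
        "--varVCFfile", vcf_path,         # other-strain variants, all genotypes 0/1
        "--waspOutputMode", "SAMtag",     # adds vW:i:<N> to variant-overlapping reads
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outSAMattributes", "NH", "HI", "AS", "nM", "vA", "vG", "vW",
    ]
```

Assuming STAR is on `PATH`, `subprocess.run(star_wasp_args("mm10_star_index", ["R1.fastq", "R2.fastq"], "6NJ.het.vcf"), check=True)` would start the alignment; the index and file names are hypothetical.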
Cheers Alex
https://github.com/alexdobin/STAR/issues/403#issuecomment-665940859
The WASP procedure does not change the alignments; rather, it adds the `vW:i:<N>` tag to them. N=1 indicates that the alignment passed filtering, i.e. it maps to the variant confidently. Other N values indicate that the alignment is not mapped confidently and typically has to be filtered out of the BAM file before downstream analyses. You can add vA and vG to `--outSAMattributes`: they will tell you which variants are overlapped by this alignment, and which haplotype it maps to.
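The filtering step described in this reply can be sketched in Python on plain SAM text (for example the output of `samtools view -h`), so no BAM library is assumed. The function names and the `require_variant` switch are hypothetical, and the sketch works line by line without trying to keep the mates of a pair together.

```python
from typing import Iterable, Iterator, Optional


def wasp_value(sam_line: str) -> Optional[int]:
    """Return N from a vW:i:<N> tag, or None if the alignment has no vW tag."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields follow the 11 mandatory ones
        if tag.startswith("vW:i:"):
            return int(tag[5:])
    return None


def keep_wasp_passing(sam_lines: Iterable[str],
                      require_variant: bool = False) -> Iterator[str]:
    """Yield header lines and alignments that did not fail WASP filtering.

    vW:i:1 alignments are kept and any other vW value is dropped. Alignments
    without a vW tag are assumed not to overlap a variant; they are kept
    unless require_variant is True (e.g. for allele-specific counting).
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # SAM header line
            continue
        n = wasp_value(line)
        if n == 1 or (n is None and not require_variant):
            yield line
```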
Also, please check the Log.out file for the number of variants detected in the VCF file, to make sure that the formatting of the VCF file is correct.
Cheers Alex
Hi Alex,
I have RNA-Seq data from a hybrid mouse (C57BL and 6NJ). The reference mouse genome is from C57BL. This is my first time dealing with a hybrid. I have seen some STAR options related to variants, but I don't know how to use them. Is it possible to overcome the alignment problems caused by SNP differences between the two strains if I align to the mouse reference genome?
Does the WASP filtering of allele-specific alignments help in this scenario? If so, how do I use it?
Would increasing the maximum number of mismatches per pair relative to read length overcome this?
Any idea?