apeltzer / CircularMapper

A method to improve mappings on circular genomes, using the BWA mapper
GNU General Public License v3.0
8 stars 5 forks source link

realign error #9

Open NicRamb opened 5 years ago

NicRamb commented 5 years ago

Hi, I'm trying to map a human mitochondrial sequence to the reference, but i get stuck at the realign step. Here is what I ran:

java -jar generator-1.93.4.jar -e 500 -i reference.fasta -s "chrM" bwa index reference_500.fasta bwa mem -R "readgroups" -t 16 reference_500.fasta R1fastqfile R2fastqfile > circularmapped.sam java -jar realign-1.93.4.jar -e 500 -i circularmapped.sam -r reference.fasta

at this step it throws the following error:

DEBUG 2019-02-18 11:26:58 BlockCompressedOutputStream Using deflater: Deflater Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 213, end 217, length 213 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) at java.base/java.lang.String.substring(String.java:1874) at ReadTrimmer.trimRead(ReadTrimmer.java:119) at RealignSAMFile.splitRead(RealignSAMFile.java:293) at RealignSAMFile.processReads(RealignSAMFile.java:239) at RealignSAMFile.inspectReads(RealignSAMFile.java:193) at RealignSAMFile.readSAMFile(RealignSAMFile.java:136) at RealignSAMFile.main(RealignSAMFile.java:113)

apeltzer commented 5 years ago

Some more context:

Apparently, realignsamfile does not like to have reads with gaps in the elongated region. Then, I solved this issue excluding those reads during the mapping step with this parameter "-o 0" for bwa aln.
NicRamb commented 5 years ago

ok, thank you for the reply

apeltzer commented 5 years ago

As I'm only working in my spare time on these former PhD projects, I will hopefully find some time at some point to work on these....

NicRamb commented 5 years ago

It's already a very useful tool, let's hope for that

apeltzer commented 5 years ago

Yeah, just didn't want you to sit there waiting for improvements and let you know beforehand that I might have time to work on it but can't promise unfortunately :-/

apeltzer commented 5 years ago

Thanks :-)

showteeth commented 5 years ago

Apparently, realignsamfile does not like to have reads with gaps in the elongated region. Then, I solved this issue excluding those reads during the mapping step with this parameter "-o 0" for bwa aln.

Thank you for providing such an awesome tool. But I am confused that why you disallow gap in the elongated region.

apeltzer commented 5 years ago

I think during the development of the method, we didn't design it to be able to compensate with all the possible CIGAR string types that can occur. Potentially, this could be added of course but I doubt I'll find the time for it, at least not at the moment.

bamorim-bio commented 2 years ago

Hi Alexander! I actually still get the same error. I tried doing the alignment with the parameter mentioned (just to be sure: is it -o or -O 0 ?):

bwa mem -t 4 -O 0 data/mtDNA_rCRS_500.fasta data/out.R1.DEMI156.fastq.gz data/out.R2.DEMI156.fastq.gz > data/DEMI156_500.aln.sam

When I realign I get this error:

realignsamfile -e 500 -i data/DEMI156_500.aln.sam -r data/mtDNA_rCRS.fasta DEBUG 2022-03-02 14:12:08 BlockCompressedOutputStream Using deflater: Deflater Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 150, end 151, length 150 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319) at java.base/java.lang.String.substring(String.java:1874) at ReadTrimmer.trimRead(ReadTrimmer.java:114) at RealignSAMFile.splitRead(RealignSAMFile.java:293) at RealignSAMFile.processReads(RealignSAMFile.java:239) at RealignSAMFile.inspectReads(RealignSAMFile.java:193) at RealignSAMFile.readSAMFile(RealignSAMFile.java:136) at RealignSAMFile.main(RealignSAMFile.java:113)

Do you know what I could be doing wrong? :(

apeltzer commented 2 years ago

@jfy133 heard of this before? Honestly, I'm not really having the time to maintain this anymore (sadly), so cannot really look into this unfortunately. Its been 6 years since I worked on this and couldn't find too much time working on improving these tools.

Family + work already consume most of my time and this is partially even pre-PhD. Sorry this is no good news, but trying to be honest here ...

bamorim-bio commented 2 years ago

@jfy133 heard of this before? Honestly, I'm not really having the time to maintain this anymore (sadly), so cannot really look into this unfortunately. Its been 6 years since I worked on this and couldn't find too much time working on improving these tools.

Family + work already consume most of my time and this is partially even pre-PhD. Sorry this is no good news, but trying to be honest here ...

Hello Alex, no worries, I understand completely! So I think the main problem is that I was aligning forward and reverse reads and so I was using bwa mem and not aln/samse which you use in your example. I ended up merging my reads with fastp and then did the commands as in your example and it worked perfectly. I am a very beginner in DNA assembly so I don't know the implications of merging forward and reverse reads (I remember reading that it was not recommended? but I could be mistaken...). So for now I will just continue with this option. Thanks anyway!

jfy133 commented 2 years ago

Hi @apeltzer and @bamorim-bio

No I haven't actually seen that error myself.

I suspect as you said @bamorim-bio - you are most likely required required to use bwa aln with CircularMapper.

Ancient DNA reads are rarely long enough that they do not merge (normally ~30-70bp), so it's unnecessary to do paired-end mapping, as you have no additional positional information that you may gain from paired-end mapping (i.e., you have no uncharacteirsed template between the two pairs, which can be inferred from other reads; something in longer-read modern data cna be very useful e.g. for haplotyping). Palaeogenomicists therefore rather mostly do merging to increase base-call confidence for variant calling (as you expect to have low-covereage data, so every little bit of increased confidence helps).

Therefore ultimately, aDNA people will in 95% cases merge, and use highly accurate aligner algorithms such as bwa aln to get the best possible alignment.

So the solution is the only one you've described.

jfy133 commented 2 years ago

@apeltzer I think probably easiest to updat eREADME to specify you SHOULD use bwa aln, and then leave a comment here as the soluiton

apeltzer commented 2 years ago

I just did that.