Open Schum1 opened 8 years ago
Hi, Ale,
Thank you for your interest in AlignGraph! You can find an earlier version of BLAT that can process longer contigs at https://users.soe.ucsc.edu/~kent/src/. See FAQ4 for details.
Best, Bao
From: Schum1 [notifications@github.com] Sent: Friday, August 05, 2016 3:27 AM To: baoe/AlignGraph Subject: [baoe/AlignGraph] BLAT/PBLAT issue "Maximum single piece size (5000) exceeded" (#25)
Hello Bao, I have assembled a de-novo genome and would like to align it to the reference genome of a closely related species using AlignGraph. So far so good. I start AlignGraph with the following command:
/home/bin/AlignGraph/AlignGraph/AlignGraph --read1 ../Start_fasta/Start_RawReads_FD.fasta --read2 ../Start_fasta/Start_RawReads_RD.fasta --contig ../../1_Short_Read_Assembly/MaSuRCA_1/CA/10-gapclose/genome.ctg.fasta --genome ../../../reference/assembly/ref_281_v5.0.softmasked_GCM.fa --distanceLow 100 --distanceHigh 1350 --extendedContig AlignGraph_1_extendedContigs.fa --remainingContig AlignGraph_1_remainingContigs.fa
This is a small summary of the input reads/genomes and their length distribution (AlignGraph_Issue.xlsx: https://github.com/baoe/AlignGraph/files/403450/AlignGraph_Issue.xlsx).
So far so good, until blat/pblat (I tested both) throws the following error in blat_doc.txt:
Maximum single piece size (5000) exceeded by query 1.1 of size (49814). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used.
I took the liberty of adding some lines to AlignGraph.cpp, so I know that this happens around line 3654 (AlignGraph.cpp) in
"void * task1(void * arg)"
when
"command = "/home/bin/icebert-pblat-ed0ac17/pblat tmp/_genome." + itoa(chromosomeID) + ".fa tmp/_contigs.fa -noHead tmp/_contigs_genome." + itoa(chromosomeID) + ".psl -fastMap -threads=8 > blat_doc.txt 2> blat_doc.txt";"
is called.
Now, I understand that BLAT/PBLAT is struggling to align the "de-novo" contigs against the "reference" genome: some "de-novo" contigs are longer than 5000 bp, and blat/pblat requires queries to be no longer than 5000 bp when the -fastMap flag (which suppresses gaps) is used. This causes the error. Did I get that right?
Is splitting my "de-novo" contigs to an acceptable size the only option, or does a workaround exist? I would like to retain the longer contigs if possible. Otherwise I would just split every contig longer than 5000 bp into separate FASTA entries.
Best regards, Ale R.
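For readers who do end up taking the fallback described above (splitting every contig longer than 5000 bp into separate FASTA entries), the following is a minimal Python sketch. It is not part of AlignGraph or BLAT. The function names and the `_partN` naming of the pieces are hypothetical choices made for illustration, and the 5000 limit is taken from the error message quoted above.

```python
import io

MAX_PIECE = 5000  # limit quoted in the BLAT error message when -fastMap is used


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_record(header, seq, max_len=MAX_PIECE):
    """Yield (name, piece) pairs with every piece at most max_len long.

    Short records pass through unchanged. Longer ones are cut into
    consecutive pieces named <first word of header>_part1, _part2, ...
    (a hypothetical naming scheme, not one AlignGraph expects).
    """
    if len(seq) <= max_len:
        yield header, seq
        return
    name = header.split()[0]
    for i, start in enumerate(range(0, len(seq), max_len), 1):
        yield f"{name}_part{i}", seq[start:start + max_len]


def split_fasta(src, dst, max_len=MAX_PIECE):
    """Copy FASTA records from src to dst, splitting oversized sequences."""
    for header, seq in read_fasta(src):
        for name, piece in split_record(header, seq, max_len):
            dst.write(f">{name}\n{piece}\n")


if __name__ == "__main__":
    # Tiny in-memory demonstration with an artificially small limit.
    demo = io.StringIO(">ctg1\n" + "A" * 12 + "\n")
    out = io.StringIO()
    split_fasta(demo, out, max_len=5)
    print(out.getvalue(), end="")
```

Note that splitting discards the information that the pieces were contiguous, so this only makes sense if retaining the longer contigs is not a requirement.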
Hi Bao, thank you very much for your quick response. Because I prefer to use multithreaded pblat, I used the following approach:
AlignGraph.cpp inherits the maximum query length (5000) from pblat/blat, which in turn takes it from genoFind.h, where the limit is set. I changed the following line in genoFind.h:
/icebert-pblat-ed0ac17/inc/genoFind.h (LINE 380)
and changed it to:
I recompiled pblat and AlignGraph. It runs just fine :)
Best, Ale
Thank you so much for this tip! It will be very helpful for other users!
Best, Bao
thx!!
Remove "-fastMap" from the pblat command.
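As a rough illustration of this suggestion, the sketch below rebuilds the pblat argument list quoted in the original report with `-fastMap` made optional. It is written in Python for brevity. In AlignGraph itself the flag sits in the C++ command string, so applying the suggestion would presumably mean editing that string and recompiling. The function name `pblat_args` and the example chromosome ID are hypothetical; the flags `-noHead`, `-fastMap` and `-threads=8` are the ones that appear in the quoted command.

```python
def pblat_args(genome_fa, contigs_fa, out_psl, threads=8, fast_map=False):
    """Build a pblat argument list modelled on the command quoted in the issue.

    With fast_map=False the -fastMap flag is left out, which is the flag
    the 5000 bp "maximum single piece size" error message refers to.
    """
    args = ["pblat", genome_fa, contigs_fa, "-noHead", out_psl]
    if fast_map:
        args.append("-fastMap")
    args.append(f"-threads={threads}")
    return args


if __name__ == "__main__":
    # Chromosome ID 1 is an arbitrary example value.
    print(" ".join(pblat_args("tmp/_genome.1.fa",
                              "tmp/_contigs.fa",
                              "tmp/_contigs_genome.1.psl")))
```

The list form can be passed directly to `subprocess.run` if one wants to try the command outside AlignGraph before touching the source.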
thanks!