Thanks for your interest in GASV. I'm glad to help with your question.
First, just to be explicit, I want to distinguish the two steps in the GASV
pipeline if you are starting from a BAM file.
(1) The BAM preprocessor (bin/BAM_preprocessor.pl) is where the alternate
chromosome names would be needed. The output of the BAM preprocessor is a set
of discordant ESPs in the GASV ESP file format (see MANUAL.txt for more
information).
The file you mentioned above, "filename"_all.deletion, is one of these GASV
input files. At this point these files should NOT contain the scaffold names,
just integers in the chromosome field.
(2) The next step is to run GASV (lib/gasv.jar) to cluster the discordant ESP
files from step (1). GASV requires the chromosome field to be an integer.
====
I agree that your output in "filename"_all.deletion does not look correct. It
looks like your scaffold.txt file is not in the correct format.
The alternate chromosome naming file requires two columns. As
indicated in our documentation:
If you have alternate chromosome names, define the
alternate names in chromosome naming file as follows:
Column 1 - Chromosome naming in BAM file,
Column 2 - replacing chromosome number
Column 1      Column 2
-------------------------
Ca21chr1      1
Ca21chr2      2
Ca21-mtDNA    9
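If you want to check a naming file before running the BAM preprocessor, here is a minimal Python sketch of the two-column format described above. It is not part of GASV. The function name read_chromosome_naming is hypothetical. The sketch assumes the columns are separated by whitespace and that the file contains only data lines, without the "Column 1 / Column 2" header. The duplicate-name check is an extra safeguard of our own and is not a documented GASV rule.

```python
from typing import Dict, Iterable


def read_chromosome_naming(lines: Iterable[str]) -> Dict[str, int]:
    """Map BAM chromosome names (column 1) to integer chromosome numbers (column 2).

    Hypothetical helper, not part of GASV. Assumes whitespace-separated
    columns and no header line; blank lines are skipped.
    """
    mapping: Dict[str, int] = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, number = fields  # column 1 is treated as plain text
        if not number.isdigit():
            raise ValueError(f"line {lineno}: column 2 must be an integer, got {number!r}")
        if name in mapping:
            # Our own sanity check: a name listed twice would be ambiguous.
            raise ValueError(f"line {lineno}: duplicate name {name!r}")
        mapping[name] = int(number)
    return mapping


if __name__ == "__main__":
    example = ["Ca21chr1 1", "Ca21chr2 2", "Ca21-mtDNA 9"]
    print(read_chromosome_naming(example))
```

A file with a single column, or with a non-integer in column 2, fails loudly here instead of silently producing bad chromosome fields downstream.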
Let me know if this helps.
Thanks,
Suzanne
Original comment by sora...@gmail.com
on 22 Feb 2012 at 1:54
It seems to work properly, but I have a few questions. My file now looks like this:
scaffold_1 1
scaffold_5 5
...
scaffold_2000 2000
I have 1398 scaffolds, but the numbers in the second column go up to 2000, so
there are gaps. Is this a problem? Should the last line be "scaffold_2000 1398"?
Depending on your answer, should I pass --numChrom 1398 or --numChrom 2000 to
gasv.jar?
Also, I see that the example uses --minClusterSize 4. Is this a reliable
setting for avoiding false positives?
Thanks a lot!!!
Original comment by roal...@gmail.com
on 22 Feb 2012 at 2:13
I am glad that things are working now.
Our BAM preprocessor treats the first column as text, so there should not be
any problem with your file.
Also, the value of --numChrom should be at least the largest chromosome
number (column 2 in your scaffold.txt file). If the largest value is 2000,
use --numChrom 2000.
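The rule above, that --numChrom must be at least the largest number in column 2 and not the number of rows, can be sketched in a few lines of Python. The helper name min_num_chrom is hypothetical and not part of GASV. The sketch again assumes whitespace-separated, two-column lines with no header.

```python
from typing import Iterable


def min_num_chrom(lines: Iterable[str]) -> int:
    """Return the smallest --numChrom value covering every chromosome number.

    Hypothetical helper, not part of GASV. --numChrom must be at least the
    largest value in column 2; the number of rows (scaffolds) does not matter,
    so gaps in the numbering are fine. Blank lines are skipped.
    """
    numbers = [int(line.split()[1]) for line in lines if line.strip()]
    if not numbers:
        raise ValueError("naming file has no entries")
    return max(numbers)


if __name__ == "__main__":
    rows = ["scaffold_1 1", "scaffold_5 5", "scaffold_2000 2000"]
    n = min_num_chrom(rows)
    print(f"{len(rows)} scaffolds listed; use --numChrom {n}")
```

Running it prints "3 scaffolds listed; use --numChrom 2000". Only three rows are listed, yet the answer is 2000, which matches the 1398-scaffold case in the question.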
Finally, the minimum number of fragments in a cluster (--minClusterSize) should
depend on your overall coverage and your tolerance for false positives. Many
consider 2 to be the smallest reasonable cluster size, regardless of coverage.
But without knowing more about your work, I can't give specific feedback.
Good luck using GASV and thanks again for your interest in our software.
Suzanne
Original comment by sora...@gmail.com
on 22 Feb 2012 at 2:23
Thanks a lot for your quick reply! Also, as a suggestion, it might be a good
idea to describe the -CHROMOSOME_NAMING parameter in more detail in the
documentation.
Thanks again! I'll keep going with GASV.
Original comment by roal...@gmail.com
on 22 Feb 2012 at 2:31
Original comment by sora...@gmail.com
on 30 Apr 2012 at 4:24
Original issue reported on code.google.com by
roal...@gmail.com
on 22 Feb 2012 at 1:31