Problems with CHROMOSOME_NAMING

GoogleCodeExporter commented 8 years ago

Hello,
I am using GASV with my own genome, so I need to set the -CHROMOSOME_NAMING 
parameter. I pass it a file called scaffold.txt with this format:
scaffold_1
scaffold_2
scaffold_3
scaffold_4
scaffold_5
scaffold_6
scaffold_7
scaffold_8
scaffold_9
scaffold_10
...
According to my results ("filename"_all.deletion):
3033874 3033974 +   HWI-ST700660_99:2:1101:7692:3237#4@0    3034425 -   3034325

The format must be wrong, because the output contains no scaffold information.
Could you help me, please?
Thanks!

Original issue reported on code.google.com by roal...@gmail.com on 22 Feb 2012 at 1:31

GoogleCodeExporter commented 8 years ago

Thanks for your interest in GASV. I'm glad to help with the question.

First, just to be explicit, I want to distinguish the two steps in the GASV 
pipeline if you are starting from a BAM file.

(1) The BAM preprocessor (bin/BAM_preprocessor.pl) is where the alternate 
chromosome names would be needed. The output of the BAM preprocessor is a set 
of discordant ESPs in the GASV ESP file format (see MANUAL.txt for more 
information). 

The file you mentioned above, "filename"_all.deletion, is one of these GASV 
input files. At this point these files should NOT contain the scaffold names, 
just integers in the chromosome field.

(2) The next step is to run GASV (lib/gasv.jar) to cluster the discordant ESP 
files from step (1). GASV requires the chromosome field be an integer.

====

I agree that your output from "filename"_all.deletion does not look correct. It 
looks like your scaffold.txt file is not in the correct format.

The format for the alternative chromosome naming requires two columns. As 
indicated in our documentation:

If you have alternate chromosome names, define the
alternate names in chromosome naming file as follows: 

Column 1 - Chromosome naming in BAM file, 
Column 2 - replacing chromosome number

Column 1        Column 2 
-------------------------
Ca21chr1        1
Ca21chr2        2
Ca21-mtDNA      9
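
As a minimal sketch (not part of GASV), a two-column file like the one above could be generated from a list with one name per line, such as the scaffold.txt in the original question. The function name `build_naming_lines` is hypothetical, and the tab used as column separator is an assumption; see MANUAL.txt for what the preprocessor actually expects.

```python
def build_naming_lines(names):
    """Map each distinct name to a consecutive integer, starting at 1.

    Returns the lines of a two-column naming file:
    column 1 is the name as it appears in the BAM file,
    column 2 is the integer that replaces it.
    """
    lines = []
    seen = {}
    for raw in names:
        name = raw.strip()
        if not name or name in seen:
            continue  # skip blank lines and duplicate names
        seen[name] = len(seen) + 1
        # Assumption: columns are separated by a tab.
        lines.append(f"{name}\t{seen[name]}")
    return lines


if __name__ == "__main__":
    example = ["scaffold_1", "scaffold_5", "scaffold_2000"]
    for line in build_naming_lines(example):
        print(line)
```

Numbering the names consecutively is only one possible choice; it keeps the largest integer equal to the number of scaffolds.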

Let me know if this helps.

Thanks,

Suzanne

Original comment by sora...@gmail.com on 22 Feb 2012 at 1:54

GoogleCodeExporter commented 8 years ago

It seems to work properly now, but I have some doubts. My file now looks like this:
scaffold_1 1
scaffold_5 5
...
scaffold_2000 2000

I have only 1398 scaffolds, which is fewer than the largest index in the second 
column (2000). Is this a problem? Should the last line be "scaffold_2000 1398" 
instead?
Depending on your answer, should I pass --numChrom 1398 or --numChrom 2000 to 
gasv.jar?
I also see that the example uses --minClusterSize 4. Is this a reliable value 
for avoiding false positives?

Thanks a lot!!!

Original comment by roal...@gmail.com on 22 Feb 2012 at 2:13

GoogleCodeExporter commented 8 years ago

I am glad that things are working now. 

Our BAM preprocessor treats the first column as text, so there should not be 
any problem with your file.

Also, the value of --numChrom should be at least the value of the largest 
chromosome number (column 2 in your scaffold.txt file). If the largest value is 
2000, then use --numChrom 2000.
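
A minimal sketch of how that largest value could be read from the naming file (this is not part of GASV; the function name `largest_chromosome_number` is hypothetical, and the code assumes whitespace-separated columns with an integer in column 2):

```python
def largest_chromosome_number(lines):
    """Return the largest integer found in column 2 of the naming file lines."""
    largest = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        largest = max(largest, int(fields[1]))
    return largest


if __name__ == "__main__":
    example = ["scaffold_1 1", "scaffold_5 5", "scaffold_2000 2000"]
    # The result is the smallest value that satisfies the --numChrom rule above.
    print(largest_chromosome_number(example))
```

To use it on a real file, pass the open file object, for example `largest_chromosome_number(open("scaffold.txt"))`.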

Finally, the minimum number of fragments in a cluster you use should depend on 
your overall coverage and tolerance for false positives. Many consider 2 to be 
the minimum cluster size one would consider regardless of coverage. But, 
without knowing more about your work I can't give specific feedback.

Good luck using GASV and thanks again for your interest in our software.

Suzanne

Original comment by sora...@gmail.com on 22 Feb 2012 at 2:23

GoogleCodeExporter commented 8 years ago

Thanks a lot for your quick reply! As a suggestion, it could be a good idea to 
describe the -CHROMOSOME_NAMING parameter in more detail in the documentation.
Thanks again! I will keep going with GASV.

Original comment by roal...@gmail.com on 22 Feb 2012 at 2:31

GoogleCodeExporter commented 8 years ago

Original comment by sora...@gmail.com on 30 Apr 2012 at 4:24

Changed state: Done
