Closed moldach closed 4 years ago
No, that won't be a problem. The gaps file is only used to filter out calls that span assembly gaps. This is certainly an issue in mammalian genomes, since those gaps are surrounded by repeats, which causes mapping ambiguity. The same applies to dups.
Added this explanation to the README. Closing.
In step 3, for gap annotations, it states we should extract the files from UCSC's chromAgp.tar.gz into a folder, merge them, and then grep for component type U or N (which is in column 5). IMO the description of this file by UCSC is lacking; there is no description of what types U and N are.
From the mouse example I see there are 4 fields:
I would like to build a SONIC file for C. elegans (WBcel235), but there are no type U or N entries to be found (I only see type F; what is this?), resulting in an empty .bed file. What do these types mean?
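For reference, the step 3 filter described above can be sketched roughly as follows. In the AGP specification, column 5 is the component type: N is a gap of specified size, U is a gap of unknown size, and F is finished sequence, so an assembly with only F lines has no annotated gaps. This is a minimal sketch and not SONIC's own code; the function name `agp_gaps_to_bed`, the example rows, and the conversion of the 1-based AGP start to a 0-based BED start are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# AGP component types (column 5) that denote gaps rather than sequence.
GAP_TYPES = {"N", "U"}


def agp_gaps_to_bed(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) BED intervals for gap lines in AGP input.

    Hypothetical helper: AGP coordinates are 1-based and inclusive, so the
    start is shifted by one to give a 0-based, half-open BED interval.
    """
    gaps = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and AGP header comments
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete AGP record
        if fields[4] in GAP_TYPES:
            gaps.append((fields[0], int(fields[1]) - 1, int(fields[2])))
    return gaps


# Made-up rows for illustration only: two sequence components around one gap.
example = [
    "chrI\t1\t1000\t1\tF\tcomponent_1\t1\t1000\t+",
    "chrI\t1001\t1100\t2\tN\t100\tcontig\tno\tna",
    "chrI\t1101\t2000\t3\tF\tcomponent_2\t1\t900\t+",
]

for chrom, start, end in agp_gaps_to_bed(example):
    print(f"{chrom}\t{start}\t{end}")
```

If every line in the merged AGP files has type F, the function returns an empty list, which corresponds to the empty .bed file described here.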
Furthermore, in step 4, for segmental duplication annotations, there is a genomicSuperDups.txt.gz file for the mouse genome; however, I cannot find a reference to this for C. elegans. I will contact UCSC to find out if they have a comparable file for this build.
My question to you is: in the absence of such file(s), would there be any harm in building SONIC when passing these two empty files (gaps.bed & dups.bed)?
sonic --ref ref.fasta --reps reps.out --gaps gaps.bed --dups dups.bed --make-sonic cell.sonic --info "UCSC_WBcel235"