bioinform / breakseq2

BreakSeq2: Ultrafast and accurate nucleotide-resolution analysis of structural variants
BSD 2-Clause "Simplified" License

Creation of breakpoints file (in either GFF3 or FASTA format) #22

Open moldach opened 4 years ago

moldach commented 4 years ago

I would like to use BreakSeq2 but am unsure how to create a breakpoint library for other model organisms, specifically C. elegans.

It's unclear from the documentation how a breakpoint library (either as FASTA or GFF) was created for humans.

I have used MindTheGap, which creates a .breakpoints file of detected insertion sites.

The file looks like this:

[moldach@cdr767 BreakSeq2]$ head $BREAKPOINTS
>bkpt3_I_pos_221157_fuzzy_3_HOM  left_kmer
TGAAATTGCCATTTCGACTGTGGCAGAGCCC
>bkpt3_I_pos_221157_fuzzy_3_HOM REPEATED right_kmer
ACGAAGAGCGTCGTGGATTCGGTGAGCTTCT
>bkpt4_I_pos_232103_fuzzy_4_HET  left_kmer
CGGGCCATTTGGGTCGCGGCCGGTCTGGGGG
>bkpt4_I_pos_232103_fuzzy_4_HET  right_kmer
GCTGGGCCCGTACTTCCTGGGAAGTTGAGAA
>bkpt6_I_pos_256855_fuzzy_0_HOM  left_kmer
AATTTTCATCTGAAAATTTAGTACTGAAATC

The .gff breakpoint library for humans looks quite different:

[moldach@cdr767 BreakSeq2]$ head breakseq2_bplib_20150129.gff
1       1KG_Phase1      Deletion        766594  769112  .       .       .
1       1KG_Phase1      Deletion        776770  791881  .       .       .
1       1KG_Phase1      Deletion        869385  870317  .       .       .
1       1KG_Phase1      Deletion        912049  913594  .       .       .
1       1KG_Phase1      Deletion        947122  948001  .       .       .
1       1KG_Phase1      Deletion        1086818 1087023 .       .       .
1       1KG_Phase1      Deletion        1142720 1143140 .       .       .
1       1KG_Phase1      Deletion        1443564 1445764 .       .       .
1       1KG_Phase1      Deletion        1465912 1466230 .       .       .
1       1KG_Phase1      Deletion        1598414 1598580 .       .       .

Can the breakpoint information from MindTheGap be converted to a format that will work with BreakSeq2? If not, which tool(s) can be used to generate this information?
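
To make the question concrete, here is a minimal Python sketch of the kind of conversion I have in mind. It only parses the MindTheGap header lines shown above (bkpt<N>_<chrom>_pos_<position>_fuzzy_<n>_<HOM|HET>) and writes one nine-column GFF3-style row per breakpoint. Several things here are my own assumptions and not taken from the BreakSeq2 documentation: the function names, the "Insertion" feature type, the "MindTheGap" source label, the attribute keys, and the choice to use the reported position unchanged as both start and end. I do not know whether BreakSeq2 accepts rows like these, which is really what I am asking.

```python
import re

# Header layout inferred from the sample output above, e.g.
# ">bkpt3_I_pos_221157_fuzzy_3_HOM  left_kmer"
HEADER_RE = re.compile(
    r"^>bkpt(?P<id>\d+)_(?P<chrom>.+)_pos_(?P<pos>\d+)"
    r"_fuzzy_(?P<fuzzy>\d+)_(?P<gt>HOM|HET)\b"
)


def parse_header(line):
    """Return the fields of one MindTheGap header line, or None if it does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    return {
        "id": int(m["id"]),
        "chrom": m["chrom"],
        "pos": int(m["pos"]),
        "fuzzy": int(m["fuzzy"]),
        "genotype": m["gt"],
    }


def breakpoints_to_gff_rows(lines, source="MindTheGap", sv_type="Insertion"):
    """Build one tab-separated GFF3-style row per breakpoint.

    Each breakpoint appears twice in the file (left_kmer and right_kmer),
    so repeated (chrom, pos, id) combinations are written only once.
    ASSUMPTION: the position is copied as-is into both start and end;
    the coordinate convention should be checked against MindTheGap's docs.
    """
    rows = []
    seen = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip the k-mer sequence lines
        rec = parse_header(line)
        if rec is None:
            continue
        key = (rec["chrom"], rec["pos"], rec["id"])
        if key in seen:
            continue
        seen.add(key)
        attrs = "ID=bkpt{id};fuzzy={fuzzy};genotype={genotype}".format(**rec)
        rows.append("\t".join([
            rec["chrom"], source, sv_type,
            str(rec["pos"]), str(rec["pos"]),
            ".", ".", ".", attrs,
        ]))
    return rows
```

Calling breakpoints_to_gff_rows(open(path)) on the .breakpoints file would give rows that could be written out one per line. Note that the human library above contains Deletion records with distinct start and end coordinates, whereas MindTheGap reports single insertion sites, so even a well-formed file may not carry the information BreakSeq2 expects.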