Closed by mp15.
Hi Martin!
We don't have one yet, but we can certainly add a BEDPE parser. Do you have a specific format in mind in which you'd like it exposed? Are you essentially envisioning an encoder that builds its encoding from a BEDPE entry instead of a Variant entry?
Yes, please. This is the format, and I also have an example. I want to be able to take breakpoints from the BEDPE file and then classify either individual breakpoints or groups of them (I'm planning to feed groups of breakpoints to an RNN).
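For readers unfamiliar with BEDPE, a minimal sketch of reading breakpoints and grouping them might look like the following. It assumes the standard bedtools-style column order (chrom1, start1, end1, chrom2, start2, end2, then optional name, score, strand1, strand2, then any extra columns) and tab-separated lines with `#` header lines. `Breakpoint`, `parse_bedpe` and `group_breakpoints` are hypothetical names for illustration, not the VariantWorks API, and the grouping key is an assumption you would replace with whatever defines a cluster in your data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Breakpoint:
    """One BEDPE record: two genomic intervals joined by a rearrangement (hypothetical type)."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str = "."
    score: str = "."
    strand1: str = "."
    strand2: str = "."
    extra: List[str] = field(default_factory=list)  # any columns after strand2


def _opt(fields: List[str], i: int) -> str:
    # Optional BEDPE columns default to "." when absent.
    return fields[i] if len(fields) > i else "."


def parse_bedpe(lines: Iterable[str]) -> List[Breakpoint]:
    """Parse tab-separated BEDPE lines, skipping blank and '#' header lines."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        records.append(Breakpoint(
            chrom1=fields[0], start1=int(fields[1]), end1=int(fields[2]),
            chrom2=fields[3], start2=int(fields[4]), end2=int(fields[5]),
            name=_opt(fields, 6), score=_opt(fields, 7),
            strand1=_opt(fields, 8), strand2=_opt(fields, 9),
            extra=fields[10:],
        ))
    return records


def group_breakpoints(
    records: Iterable[Breakpoint],
    key: Callable[[Breakpoint], str] = lambda r: r.name,
) -> Dict[str, List[Breakpoint]]:
    """Group breakpoints by a caller-chosen key, e.g. a cluster id, to form RNN input sequences."""
    groups: Dict[str, List[Breakpoint]] = {}
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    return groups
```

Each group's list of `Breakpoint` objects could then be turned into a sequence of feature vectors; which features to use is left open here.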
Fantastic, the example is very helpful. It looks quite straightforward; we can have a PR up within a few hours!
Hi Martin, I've created PR #145 with a generic BED parser, and used your sample data as a test case :). The API usage is shown here - https://github.com/clara-parabricks/VariantWorks/pull/145/files#diff-1beff6bb5395f5d2aa83d24579f5fb764dd0091a4bab06ee9fbba585e6f9a442R24
Can you have a look to see if this would fit your needs?
Thanks.
Sorry, I was dealing with my PhD first-year viva. I have tested this and have some notes. #152 contains a fix for a trivial bug I spotted in the strong-typing code.
Unfortunately, the data produced by our BRASS pipeline uses a slightly different header format and duplicates two of the header labels. I managed to find an example of this as well at https://github.com/cancerit/BRASS/blob/dev/perl/testData/BrassMarkedGroups_test.out.bedpe, and I have proposed a patch in #153.
Hi @mp15 - looking through the column names in the example bedpe, I don't see a duplicated column name. Was that the right link?
Here is a better sample (tmp.txt): this is an actual header from one of our pipelines, edited to remove sensitive bits. As you can see, we have two strand1 and two strand2 fields, eww.
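One way a header-driven parser could cope with repeated labels such as two `strand1` and two `strand2` columns is to make the names unique before mapping fields to columns. The sketch below is a hypothetical illustration, not the approach taken in #153: `dedupe_header`, `parse_header_line` and `row_to_dict` are invented names, and suffixing repeats as `strand1_2` is an assumed convention.

```python
from typing import Dict, List


def dedupe_header(names: List[str]) -> List[str]:
    """Make column names unique by suffixing repeats: strand1, strand1_2, strand1_3, ..."""
    used = set()
    counts: Dict[str, int] = {}
    unique = []
    for name in names:
        candidate = name
        n = counts.get(name, 1)
        # Keep bumping the suffix until the name does not clash with any column seen so far.
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        counts[name] = n
        used.add(candidate)
        unique.append(candidate)
    return unique


def parse_header_line(line: str) -> List[str]:
    """Split a '#'-prefixed, tab-separated header line into unique column names."""
    return dedupe_header(line.rstrip("\r\n").lstrip("#").split("\t"))


def row_to_dict(header: List[str], line: str) -> Dict[str, str]:
    """Map one tab-separated data line onto the (deduplicated) header."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(header):
        raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
    return dict(zip(header, fields))
```

Renaming keeps every column addressable; the alternative of silently keeping only the last duplicate would lose data.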
Resolved by #153
Hi,
I am working on a structural variant breakpoint cluster classifier for cancer samples and was wondering whether you have a BEDPE parser in the works, or plan to write one in the near future.
Martin