cmdcolin closed 1 year ago
Random note: there is possible gene fusion data from TCGA here, though it is controlled access. cc @carolinebridge-oicr https://portal.gdc.cancer.gov/repository?filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.data_type%22%2C%22value%22%3A%5B%22Transcript%20Fusion%22%5D%7D%7D%5D%7D
Just commenting here because gene fusions came up in a discussion with @rbuels earlier too, and because the data format for these is BEDPE.
BEDPE is a simple format that describes interactions between two regions of the genome. Breakend data sometimes comes in this format as well.
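For reference, here is a rough sketch of reading one BEDPE line. It assumes the bedtools-style column order (`chrom1 start1 end1 chrom2 start2 end2`, then optional `name score strand1 strand2`) and tab-separated fields. The names `BedpeRecord` and `parse_bedpe_line` are made up for illustration, and individual tools may add or reorder extra columns.

```python
from typing import NamedTuple, Optional, Tuple


class BedpeRecord(NamedTuple):
    """One paired-region record (hypothetical type, for illustration only)."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: Optional[str] = None
    score: Optional[str] = None
    strand1: Optional[str] = None
    strand2: Optional[str] = None
    extra: Tuple[str, ...] = ()


def parse_bedpe_line(line: str) -> BedpeRecord:
    """Parse a single tab-separated BEDPE line.

    Assumes the first six columns are present; columns 7-10 are optional
    and anything after column 10 is kept verbatim in `extra`.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
    # Pad the optional columns with None so short lines still parse.
    optional = fields[6:10]
    optional = optional + [None] * (4 - len(optional))
    return BedpeRecord(
        chrom1, int(start1), int(end1),
        chrom2, int(start2), int(end2),
        *optional,
        extra=tuple(fields[10:]),
    )
```

Scores are left as strings here because different producers use that column differently.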
Small data files could be parsed in their entirety and held in memory.
Larger files might need a 2D indexing scheme (e.g. pairix) so that only the records for a given pair of regions are read.
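A minimal sketch of the small-file case: read everything into memory and answer a two-region query with a linear scan. This assumes tab-separated input, 0-based half-open coordinates, and `#`-prefixed header lines; `load_bedpe` and `query_2d` are hypothetical names. For large files an index such as pairix would replace the scan, which is not shown here.

```python
import io
from typing import Iterable, List, Tuple

# (chrom1, start1, end1, chrom2, start2, end2)
Pair = Tuple[str, int, int, str, int, int]
Region = Tuple[str, int, int]


def load_bedpe(lines: Iterable[str]) -> List[Pair]:
    """Read every record into memory, keeping only the six coordinate columns."""
    pairs: List[Pair] = []
    for line in lines:
        # Assumption: blank lines and '#' lines are headers or comments.
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\r\n").split("\t")
        pairs.append((f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5])))
    return pairs


def overlaps(chrom: str, start: int, end: int, region: Region) -> bool:
    """Half-open interval overlap on the same chromosome."""
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start < r_end and r_start < end


def query_2d(pairs: List[Pair], region1: Region, region2: Region) -> List[Pair]:
    """Return pairs with one end in region1 and the other end in region2.

    Both orientations are checked, since either end may be listed first.
    """
    hits: List[Pair] = []
    for pair in pairs:
        c1, s1, e1, c2, s2, e2 = pair
        forward = overlaps(c1, s1, e1, region1) and overlaps(c2, s2, e2, region2)
        swapped = overlaps(c1, s1, e1, region2) and overlaps(c2, s2, e2, region1)
        if forward or swapped:
            hits.append(pair)
    return hits


if __name__ == "__main__":
    demo = io.StringIO("chr1\t100\t200\tchr5\t5000\t5100\n")
    print(query_2d(load_bedpe(demo), ("chr1", 150, 160), ("chr5", 5050, 5060)))
```

The scan is O(n) per query, which is fine for files small enough to load whole, and the function signature would not need to change if an index were swapped in later.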
Many applications produce this type of data, or a variant of it:
https://github.com/cancerit/BRASS/wiki/BEDPE
https://support.10xgenomics.com/genome-exome/software/pipelines/latest/output/bedpe
https://docs.higlass.io/data_preparation.html