BIMSBbioinfo / janggu

Deep learning infrastructure for genomics
GNU General Public License v3.0

GenomicIndexer Functionality #32

Closed: tfabiha closed this issue 2 years ago

tfabiha commented 2 years ago

Hello,

I was wondering whether there is an existing way to generate a GenomicIndexer automatically from the genome size and a stepsize. I want intervals of size 1000 across the whole genome without having to specify an ROI file, but I haven't been able to find anything in the documentation.

gaow commented 2 years ago

To echo @tfabiha, it would indeed be very helpful if the DeepSEA and DanQ tutorial could be updated to show how to annotate every variant in the genome at 1 kb intervals without having to specify regions of interest, just like DeepSEA does. Thanks!

wkopp commented 2 years ago

It should be possible to do this by creating a GenomicIndexer from a pandas DataFrame. Here is an example:


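import pandas as pd
from janggu.data import GenomicIndexer

# One region per row; here each spans from position 0 to an example end coordinate.
df = pd.DataFrame({'chrom': ['chr1', 'chr2'], 'start': [0, 0], 'end': [10000, 2000]})

# binsize == stepsize requests non-overlapping 1000 bp windows within each region.
gi = GenomicIndexer.create_from_file(df, binsize=1000, stepsize=1000)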
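
To cover a whole genome rather than two hand-written regions, the region table could be built from chromosome lengths. The following is a minimal sketch, assuming a two-column "name, length" listing in the style of a chrom.sizes file. The helper name `regions_from_chrom_sizes` is hypothetical and not part of janggu, and the lengths in the example are made up.

```python
def regions_from_chrom_sizes(lines):
    """Build one full-length region per chromosome from 'name<TAB>length' lines.

    Hypothetical helper for illustration; not part of janggu.
    """
    table = {'chrom': [], 'start': [], 'end': []}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or line.startswith('#'):
            continue  # skip blank, malformed, or comment lines
        table['chrom'].append(fields[0])
        table['start'].append(0)               # regions start at the chromosome start
        table['end'].append(int(fields[1]))    # and end at the chromosome length
    return table


# Made-up lengths; in practice the lines would be read from a chrom.sizes file.
example = ["chr1\t10000", "chr2\t2000"]
regions = regions_from_chrom_sizes(example)
print(regions)
```

The resulting dict has the same columns as the DataFrame in the example above, so it should be usable as `pd.DataFrame(regions)` and passed to `GenomicIndexer.create_from_file` with `binsize=1000, stepsize=1000` in the same way.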