Reproduce the Enformer's input sequences split

calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.

Apache License 2.0

I would like to regenerate the input sequences for Enformer/Basenji2 using basenji_data.py. To do so, I am running the following command:

python basenji_data.py -g hg38.gaps.bed -u umap_k36_t10_l32_hg38.bed -b hg38.blacklist.rep.bed -l 131072 -crop_bp 8192 -break_t 786432 -s 65599 -t .1 -v .1 -w 128 -o data/input_mseqs -p 8 targets.txt

However, the sequences I get differ from those in the sequences.bed file stored here.

Can you please confirm if I am using the right options to generate the same sequence split?
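
For reference, here is a minimal sketch of how the two files can be compared. It assumes each sequences.bed line holds tab-separated chrom, start, end and a split label (train/valid/test). The helper names (read_bed, compare_beds, label_counts) and the toy coordinates are hypothetical and only illustrate the check; they are not part of basenji.

```python
import io
from collections import Counter


def read_bed(handle):
    """Read BED lines into a set of (chrom, start, end, label) tuples.

    Assumption: column 4, when present, is the split label.
    """
    rows = set()
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        label = fields[3] if len(fields) > 3 else ""
        rows.add((fields[0], int(fields[1]), int(fields[2]), label))
    return rows


def compare_beds(mine, reference):
    """Count intervals shared by both sets and unique to each one."""
    return {
        "shared": len(mine & reference),
        "only_mine": len(mine - reference),
        "only_reference": len(reference - mine),
    }


def label_counts(rows):
    """Count how many sequences fall into each split label."""
    return Counter(label for _, _, _, label in rows)


# Toy stand-ins for the regenerated file and the published file.
mine = read_bed(io.StringIO(
    "chr1\t0\t131072\ttrain\n"
    "chr1\t131072\t262144\tvalid\n"
))
reference = read_bed(io.StringIO(
    "chr1\t0\t131072\ttrain\n"
    "chr1\t65599\t196671\ttest\n"
))

summary = compare_beds(mine, reference)
print(summary)
print(label_counts(mine), label_counts(reference))
```

With real data, the in-memory strings would be replaced by open("data/input_mseqs/sequences.bed") and the path of the published file. A large "only_mine" count combined with equal per-label totals would point to shifted coordinates rather than a different train/valid/test assignment.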

Reproduce the Enformer's input sequences split #190