calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0
411 stars, 126 forks

How to prepare TFRecords files for saluki training #164

Closed: hamanakakohei closed this issue 1 year ago

hamanakakohei commented 1 year ago

Thank you for publishing these great resources. The notebook "basenji/jupyter/saluki_data.ipynb" demonstrates how to prepare TFRecord files for Saluki. The TFRecord files it produces contain the features "lengths", "utr5", "cds", "utr3", "features", and "targets". However, the actual TFRecord files used for Saluki training (https://zenodo.org/record/6326409#.ZGrDWXZBxD8) contain the features "lengths", "sequence", "coding", "splice", and "targets". Do you plan to publish how you prepared these TFRecord files?
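To check which of the two feature layouts a given file actually uses, one option is to list the feature names stored in its first record. The sketch below does this with only the Python standard library. It assumes the files hold serialized `tf.train.Example` protos and are ZLIB-compressed (the compression basenji's own data scripts appear to use; whether the Zenodo Saluki files follow suit is an assumption, so pass `compressed=False` if decompression fails). The helper names (`tfrecord_feature_keys`, `iter_tfrecords`, `example_feature_keys`) are hypothetical, not part of basenji.

```python
import struct
import zlib


def _read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def _iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field of a protobuf message."""
    pos = 0
    while pos < len(buf):
        key, pos = _read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:  # varint
            value, pos = _read_varint(buf, pos)
        elif wire == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:  # length-delimited (strings, bytes, sub-messages)
            length, pos = _read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        yield field, wire, value


def example_feature_keys(serialized):
    """Return feature names of a serialized tf.train.Example, in stored order."""
    keys = []
    for field, wire, features in _iter_fields(serialized):
        if field == 1 and wire == 2:  # Example.features
            for f2, w2, entry in _iter_fields(features):
                if f2 == 1 and w2 == 2:  # one entry of the Features.feature map
                    for f3, w3, value in _iter_fields(entry):
                        if f3 == 1 and w3 == 2:  # map key (the feature name)
                            keys.append(value.decode("utf-8"))
    return keys


def iter_tfrecords(data):
    """Split raw TFRecord bytes into record payloads (CRCs are skipped, not checked)."""
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from("<Q", data, pos)
        pos += 8 + 4  # uint64 length + masked CRC32 of the length
        yield data[pos:pos + length]
        pos += length + 4  # payload + masked CRC32 of the payload


def tfrecord_feature_keys(path, compressed=True):
    """Sorted feature names of the first record in a (ZLIB-compressed) TFRecord file."""
    with open(path, "rb") as fh:
        data = fh.read()  # whole file in memory: fine for a quick inspection
    if compressed:
        data = zlib.decompress(data)
    return sorted(example_feature_keys(next(iter_tfrecords(data))))
```

For example, `tfrecord_feature_keys("train-0.tfr")` (a hypothetical filename) would return the feature names of that file's first record, which can then be compared against the two lists above. If TensorFlow is installed, `tf.data.TFRecordDataset(path, compression_type="ZLIB")` together with `tf.train.Example.FromString` gives the same information with CRC checking; the stdlib version just avoids the dependency.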

davek44 commented 1 year ago

Hi, that notebook represents a prior version of the TFRecord generation workflow. I replaced it with our final version in the latest push to the master branch.