1edv / evolution

This repository contains the code for our manuscript, 'The evolution, evolvability, and engineering of gene regulatory DNA'
MIT License

Replicating the data splits #8

Closed by aga-relation 2 years ago

aga-relation commented 2 years ago

Hi, I am trying to replicate the data splits quoted in the paper and having issues.

The data processing script does not set a random seed, so there is no way to replicate your train/valid/test splits exactly. Instead, I am trying to match the files mentioned in the processing script to the files in the data repository, but none of the file names match and the README doesn't explain the mapping either. Could you please specify which data files were used for training, for validation, for evaluating on random sequences, and for evaluating on naturally occurring sequences?

Thank you! :)
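
For readers with the same problem, here is a minimal sketch of how a fixed seed makes a train/valid/test split reproducible. It is not the repository's processing script: the function name `seeded_split`, the 10%/10% fractions, and the default seed are all hypothetical and only illustrate the idea.

```python
import random


def seeded_split(n_items, valid_frac=0.1, test_frac=0.1, seed=0):
    """Return (train, valid, test) index lists that are identical for a given seed.

    All names and default fractions here are illustrative assumptions,
    not values taken from the paper or the repository.
    """
    indices = list(range(n_items))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)

    n_valid = int(n_items * valid_frac)
    n_test = int(n_items * test_frac)

    valid = indices[:n_valid]
    test = indices[n_valid:n_valid + n_test]
    train = indices[n_valid + n_test:]
    return train, valid, test


# Calling the function twice with the same seed yields the same partition.
first = seeded_split(100, seed=42)
second = seeded_split(100, seed=42)
```

Without a recorded seed, the original partition can only be recovered from the split files themselves, which is why the question above asks for the file mapping.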

1edv commented 2 years ago

Hi,

Thank you for your question.

The files with matching names can be accessed from the CodeOcean capsule we shared with our publication: https://codeocean.com/capsule/8020974/tree/v1. For instance, /data/Glu/_teX.h5 corresponds to the complex media training data, and /data/Glu/_vaX.h5 corresponds to the complex media validation data.

The full data (used to generate these splits) and the high-quality random test data can also be accessed in the data repository here:

Good luck!

aga-relation commented 2 years ago

Great, thank you! Could you please clarify which value in the Random_testdata_defined_media.csv corresponds to gene expression?

1edv commented 2 years ago

The meanEL column corresponds to expression.
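
As a rough illustration of the answer above, the sketch below pulls the `meanEL` column out of a CSV with Python's standard library. Only the column name `meanEL` comes from this thread; the helper `read_expression`, the `sequence` column, the comma delimiter, and the two sample rows are made-up placeholders, not real data from the repository.

```python
import csv
import io


def read_expression(csv_file, column="meanEL"):
    """Return the values of `column` as floats from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    return [float(row[column]) for row in reader]


# Placeholder rows standing in for the real file; the values are invented.
sample = io.StringIO("sequence,meanEL\nACGT,10.5\nTTGA,7.25\n")
values = read_expression(sample)
```

For the real file, the same function could be called as `read_expression(open("Random_testdata_defined_media.csv", newline=""))`, assuming the file is comma-delimited and has a header row.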