danielparton opened this issue 9 years ago
This is pretty awesome!
No problem regarding the $80 Dryad fee. Worth a try.
Some comments:
* The `README.md` title renders a bit oddly because there is so much text packed into it. Maybe just change the title to "Supplementary data" and give the citation info below?
* It would help if `README.md` had a step-by-step explanation of the contents of `commands.sh`. You have an explanation like this in the manuscript you can just lift and format.
* I hadn't realized GitHub renders `csv` files so nicely. We may not even need the `txt` versions of your `csv` files since they are already human-readable through GitHub, though I can't see any harm in leaving them in.
* `models-data.csv` and `topology.pdb`, e.g.: https://github.com/choderalab/ensembler-manuscripts/tree/master/dataset-for-publication/models/ABL1_HUMAN_D0
* We should also make sure @sonyahanson, @pgrinaway, and @kyleabeauchamp take a look!
* If the `ensembler/supporting-info/` directory is deprecated, you can `git rm` it (a minimal sketch follows below).
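If it helps, the removal would be something along these lines (just a sketch; the commit message is a placeholder):

```
# Remove the deprecated supporting-info directory from the repo
git rm -r ensembler/supporting-info/
git commit -m "Remove deprecated ensembler/supporting-info directory"
```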
"If the ensembler/supporting-info/ directory is deprecated, you can git rm it." Done.
For `commands.sh`, maybe we want to also include a little bit of code that creates a conda environment and uses the exact version of ensembler needed to generate the data in the paper? I think that would be something like this:

```
conda create -c https://conda.binstar.org/omnia -p ~/anaconda/envs/ensembler python=2.7 ensembler=0.2 --yes
```

where the ensembler release version would replace the `0.2`. You might also need `conda activate ensembler` after that.
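End to end, that might look like the following (a sketch, assuming the default `~/anaconda` install location; note that conda of that vintage used `source activate` rather than `conda activate`, and the release number is a placeholder):

```
# Create an environment pinned to the ensembler release used for the paper
conda create -c https://conda.binstar.org/omnia -p ~/anaconda/envs/ensembler python=2.7 ensembler=0.2 --yes
# Activate it (older conda releases use "source activate")
source activate ensembler
```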
"* I hadn't realized GitHub renders csv files so nicely. We may not even need the txt versions of your csv files since they are already human-readable through GitHub, though I can't see any harm in leaving them in.
The model XTC trajectories are about 60-80 MB for each target, totaling 6.1 GB. The max size for a GitHub repo is 1 GB, hence I did not add these to the repo.
So right now I'm thinking the main way to access the dataset would be to download a zip or tgz archive from Dryad. This is why I included the .txt table versions of the csv files in the dataset.
The model XTC trajectories are about 60-80 MB for each target, totaling 6.1 GB. The max size for a GitHub repo is 1 GB, hence I did not add these to the repo.
Got it. I hadn't realized that you had just omitted these---makes sense!
Note 1GB is the maximum recommended size. I think it just becomes crazy to work with after that. GitHub also doesn't like >50MB files---that might be the harder limit.
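For reference, a quick way to spot anything that would trip those limits before pushing (a sketch; the `models/` path is just illustrative):

```
# Total size of the model trajectories (expected ~6.1 GB across all targets)
du -sh models/
# List individual files over GitHub's 50 MB soft limit
find . -type f -size +50M -exec ls -lh {} \;
```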
> So right now I'm thinking the main way to access the dataset would be to download a zip or tgz archive from Dryad. This is why I included the `.txt` table versions of the `csv` files in the dataset.

Sounds good.
OK, I've added an explanation of `commands.sh` in the README.
Thanks! Let me make a few edits to the README.
Actually, I'm still trying to check out the repo. It seems to have exploded in size...
OK, I've made my edits in a PR: https://github.com/choderalab/ensembler-manuscripts/pull/39
I was mostly worried that the existing text, although good, was backwards: each command was listed first and its purpose stated afterwards. Instead, I moved the explanation to precede each command and added section subheadings for each step. Feel free to edit as appropriate!
Here's a preview of the edited README: https://github.com/jchodera/ms-ensembler/blob/update-dataset-README/dataset-for-publication/README.md
We may still want to add this line to the `README.md` and `commands.sh`:

```
conda create -c https://conda.binstar.org/omnia -p ~/anaconda/envs/ensembler python=2.7 ensembler=0.2 --yes
```

modified to match whatever release version of ensembler you cut to correspond with the paper.
It will also be good to post a link to the dataset and the bioRxiv manuscript on our choderalab data page when this is up on Dryad!
This is the dataset: https://github.com/choderalab/ensembler-manuscripts/tree/master/dataset-for-publication

The README should explain everything.

To make this available, the plan is to put the contents of that directory into a tar archive (6.2 GB) and upload it to the Dryad Digital Repository (http://datadryad.org/).
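For the archive step, something like this should do it (a sketch; the archive filename is just a placeholder):

```
# Bundle the dataset directory into a single gzipped tar archive (~6.2 GB) for Dryad
tar -czf ensembler-dataset.tar.gz dataset-for-publication/
# Spot-check the archive contents before uploading
tar -tzf ensembler-dataset.tar.gz | head
```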
Note that there will likely be an $80 one-off fee at the time of article acceptance; this is described in the Dryad FAQ (http://datadryad.org/pages/faq#deposit). Is this ok?
And are we happy with this dataset? I think it should cover everything needed, but let me know if you can think of anything that should be added or modified.