Cleaning `data` and `examples` directories

choderalab / assaytools

Modeling and Bayesian analysis of fluorescence and absorbance assays.

http://assaytools.readthedocs.org

GNU Lesser General Public License v2.1


Cleaning `data` and `examples` directories #66

Closed. sonyahanson closed this 7 years ago.

sonyahanson commented 7 years ago

Cleaning data and examples directories.

sonyahanson commented 7 years ago

Making changes as suggested in #55.

sonyahanson commented 7 years ago

Add simulated data and MLE notebooks to new probe-assay folder.

sonyahanson commented 7 years ago

Had a quick meeting just to catch up with Greg and Mehtap today, and we decided it makes sense to call the folders in examples:

direct-fluorescence-assay
competition-fluorescence-assay

We should try to keep this nomenclature consistent. Some things in the autoprotocol directory still use the term probe-assay, which we should remember to update.
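One way to track down the leftover uses of the old term is a small search script. This is only a sketch: the helper name `find_term` is hypothetical, and it assumes the files of interest are UTF-8 text (anything else is skipped).

```python
from pathlib import Path


def find_term(root, term="probe-assay"):
    """Return (path, line_number, text) hits for `term` under `root`.

    Matches in file or directory names are reported with line number 0.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if term in path.name:
            hits.append((str(path), 0, path.name))
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if term in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_term("autoprotocol")` from the repository root would then list every name and line that still needs renaming, assuming the directory is checked out at that relative path.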

sonyahanson commented 7 years ago

Now have a basic competition assay analysis of simulated data with MLE using the three-component-binding function: https://github.com/choderalab/assaytools/blob/data_clean/examples/competition-fluorescence-assay/MLE%20fit%20for%20three%20component%20binding%20-%20simulated%20data.ipynb
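For readers who have not opened the notebook, the idea of a three-component (competition) fit can be sketched in a few lines. This is a minimal illustration and not the assaytools implementation: the names `free_protein`, `complex_pl`, and `fit_kd_competitor` are hypothetical, the observable is taken to be the complex concentration [PL] itself rather than a fluorescence signal, and the noise is assumed Gaussian with fixed width, so the maximum-likelihood estimate reduces to least squares over a grid of candidate Kd values.

```python
import math


def free_protein(p_tot, l_tot, c_tot, kd_l, kd_c, tol=1e-12):
    """Solve the protein mass balance for free [P] by bisection.

    P + L_tot*P/(Kd_L + P) + C_tot*P/(Kd_C + P) = P_tot
    The left side increases monotonically with P, so the root in
    [0, P_tot] is unique.
    """
    def excess(p):
        return p + l_tot * p / (kd_l + p) + c_tot * p / (kd_c + p) - p_tot

    lo, hi = 0.0, p_tot
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol * p_tot:
            break
    return 0.5 * (lo + hi)


def complex_pl(p_tot, l_tot, c_tot, kd_l, kd_c):
    """Concentration of the protein:probe complex [PL]."""
    p = free_protein(p_tot, l_tot, c_tot, kd_l, kd_c)
    return l_tot * p / (kd_l + p)


def fit_kd_competitor(observed, p_tot, l_tot, c_tots, kd_l, grid):
    """Grid-search MLE of the competitor Kd (least squares under Gaussian noise)."""
    def sse(kd_c):
        return sum(
            (obs - complex_pl(p_tot, l_tot, c, kd_l, kd_c)) ** 2
            for obs, c in zip(observed, c_tots)
        )
    return min(grid, key=sse)


# Noise-free simulated titration of the competitor (all values are made up, in M).
c_tots = [10 ** (-9 + 0.5 * i) for i in range(11)]    # 1 nM to 100 uM
observed = [complex_pl(0.5e-6, 1e-6, c, 1e-7, 1e-7) for c in c_tots]
grid = [10 ** (-9 + 0.05 * i) for i in range(81)]     # candidate Kd values
kd_fit = fit_kd_competitor(observed, 0.5e-6, 1e-6, c_tots, 1e-7, grid)
```

Solving the mass balance numerically avoids the closed-form assumptions of a specific three-component formula, at the cost of a root-find per data point.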

jchodera commented 7 years ago

I am pretty sure the assumptions made in that three-component binding model are not going to work for us.

Can you just substitute that out with the general binding model?

If the API is too hard to use, I can expose a simpler three-component binding model for you to use as a drop-in replacement.

sonyahanson commented 7 years ago

Sure, this notebook was somewhat of an exercise, so yeah, I don't know that it's entirely suitable for actual data analysis. But Greg and I are planning to work through the general binding model in a similar way, and we can compare performance.

sonyahanson commented 7 years ago

Finished cleaning all the data and extraneous ipynbs!

There's definitely still some work to do cleaning up the ipynb's, but the main goal of this PR is just to clean up all the crap so the structure of the examples and data is roughly how we want it.

If anyone sees any specific data or ipynb's that they think should stick around that are not here, let me know. Otherwise this is good to merge.

sonyahanson commented 7 years ago

@jchodera @MehtapIsik any thoughts?

MehtapIsik commented 7 years ago

I am very happy with the direct-fluorescence-assay and competition-fluorescence-assay distinction. I think this organization of the example data is much better than before.

I have three small suggestions:

1. The names of the xml data files are confusing. Can we change them all to a "[protein name] [fluorescent probe] [non-fluorescent ligand (if it exists)] _ [experiment date].xml" format?
2. The ipython notebook classification is 1 for simulations, 2 for MLE fit, and 3 for the Bayesian fit example, right? 2a-competition-assay-modeling.ipynb does not fit this order. I think we should label it 1c instead.
3. For both the direct and competition binding directories, do you think it would be useful to add a "4 quick model with simple bayesian analysis.md" file with instructions on how to use quickmodel.py in each case, with an example?
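The naming scheme in the first suggestion could be generated by a small helper, sketched below. The function name `data_filename` is hypothetical, and two details are assumptions that the suggestion leaves open: underscores as field separators and an ISO date (YYYY-MM-DD). The protein and compound names in the examples are placeholders only.

```python
from datetime import date


def data_filename(protein, probe, experiment_date, ligand=None):
    """Build '<protein>_<probe>[_<ligand>]_<date>.xml'.

    The ligand field is omitted for direct (probe-only) experiments.
    Whitespace inside a field is replaced with hyphens.
    """
    parts = [protein, probe]
    if ligand:
        parts.append(ligand)
    cleaned = ["-".join(part.split()) for part in parts]
    return "_".join(cleaned) + "_" + experiment_date.isoformat() + ".xml"


# Direct assay: no non-fluorescent ligand.
print(data_filename("Src", "bosutinib", date(2016, 3, 11)))
# -> Src_bosutinib_2016-03-11.xml

# Competition assay: ligand included.
print(data_filename("Src", "bosutinib", date(2016, 3, 11), ligand="imatinib"))
# -> Src_bosutinib_imatinib_2016-03-11.xml
```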

jchodera commented 7 years ago

Looking good!

My preference is, instead of having a single top-level README.md like this one, to have a README.md in each directory that explains just the current contents of that directory. That makes the GitHub repo easier to navigate, I think!

sonyahanson commented 7 years ago

Thanks Mehtap, these are great suggestions!

I've made the change suggested in 2, and made 1 and 3 issues for later. I had meant to include instructions for using quickmodel in this PR, but didn't end up having time. I like your idea of connecting it to an ipynb as a 4th segment, though perhaps it should just replace the 3rd notebook...

sonyahanson commented 7 years ago

Merging.