SheffieldML / notebook

Collection of jupyter notebooks for demonstrating software.
BSD 3-Clause "New" or "Revised" License

Accessing "staffwww.dcs.sheffield.ac.uk/people/J.Hensman" data #8

Open finmod opened 8 years ago

finmod commented 8 years ago

There is a common problem accessing the compbio and other datasets: drosophila, spellman yeast, Lab3.zip and others. This is in addition to migrating matplotlib and pods to Python 3. Shouldn't these datasets be integrated into pods, to provide a homogeneous set of test notebooks (gprs, gpss) and a "datasets" folder?

The error is:

C:\Users\Denis\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
    587 class HTTPDefaultErrorHandler(BaseHandler):
    588     def http_error_default(self, req, fp, code, msg, hdrs):
--> 589         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    590
    591 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden
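A minimal sketch of how a notebook cell could guard against this kind of failure: use a local copy of the file when one exists, and turn a refused download into a message that says where to put the file. The helper name `fetch_or_local` and its `opener` parameter are hypothetical and not part of pods or GPy; the sketch assumes Python 3, where `urlretrieve` lives in `urllib.request`.

```python
import os
import urllib.request
from urllib.error import HTTPError, URLError


def fetch_or_local(url, local_path, opener=urllib.request.urlretrieve):
    """Return a path to the data, downloading only if no local copy exists.

    `opener` is a hypothetical hook (defaulting to urlretrieve) so the
    download step can be replaced, for example in a test.
    """
    if os.path.exists(local_path):
        # A local copy is already there, so the server is never contacted.
        return local_path
    try:
        opener(url, local_path)
    except (HTTPError, URLError) as err:
        # A 403 Forbidden ends up here; report where the file is expected.
        raise FileNotFoundError(
            "Could not download {} ({}); place the file at {} manually."
            .format(url, err, local_path)
        ) from err
    return local_path
```

With the data files already extracted locally, a call such as `fetch_or_local(url, 'kalinka09_mel.csv')` returns immediately without touching the network.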

mzwiessele commented 8 years ago

@jameshensman do you still have the files?

lawrennd commented 8 years ago

I tried to move most of these types of things across as I found them. Certainly spellman is in pods, but I'm not sure about drosophila.

It's a good example of why we developed pods!

If we can recover the datasets let's try and get them integrated.


magnusrattray commented 8 years ago

Is there any news on the drosophila data?

finmod commented 8 years ago

No. I established that using pods is better than using GPy.utils to access the dataset files (this is with GPy-devel). All in all, I managed to assemble a complete "datasets" folder from various sources and packages in SheffieldML. With it, I was able to build the drosophila.knirps file required by Hierarchical.ipynb and eliminate direct access to Lab3 in that notebook.

lawrennd commented 8 years ago

That's great. Yes, pods is the right place to do this.

Did you do a pull request for an updated version of the notebook?


jameshensman commented 8 years ago

Here's the drosophila data if someone wants to add it: dros.zip (https://github.com/SheffieldML/notebook/files/250051/dros.zip)

finmod commented 8 years ago

Thank you James for this data file. With the kalinka09_mel.csv and kalinka09_mel_pdata.csv files extracted into the compbio folder, Hierarchical.ipynb is now running fine.

Note that kalinka09_mel is a lighter version than the one I downloaded from the original source using pods.

To recap the fix:

1) Extract the two kalinka09 files to the compbio folder;

2) Comment out the urllib lines in Hierarchical.ipynb as follows:

# import urllib
# urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/J.Hensman/data/kalinka09_mel.csv', 'kalinka_data.csv')
# urllib.urlretrieve('http://staffwww.dcs.sheffield.ac.uk/people/J.Hensman/data/kalinka09_mel_pdata.csv', 'kalinka_pdata.csv')
expression = np.loadtxt('kalinka09_mel.csv', delimiter=',', usecols=range(1, 57))
gene_names = np.loadtxt('kalinka09_mel.csv', delimiter=',', usecols=[0], dtype=str)
replicates, times = np.loadtxt('kalinka09_mel_pdata.csv', delimiter=',').T

# normalize data row-wise
expression -= expression.mean(1)[:, np.newaxis]
expression /= expression.std(1)[:, np.newaxis]
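The row-wise normalization in the last two lines can be checked on a small synthetic array. This is only a sketch: the function name `normalize_rows` is hypothetical and the toy values are made up for illustration, not taken from the kalinka09 data.

```python
import numpy as np


def normalize_rows(expression):
    """Return a copy where each row has zero mean and unit standard deviation."""
    expression = np.asarray(expression, dtype=float)
    # keepdims=True keeps a trailing axis of length 1, which broadcasts
    # across the columns just like [:, np.newaxis] does in the notebook.
    centered = expression - expression.mean(axis=1, keepdims=True)
    return centered / expression.std(axis=1, keepdims=True)


# Toy stand-in for the genes x samples expression matrix.
toy = np.array([[1.0, 2.0, 3.0],
                [10.0, 20.0, 60.0]])
normalized = normalize_rows(toy)
```

Unlike the in-place `-=` and `/=` in the notebook, this version leaves its input untouched, which makes it easier to rerun a cell without normalizing twice.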

Running the complete compbio folder (8 out of 8 notebooks) requires a similar data file for TFA_with_Coregion-1.ipynb, which loads Y=np.load("/users/suraalrashid/expression.npy").

I could not locate the suraalrashid data anywhere.
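A quick way to see which data files are still missing before running the whole folder is sketched below. The helper `missing_data_files` is hypothetical, and it assumes every required file is expected to sit directly in the folder passed to it; the file names in the usage note are the ones mentioned in this thread.

```python
import os


def missing_data_files(folder, required):
    """Return the names in `required` that are not present as files in `folder`."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(folder, name))
    ]
```

For example, `missing_data_files('compbio', ['kalinka09_mel.csv', 'kalinka09_mel_pdata.csv', 'expression.npy'])` lists whichever of the three files has not yet been placed in the folder.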


finmod commented 8 years ago

Hello James,

As a logical step after running hierarchical.ipynb, in deepGPy (configuration: Linux (Ubuntu) on VM VirtualBox, python 2.7 and Anaconda 2.5), two questions arise about plotting:

It would be nice if you could make available the code for these two plots because they convey a telling message for otherwise complex processes.

Thank you.
