OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

If we add more sheets to the Google Doc, does that screw up automatic dataset visualization? #312

Closed mattodd closed 9 years ago

mattodd commented 9 years ago

The Google Doc currently focusses on Series 4. I'd like to add three more sheets (as separate sheets) for Series 1 through 3. And then obviously add more sheets for future series.

Would this mess up any of the existing visualization/analysis tools, such as the one @drc007 just posted?

Is it possible in future to have visualization tools look at (i.e. can we link to)

i) individual sheets? ii) all sheets (i.e. all OSM compounds) together?

I'm asking with an eye on suitability of Google Docs long-term as the ultimate source of OSM compound data.

Related to #285

drc007 commented 9 years ago

I'd be tempted to just have them all in a single sheet, with a column for the series number. The tools that we link to it will almost certainly have the option to hide series other than the one you want to focus on.

wvanhoorn commented 9 years ago

I have a preference for single sheet as well.

Re suitability of gsheets for data storage: databases are the standard for storing screening data, why is one not used here? It would make my life easier since the tool I use cannot read directly from a gsheet so I have to cut and paste to update models, etc.

And one more for the wishlist: I know there is some hERG data but I can't find it? Could this be added to the gsheet (as well as any other data that might be there)?

drc007 commented 9 years ago

I've just written a Python script to access the data: http://www.macinchem.org/reviews/vortex/tut26/scripting_vortex26.php. You may be able to modify it.

mattodd commented 9 years ago

We need to use a repository of data that is editable by multiple people and is free. Google Docs works well from this perspective and can, I understand, be exported in a number of different ways. Is there something you'd rather use?

hERG data is e.g. here: http://malaria.ourexperiment.org/biological_data/11081

Needs the Data Angel to come along and add to the sheet...

lpatiny commented 9 years ago

I would indeed prefer to have only one sheet. We can always filter the data later on any value in a column. We can also add a "select" drop-down menu. Just give me the categories and the corresponding labels.

We also provide a search box on the top.

You can put a number in "Series" to make a query "by text", so a compound could be part of many series.

If there are only numbers, you may enter ">1".

You may also use a regular expression, for example /[A-Z].*/.

You may also search for a range of values, for example "3..5".
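The query forms listed above could be handled roughly as in the following sketch. It is not the code behind the cheminfo search box. The function name `matches`, the inclusive treatment of ranges, the extra operators (`<`, `>=`, `<=`) and the case-insensitive substring fallback for plain text are all assumptions made for illustration.

```python
import operator
import re

# Comparison operators accepted by this sketch (only ">" appears in the thread).
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}
NUMBER = r"-?\d+(?:\.\d+)?"


def _to_float(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def matches(query, value):
    """Return True if a cell value satisfies a search query (hypothetical semantics)."""
    query = query.strip()
    value = str(value).strip()

    # Regular expression written between slashes, e.g. /[A-Z].*/
    if len(query) >= 2 and query.startswith("/") and query.endswith("/"):
        return re.search(query[1:-1], value) is not None

    # Numeric range, e.g. 3..5 (assumed inclusive at both ends)
    range_match = re.fullmatch(rf"({NUMBER})\.\.({NUMBER})", query)
    if range_match:
        number = _to_float(value)
        low, high = float(range_match.group(1)), float(range_match.group(2))
        return number is not None and low <= number <= high

    # Numeric comparison, e.g. >1
    comparison = re.fullmatch(rf"(>=|<=|>|<)\s*({NUMBER})", query)
    if comparison:
        number = _to_float(value)
        bound = float(comparison.group(2))
        return number is not None and OPS[comparison.group(1)](number, bound)

    # Anything else: case-insensitive substring match (an assumption)
    return query.lower() in value.lower()
```

For example, `matches(">1", "4")` and `matches("3..5", "4")` are both true under these assumptions, while a non-numeric cell never satisfies a numeric query.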

lpatiny commented 9 years ago

@wvanhoorn If you can curl "http://googledocs.cheminfo.org/spreadsheets/d/1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc/export?format=tsv" you will get the data in a tab-delimited format. It should be usable from any scripting language, I think.

wvanhoorn commented 9 years ago

Thanks @drc007! I expected something complicated (I shouldn't have looked at the Google API documentation) while a simple solution is this link from your python script. I am sometimes good at not seeing a solution that is right in front of me... http://docs.google.com/spreadsheets/d/1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc/export?format=tsv

wvanhoorn commented 9 years ago

@lpatiny And I am also good at not looking if there are newer posts while typing a reply...

lpatiny commented 9 years ago

As you may see, I go through a proxy, "http://googledocs.cheminfo.org". The only reason is to allow cross-origin requests, so you should use docs.google.com directly if you access the data from Python, for example.
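A minimal sketch of reading the export link directly from Python, using only the standard library. The sheet ID is the one quoted in this thread. The function names are hypothetical, `https` is assumed in place of the `http` links above, and the sheet is assumed to be publicly readable.

```python
import csv
import io
import urllib.request

# Sheet ID copied from the export link quoted in this thread.
SHEET_ID = "1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc"


def export_url(sheet_id, fmt="tsv"):
    """Build the direct docs.google.com export URL (no proxy) for a spreadsheet."""
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format={fmt}"


def parse_tsv(text):
    """Turn tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_rows(sheet_id=SHEET_ID):
    """Download and parse the sheet; needs network access and a public sheet."""
    with urllib.request.urlopen(export_url(sheet_id)) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Calling `fetch_rows()` returns one dictionary per spreadsheet row, so no cutting and pasting is needed when updating models. `parse_tsv` is kept separate so the parsing step can be checked without a network connection.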

mattodd commented 9 years ago

Going to keep the data in a single sheet, with a column for Series Number. Closing.
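With everything in one sheet, pulling out a single series is a filter on that column. A minimal sketch, assuming rows are dictionaries keyed by column header and that the column is called "Series". The actual header in the sheet may differ, and the compound IDs below are made up for illustration.

```python
from collections import defaultdict


def rows_in_series(rows, series, column="Series"):
    """Keep only the rows whose series column equals the requested series."""
    wanted = str(series)
    return [row for row in rows if (row.get(column) or "").strip() == wanted]


def group_by_series(rows, column="Series"):
    """Split all rows into one list per series value."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row.get(column) or "").strip()].append(row)
    return dict(groups)


# Made-up rows standing in for the real sheet contents.
example_rows = [
    {"ID": "EXAMPLE-1", "Series": "4"},
    {"ID": "EXAMPLE-2", "Series": "1"},
    {"ID": "EXAMPLE-3", "Series": "4"},
]
series_4 = rows_in_series(example_rows, 4)
by_series = group_by_series(example_rows)
```

Here `series_4` holds the two Series 4 rows, and `by_series` answers the "all sheets together" question by keeping every compound available under its series key.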