daynefiler / tcpl

Former repository for the tcpl R package project. New repo:
https://github.com/USEPA/CompTox-ToxCast-tcpl
4 stars 5 forks source link

Getting started #18

Closed ldecicco-USGS closed 7 years ago

ldecicco-USGS commented 7 years ago

First off.....my goal is to use this package to get dose response curves for toxCast data. For example, I'd like to re-create this plot in R: https://actor.epa.gov/dashboard/#selected/91-20-3+APR_HepG2_CellLoss_72h_dn+6701677

I think that seems like a direct product of this package (scanning the vignette, maybe tcplPlotM4ID ?), but if not...please let me know.

In a nutshell....I'd like to see a "Quick Guide to setting up MySQL database" or something like that.

I'm struggling to set up the MySQL database and get it hooked up with the package.

I'll try to log what I've done, and maybe something will stand out as what I should be doing:

  1. Install tcpl

  2. Download the data from the "Download MySQL Database" link from here: https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data (actual download link: https://drive.google.com/drive/folders/0B0FnkxPTkDTIVVJOUGdmNzZtSTQ). This gets me a file "INVITRODB_V2_MYSQL.zip"

  3. I moved the file "INVITRODB_V2_MYSQL.zip" to a folder I called "toxCast"

  4. Unzipped "INVITRODB_V2_MYSQL.zip"

  5. Unzipped "prod_external_invitrodb_v2.all.zip"

  6. I'm using the very brand new RStudio, which has a "Connection" tab...I tried playing with that using the 32-bit version of R (is that needed? it is for Oracle databases....I'm not sure it's the same with MySQL...or even if that's very old info) I did not see an option for MySQL, so I scrapped that (http://db.rstudio.com/rstudio/connections)

  7. I navigated to "C:\Users\ldecicco\Documents\R\win-library\3.4\tcpl\TCPL.config" hoping to just change the TCPL_DRVR to "MySQL", and the TCPL_DB to the path to "prod_external_invitrodb_v2.all.sql"...but that didn't work

  8. Then I tried to follow ?tcplConf by doing:

    tcplConf(drvr = "MySQL", 
         user = "root",
         pass = "",
         host = "localhost",
         db = "D:/LADData/toxCast/INVITRODB_V2_MYSQL/prod_external_invitrodb_v2.all.sql")
    Error in .local(drv, ...) : 
    Failed to connect to database: Error: Can't connect to MySQL server on 'localhost' (0)

    So, my suspicion is I need something like MySQL workbench to be running?

  9. I downloaded and installed MySQL Workbench

  10. Clicked on "Local instance MySQL56", in the Navigator, ran the "Startup" under "Instance" to get the server to start

  11. Tried to follow: https://dev.mysql.com/doc/workbench/en/wb-getting-started-tutorial-create-connection.html This is inevitably where I get stuck....and I'm not even sure I'm on the right path.

So, am I on the right path? Can you steer me in the right direction to get it connected?

daynefiler commented 7 years ago

Hello @ldecicco-USGS . I appreciate your interest in tcpl. You are correct: what you need is a greater understanding of how to setup and utilize MySQL databases. Briefly, all you have done is downloaded a file that contains the information for building a database. You must also install MySQL, and manage the MySQL server that will hold the database. You will have to create an empty database, then load the file you unzipped into the empty database, then access the database through your MySQL server (which is what the tcplConf functions help you do in R). As an aside, it is easier to modify the config file using the tcplConfSave function. See ?tcplConf for more information.

Working with MySQL databases is pretty standard in terms of bioinformatics, and there are many great guides online that should be able to help you get started.

ldecicco-USGS commented 7 years ago

I don't suppose you have a favorite guide? I've tried using this package on and off for over a year now and always just get stuck here and just go back to using the actor dashboard.

martino165 commented 7 years ago

https://www.youtube.com/watch?v=OcoIqgnHG54

this video supplies a decent review of how to import (last 1/3rd of video) in mysql workbench. Essentially you have the .sql dump file and just need to create an empty database ("schema"). We generally call the database invitrodb or toxcastdb but it really does not matter and then you select the .sql file and start the import. This will take a good while as it is fairly large.

Once you are done that you should be able to set your db = "toxcastdb" or "invitrodb" within tcplConf (no need for the file location).

Hope this is helpful. Matt

On Mon, Oct 9, 2017 at 4:05 PM, Laura DeCicco notifications@github.com wrote:

I don't suppose you have a favorite guide? I've tried using this package on and off for over a year now and always just get stuck here and just go back to using the actor dashboard.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/daynefiler/tcpl/issues/18#issuecomment-335271072, or mute the thread https://github.com/notifications/unsubscribe-auth/AT2vwiW5aq4C1p-RHq7MXGV3oclvE0ldks5sqnx9gaJpZM4Py9Zq .

ldecicco-USGS commented 7 years ago

Ah HA! Thank you! I had tried using the restore data option, but didn't think about just creating an empty database first. It seems to be running now (like you said....it's slow). Mentioning that "you have the .sql dump file and just need to create an empty database ("schema")" was definitely enough to get a much much better search result on Google.