Closed tracykteal closed 9 years ago
because people can't save commands
Commands can be saved using Views, which is the standard way to save queries in SQL. Someone typically asks about this in the first hour and I show them, but if it isn't in the notes it should be.
you can't zoom, when you teach the font is tiny and hard to read on the screen.
The way to do fix this is to open the SQLite Manager, click on the Settings button, and chose Start SQLite Manager in a new tab
. You can then use Ctrl-+
to zoom just like any other web page.
These are both tricks of the trade that should have been documented. Sorry.
An alternative when we're teaching R already is to instead use the RSQLite package and teach SQL through R. This solves a lot of those problems and also works well with a learners workflow. @lwasser already has a lesson that includes it https://github.com/datacarpentry/R-ecology/blob/gh-pages/05-r-and-sql.Rmd and when she taught it, it went well. @karthik has also taught this way and suggested this approach in conversation.
The problem that I see with this is that it really requires teaching R first (my understanding of 05-r-and-sql.Rmd is that it's supposed to be taught after both R and SQL have already been taught). Otherwise there is a lot more cognitive load in learning the SQL. So, unless we're planning on restructuring the order of material this seems like a pretty difficult route to go for most students. We would also need to maintain separate sets of material for R and Python at that point.
Personally I think there's still too much extra cognitive load in teaching SQL in R even when the students know R already. But, I do think that tying things together at the end by showing how to combine the tools they have learned over the two days is really powerful. I did a capstone live coding session at the end of my last SWC workshop using our main ecology dataset that did this (in Python, not R) and it went over really well. We could certainly incorporate something like that for the last hour of each workshop.
Of course I'm happy to be convinced otherwise, it just doesn't seem like it would work well to be at a first glance.
Personally I think there's still too much extra cognitive load in teaching SQL in R even when the students know R already.
I'd have to agree. Also, with SQL being s good data management practice we teach, I'd find it unfortunate if by combining it with R from the beginning, we end conveying the notion that you can use SQL and databases only from within a programming language.
just a note about this. i have taught SQL a few times now and i prefer to BEGIN to teach DB basics using a DB tool I understand why we use SQLite but i don't think it's the best applied tool to teach ecologists. But it is simple and good to teach principles. And generally i do show views. Views are a nice precusor to stored procedures and other things you might tap into in a true DB environment. When we teach SQLite the biggest question is always "how do we actually use this in practice". If DB is new to folks - it isn't clear just using SQL lite.
That is where R or Python come in. by demonstrating how quickly you can access the DB using a few lines of code, the full circle is connected - how data management connects to actual data analysis / munging, etc.
In python it's particularly efficient given the connection w pandas.
If folks are teaching SQL, i'd highly recommend starting with sql and DB principles- IN a DB tool. Demonstrate tables, talk about data formats, etc. All very important stuff. People will still wonder how it's useful but they'll have the base concepts down. Then when you followup with the "how to connect through R" - lightbulbs will click and teh connection betwen the DB envt and a tool like R becomes more clear. Atleast that's been my experience. :) l
I agree that it is really nice to teach SQL in the Firefox SQLite Manager. But I do think we're missing that step for how people use it in their real work flow. Should they import their CSV files in to SQLite and do their database work there? Should they install Postgres or MySQL and do things at the command line? The 'what to do next' might be very intuitive for people who use SQL a lot, but for instructors who only really use SQL lightly, like I do, I don't find myself being very good at answering those questions.
I like @lwasser idea of after getting through the SQL lesson, and through R, then we can bring them all together. going in to R and seeing how they can use SQL there. This could potentially even be the capstone, so they don't do it themselves, but the instructor goes through how to bring everything together. 1. creating an SQL database from some CSV files 2. importing that database in to R 3. doing some SQL queries in R to create a data frame 4. use that data frame for a minimal analysis and plotting 5. knit that whole R file. And then makes that script available to the learners so they can refer back to it for their work.
Also, I have to say, I just taught this lesson for SWC today, and overall I really love it! Thanks all the contributors!
I take all your points. However, I would still stay with teaching SQL with the Firefox plugin. My experience from the workshops for audiences who have no or very little programming experience was that they welcomed this module with relief. R is an extremely steep learning curve for them. They interface of the plugin is friendly and looks like Excel which makes them feel comfortable yet they see the power of this. Also, I would not undermine their skills in terms of "How do I apply it to my stuff". We had feedback from the workshop in Utrecht where 2 of the participants almost immediately moved from analysing and reporting in Excel into SQLite (they had a macro which didn't work and turns out they could do stuff in SQLite).
I suggest that we have R&SQL lesson and if time permits, instructors bring it in. Also, the tips which @ethanwhite mentions should be documented for the instructors to ease the way of teaching.
These are all great points. postgres is the tool i'd recommend people really using in a workflow as it's much more robust. BUT sqlite is simple as pointed out. i can't imagine teaching postgres in an hour or two. it would be a few day endeavor or atleast a solid day. And the setup is also more complicated.
To make the connection in terms of workflow - what if we gave folks the code to connect to a DB and showed them how it works first - maybe very early on... then talk about how we will make sense of the connection and how it works, when we teach sql? just a thought. I agree @tracykteal, the workflow piece is definitely the piece that could be strengthened.
@lwasser Postgres is certainly more robust in terms of enforcing data-types (as almost every other engine is) but that can prove a hindrance when people deal with "messy" or just badly curated data. For example, CSV files (produced by certain spreadsheets) where strings appear as numbers (or vice versa). SQLite with its "type values not columns" philosophy deals quite easily with this, allowing the user to treat the data as they want to treat it in the query. Combined with the way SQLite structures its physical storage (once closed, it's a single file making traditional file-based sharing easy), makes it to my mind, the ideal beginners database.
Obviously there's a downsides to this: file-sharing is okay when everybody else is just a reader, but as soon as you're into multiple manipulators of the data it's absolutely dreadful. It's also the only database I'm aware of with such a laissez-faire attitude toward types which probably leads to a certain amount of confusion/pain when someone transitions to a more traditional database for the first time.
Personally, I lament that Firebird (née Interbase) isn't more popular. Like SQLite it uses a single file for physical storage (although it's very different under the covers), but like Postgres (and everybody else) it's strict about typing. It's also ridiculously fast and has an embedded (single-user) version, and a server version. Sadly, it's documentation is also sub-par, and (probably due to its lack of popularity) the tooling is also distinctly lacking, so I can't really recommend it as an alternative.
If we want to introduce people to alternative engines, I would recommend a first step (assuming the workshops have adequate internet access) would be to introduce people to sqlfiddle.com. It's an extremely useful site frequently used on the database administrator's stack exchange for building minimal example cases. It's free and allows users to experiment with schemas and queries in a variety of engines; at the time of writing:
I did contact the author offering to help add DB2 to that list (back when I was working for IBM) but sadly that never came to fruition. I might ask if he's willing to add Firebird though! Still, it's an excellent resource if you want to build something in SQLite, then switch the engine to PostgreSQL and see what breaks :)
With @ethanwhite tips on how to better use it as a teaching tool, SQLite does seem to be best for install/teaching. I think the missing lesson is 1) emphasizing that this can be used for their own work (just import in their own csv files) 2) what the potential limitations might be 3) what other database options might be and some of their strengths/weaknesses. I get asked these things almost every workshop, and by now (after a series of Googles) I know the answers, but we can't expect every instructor to have the answers handy,
I propose closing this issue and migrating over to an issue on adding a 'Using SQL with your own data' module that would include this type of information. https://github.com/datacarpentry/sql-ecology/issues/45
I propose closing this issue and migrating over to an issue on adding a 'Using SQL with your own data' module that would include this type of information. #45
Agreed. Thanks for starting a really important discussion @tracykteal.
Closing. For further discussion see #36 and #45.
We currently have people install the Firefox plugin to use SQL. It's nice because it tends to work under all OSes, but there are some challenges because people can't save commands, they don't quite understand how they'll use it after a workshop, and because you can't zoom, when you teach the font is tiny and hard to read on the screen.
An alternative when we're teaching R already is to instead use the RSQLite package and teach SQL through R. This solves a lot of those problems and also works well with a learners workflow. @lwasser already has a lesson that includes it https://github.com/datacarpentry/R-ecology/blob/gh-pages/05-r-and-sql.Rmd and when she taught it, it went well. @karthik has also taught this way and suggested this approach in conversation.