Database: download a database

Mclilzee commented 2 months ago

Checks

[X] This is not a duplicate of an existing issue (please have a look through our open issues list to make sure)
[X] I have thoroughly read and understand The Odin Project Contributing Guide
[X] Would you like to work on this issue?

Describe your suggestion

Let me first tell you a little about my experience the first time I learned about Databases.

I was doing CS50 and the Databases got introduced, in part of the introduction we were instructed to download a SQL file of IMDb which included all the shows, movies, actors tables. After doing so I was able to query every movie that is known inside of IMDb offline on my computer, and it felt so powerful. This made me really appreciate databases, how fast they are and how efficient they are at storing data.

After going through our Database lessons, I felt this kind of magic is missing. We only talk in words about databases and how to work with them. I felt that we should bring this kind of magic and appreciation of Database to make people see how powerful they really are by introducing the concept I talked about above. Another key feature is missing, is the indexing which had a very brief introduction, I'm not sure if it gets brought up again I'm only 30% of Node now.

Getting to see a database going from 200ms search time to 0.1ms by indexing it also will bring more light into why indexing is important for the fields you want to search. Another thing that is not mentioned in previous lessons is that indexing takes resources of memory, so it is not entirely free to do.

I looked at Kaggle.com and the only meaningful decently sized file (1 GB) in SQL format was an NBA matches and players. IMDb is available for free in TSV (Tab separated values) format, I can write a program to parse it into SQL and then instruct people to download that file, otherwise we would need to find a file that is as big. Then we can add an assignment to download it and mess around with it, or have a new section instructing people to install sqlite and let them work with this database.

It could also be a bonus assignment if people not interested to download any files and exploring databases on their own. I would like to know your thoughts on that.

Path

Ruby / Rails, Node / JS

Lesson Url

https://www.theodinproject.com/paths/full-stack-javascript/courses/databases

(Optional) Discord Name

mclilzee

(Optional) Additional Comments

No response

Mclilzee commented 2 months ago

Updae to this, I don't think we are able to use the IMDb data directly by pre-parsing it for the users and uploading it. As altering the data for reasons other than personal use is against the Non-Commersial license. If we are to go with this implementation, then we need to find another data set or instruct the users to parse the data on their environment for personal use.

I have already built a parser that you can try it out by downloading the pre-compiled executable, MacOS still missing however. https://github.com/Mclilzee/imdb-to-sqlite

Asartea commented 2 months ago

There are some fun databases listed on https://wiki.postgresql.org/wiki/Sample_Databases which (mostly) seem to have compatible license terms. Alternatively, we could always pull something from a government with CC-0 licensing (such as the US government's data.gov, which lists various datasets about the U.S. At least the federal datasets included in that should be (mostly) public domain)

Mclilzee commented 2 months ago

Great suggestions! We can surely find something to work with if the idea as a whole was accepted. I haven't explored how easy it is to import data in Postgres, SQLite is just a file you download. We can look into that too, maybe as part of Postgres installation lesson

CouchofTomato commented 2 months ago

I definitely like this idea. Would enhance the course.

Asartea commented 2 months ago

What do we think of https://github.com/lerocha/chinook-database? Its reasonable complex but not too complex, freely available, and provides the entire thing as .sql files

CouchofTomato commented 2 months ago

It looks good to me.

Mclilzee commented 2 months ago

It wasn't clear for me a what the chinook database is, There was a part where it said that it is real data gathered from iTunes and the other talk about auto generating names addresses etc. If it's real data, then this would fit perfectly in what I have in mind. If it is just auto generated data then not so much, I haven't explored it yet, will do later definitely.

So the idea is, a real database of real data that is popular if the above is real data from musics then that would fit perfectly because people will be able to look up their fav artist, album names, writer and so on. That is what I'm intending, a real database that people can query for real information that sparks the "wow I can find the info I want offline on my pc" moment for them. It also needs to be decently large, so people don't keep hitting dead ends for the stuff they are interested looking for.

Asartea commented 2 months ago

It wasn't clear for me a what the chinook database is, There was a part where it said that it is real data gathered from iTunes and the other talk about auto generating names addresses etc. If it's real data, then this would fit perfectly in what I have in mind. If it is just auto generated data then not so much, I haven't explored it yet, will do later definitely.

I believe its a database for a fictional webshop selling music. All of the music parts (tracks, albums, artist, genres) are real and taken from iTunes, but the shop part (customers, orders, etc) is autogenerated.

Asartea commented 2 months ago

Just threw this together using the MySQL version (I happen to have MySQL workbench installed with a copy of it)

SELECT Track.Name as Track, Album.Title as Album, Artist.Name as Artist, MediaType.Name as Type, Genre.Name as Genre FROM Track
JOIN Album on Track.AlbumId = Album.AlbumId
JOIN MediaType on Track.MediaTypeId = MediaType.MediaTypeId
JOIN Genre on Track.GenreId = Genre.GenreId
JOIN Artist on Album.ArtistId = Artist.ArtistId
LIMIT 20

Output (outputted to a .csv file):

Track,Album,Artist,Type,Genre
"For Those About To Rock (We Salute You)","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Put The Finger On You","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Let's Get It Up","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Inject The Venom","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
Snowballed,"For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Evil Walks","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
C.O.D.,"For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Breaking The Rules","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Night Of The Long Knives","For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
Spellbound,"For Those About To Rock We Salute You",AC/DC,"MPEG audio file",Rock
"Balls to the Wall","Balls to the Wall",Accept,"Protected AAC audio file",Rock
"Fast As a Shark","Restless and Wild",Accept,"Protected AAC audio file",Rock
"Restless and Wild","Restless and Wild",Accept,"Protected AAC audio file",Rock
"Princess of the Dawn","Restless and Wild",Accept,"Protected AAC audio file",Rock
"Go Down","Let There Be Rock",AC/DC,"MPEG audio file",Rock
"Dog Eat Dog","Let There Be Rock",AC/DC,"MPEG audio file",Rock
"Let There Be Rock","Let There Be Rock",AC/DC,"MPEG audio file",Rock
"Bad Boy Boogie","Let There Be Rock",AC/DC,"MPEG audio file",Rock
"Problem Child","Let There Be Rock",AC/DC,"MPEG audio file",Rock
Overdose,"Let There Be Rock",AC/DC,"MPEG audio file",Rock

Mclilzee commented 2 months ago

Ok this is perfect then! Everyone will surely enjoy having their offline little music lookup database where they can query as a lesson for practice. If it is easy to import into Postgres, maybe we can include it in one of the setting up Postgres lessons and move that lesson to earlier in Node.

Asartea commented 2 months ago

it is easy to import into Postgres,

Its really easy. I just followed these steps:

Download the version you want from the release page (I used the one with autoincrementing primary keys)
Find where you downloaded the file
Run psql -U <username> -f <path_to_file>/<file_name>.sql
Wait until its done
Login to psql
Run \c <database_name> (shown earlier in the output) to use the new database
You can now just query it

(Admittedly: it took me quite a bit because I am stuck in my MySQL ways, so I first spend time debugging why I couldn't use the USE <dbname> command (postgres doesn't have it), and then more debugging the fact that postgres uses snake_case instead of camelCase, but after that it worked like a charm)

github-actions[bot] commented 1 month ago

This issue is stale because it has had no activity for the last 30 days.

TheOdinProject / curriculum