Week 2 - Linux & SQL - Githubissues

Here's my notes if anyone wants them

SQL:

What is SQL? stands for Structured Query Language. SQL is used to communicate with a database.

What are its sublanguages? DDL – Data Definition Language. DML – Data Manipulation Language. DRL/DQL – Data Retrieval Language/Data Query Language. TCL – Transaction Query Language. DCL – Data Control Language. SCL – Session Control Language.

What is multiplicity? he Multiplicity attribute of a relationship specifies the cardinality or number of instances of an EntityType that can be associated with the instances of another EntityType.

One-to-many Zero-or-one to one Zero-or-one to many

What is cardinality? the term cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row. SQL databases use cardinality to help determine the optimal query plan for a given query.

What is a candidate key? A candidate key is a set of attributes (or attribute) which uniquely identify the tuples in relation or table. As we know that Primary key is a minimal super key, so there is one and only one primary key in any relationship but there is more than one candidate key can take place. Candidate key’s attributes can contain a NULL value which opposes to the primary key.

PRIMARY KEY CANDIDATE KEY

Primary key is a minimal super key. So there is one and only one primary key in a relation. While in a relation there can be more than one candidate key.
Any attribute of Primary key can not contain NULL value. While in Candidate key any attribute can contain NULL value.
Primary key can be optional to specify any relation. But without candidate key there can’t be specified any relation.
Primary key specifies the important attribute for the relation. Candidate specifies the key which can qualify for primary key.
Its confirmed that a primary key is a candidate key. But Its confirmed that a candidate key can be a primary key.

What is referential integrity? Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute (either in the same or a different relation), then the referenced value must exist.

What are the different constraints of a column? SQL constraints are used to specify rules for the data in a table. Constraints are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the table. If there is any violation between the constraint and the data action, the action is aborted. Constraints can be column level or table level. Column level constraints apply to a column, and table level constraints apply to the whole table.

NOT NULL - Ensures that a column cannot have a NULL value UNIQUE - Ensures that all values in a column are different PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table FOREIGN KEY - Uniquely identifies a row/record in another table CHECK - Ensures that all values in a column satisfies a specific condition DEFAULT - Sets a default value for a column when no value is specified INDEX - Used to create and retrieve data from the database very quickly

what are the differences between GROUP BY and ORDER BY? The ORDER BY clause’s purpose is to sort the query result by specific columns. The GROUP BY clause’s purpose is summarize unique combinations of columns values.

What is the difference between IN and EXISTS? The main difference between IN and EXISTS in Oracle is that the SQL engine compares all values in the IN condition while the SQL engine stops the process as soon as finding a single positive condition in EXISTS.

What are subqueries? A Subquery or Inner query or a Nested query is a query within another SQL query and embedded within the WHERE clause.

A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved.

What is the difference between an aggregate function and a scalar function?

A scalar function returns a single value. It might not even be related to tables in your database. A tabled-valued function returns your specified columns for rows in your table meeting your selection criteria. An aggregate-valued function returns a calculation across the rows of a table -- for example summing values.

What are the different joins in SQL?

Cross Joins Cross joins return all combinations of rows from each table.

Inner Joins Inner joins return rows when the join condition is met. This is the most common Database join.

Outer Joins Outer joins return all the rows from one table, and if the join condition is met, columns from the other.

There are three types of outer joins: Left, Right, and Full outer joins.

Left Outer Join – Return all rows from the “left” table, and matching rows from the “right” table.
If there are no matches in the right table, return Null values for those columns.

Right Outer Join – Return all rows from the “right” table, and matching rows from the “left” table.
If there are no matches in the left table, return Null values for those columns.

Full Join – Return all rows from an inner join, when no match is found, return nulls for that table.

What are the different set operations in SQL? Set operations allow the results of multiple queries to be combined into a single result set. Set operators include UNION, INTERSECT, and EXCEPT.

What is the difference between joins and set operations? UNION puts lines from queries after each other, while JOIN makes a cartesian product and subsets it -- completely different operations.

What is normalization? It is the processes of reducing the redundancy of data in the table and also improving the data integrity. So why is this required? without Normalization in SQL, we may face many issues such as

Insertion anomaly: It occurs when we cannot insert data to the table without the presence of another attribute Update anomaly: It is a data inconsistency that results from data redundancy and a partial update of data. Deletion Anomaly: It occurs when certain attributes are lost because of the deletion of other attributes.

In brief, normalization is a way of organizing the data in the database. Normalization entails organizing the columns and tables of a database to ensure that their dependencies are properly enforced by database integrity constraints.

What are the requirements for the first three normal forms?

Normalization Rule Normalization rules are divided into the following normal forms:

First Normal Form Second Normal Form Third Normal Form BCNF Fourth Normal Form

First Normal Form (1NF) For a table to be in the First Normal Form, it should follow the following 4 rules:

It should only have single(atomic) valued attributes/columns. Values stored in a column should be of the same domain All the columns in a table should have unique names. And the order in which data is stored, does not matter. In the next tutorial, we will discuss about the First Normal Form in details.

Second Normal Form (2NF) For a table to be in the Second Normal Form,

It should be in the First Normal form. And, it should not have Partial Dependency. To understand what is Partial Dependency and how to normalize a table to 2nd normal for, jump to the Second Normal Form tutorial.

Third Normal Form (3NF) A table is said to be in the Third Normal Form when,

It is in the Second Normal form. And, it doesn't have Transitive Dependency.

Boyce and Codd Normal Form (BCNF) Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, following conditions must be satisfied:

R must be in 3rd Normal Form and, for each functional dependency ( X → Y ), X should be a super Key.

Fourth Normal Form (4NF) A table is said to be in the Fourth Normal Form when,

It is in the Boyce-Codd Normal Form. And, it doesn't have Multi-Valued Dependency.

JDBC:

What is JDBC? JDBC is an API (Application programming interface) used to communicate Java application to database in database independent and platform independent manner.

What dependencies are required to connect to an RDBMS? JDBC

What are the main interfaces of JDBC? DriverManager This class manages the JDBC drivers. You need to register your drivers to this. It provides methods such as registerDriver() and getConnection(). Driver This interface is the Base interface for every driver class i.e. If you want to create a JDBC Driver of your own you need to implement this interface. If you load a Driver class (implementation of this interface), it will create an instance of itself and register with the driver manager.

Statement This interface represents a static SQL statement. Using the Statement object and its methods, you can execute an SQL statement and get the results of it. It provides methods such as execute(), executeBatch(), executeUpdate() etc. To execute the statements.

PreparedStatement This represents a precompiled SQL statement. An SQL statement is compiled and stored in a prepared statement and you can later execute this multiple times. You can get an object of this interface using the method of the Connection interface named prepareStatement(). This provides methods such as executeQuery(), executeUpdate(), and execute() to execute the prepared statements and getXXX(), setXXX() (where XXX is the datatypes such as long int float etc..) methods to set and get the values of the bind variables of the prepared statement.

CallableStatement Using an object of this interface you can execute the stored procedures. This returns single or multiple results. It will accept input parameters too. You can create a CallableStatement using the prepareCall() method of the Connection interface. Just like Prepared statement, this will also provide setXXX() and getXXX() methods to pass the input parameters and to get the output parameters of the procedures.

Connection This interface represents the connection with a specific database. SQL statements are executed in the context of a connection. This interface provides methods such as close(), commit(), rollback(), createStatement(), prepareCall(), prepareStatement(), setAutoCommit() setSavepoint() etc.

ResultSet This interface represents the database result set, a table which is generated by executing statements. This interface provides getter and update methods to retrieve and update its contents respectively.

ResultSetMetaData This interface is used to get the information about the result set such as, number of columns, name of the column, data type of the column, schema of the result set, table name, etc It provides methods such as getColumnCount(), getColumnName(), getColumnType(), getTableName(), getSchemaName() etc.

In what order are they used? DriverManager Connection Statement

What are the different types of statements? Statement Use this for general-purpose access to your database. Useful when you are using static SQL statements at runtime. The Statement interface cannot accept parameters. PreparedStatement Use this when you plan to use the SQL statements many times. The PreparedStatement interface accepts input parameters at runtime. CallableStatement Use this when you want to access the database stored procedures. The CallableStatement interface can also accept runtime input parameters.

What is SQL Injection? SQL Injection (SQLi) is a type of an injection attack that makes it possible to execute malicious SQL statements.

What is a DAO? is a pattern that provides an abstract interface to some type of database or other persistence mechanism. By mapping application calls to the persistence layer, the DAO provides some specific data operations without exposing details of the database. This isolation supports the single responsibility principle. It separates what data access the application needs, in terms of domain-specific objects and data types (the public interface of the DAO), from how these needs can be satisfied with a specific DBMS, database schema, etc. (the implementation of the DAO).

Describe client-server communication: how does it apply to PostgreSQL? clients make request to servers/ servers send request back to clients / servers in postgres process the queries made by the client and send them back to them.

What does a connection URL for PostgreSQL look like? jdbc:postgresql://host:port/database

How do you connect to a PostgreSQL server with a client? login with database user credentials With Java? setup database credentials in jdbc

Why is connection pooling important? a connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required. Connection pools are used to enhance the performance of executing commands on a database. Opening and maintaining a database connection for each user, especially requests made to a dynamic database-driven website application, is costly and wastes resources. In connection pooling, after a connection is created, it is placed in the pool and it is used again so that a new connection does not have to be established. If all the connections are being used, a new connection is made and is added to the pool. Connection pooling also cuts down on the amount of time a user must wait to establish a connection to the database.

Linux:

What is Unix? Unix systems are characterized by a modular design that is sometimes called the "Unix philosophy". According to this philosophy, the operating system should provide a set of simple tools, each of which performs a limited, well-defined function. A unified filesystem (the Unix filesystem) and an inter-process communication mechanism known as "pipes" serve as the main means of communication, and a shell scripting and command language (the Unix shell) is used to combine the tools to perform complex workflows.

Unix distinguishes itself from its predecessors as the first portable operating system: almost the entire operating system is written in the C programming language, which allows Unix to operate on numerous platforms. What is Linux?

What are Linux pipelines?

A pipe is a form of redirection (transfer of standard output to some other destination) that is used in Linux and other Unix-like operating systems to send the output of one command/program/process to another command/program/process for further processing. The Unix/Linux systems allow stdout of a command to be connected to stdin of another command. You can make it do so by using the pipe character ‘|’.

What distribution of Linux are you familiar with? Which distributions are in the same family?

Kubuntu An official derivative of Ubuntu Linux using KDE instead of the GNOME (or Unity) interface used by default in Ubuntu. Lubuntu Lubuntu is a project that is an official derivative of the Ubuntu operating system that is "lighter, less resource hungry and more energy-efficient", using the LXQt desktop environment (used LXDE before 18.10). Ubuntu Budgie An official derivative of Ubuntu using Budgie. Ubuntu Kylin An official derivative aimed at the Chinese market. Ubuntu MATE An official derivative of Ubuntu using MATE, a desktop environment forked from the now-defunct GNOME 2 code base, with an emphasis on the desktop metaphor. Ubuntu Server An official derivative made for use in servers & IBM mainframes. Ubuntu Server handles mail, controls printers, acts as a fileserver, can host LAMP and more. Ubuntu Studio Based on Ubuntu, providing open-source applications for multimedia creation aimed at the audio, video and graphic editors. Xubuntu An official derivative of Ubuntu using Xfce. Xubuntu is intended for use on less-powerful computers or those who seek a highly efficient desktop environment on faster systems, and uses mostly GTK+ applications.

What is a shell? Which shell are you familiar with? The shell is much more than just a command interpretor, it is also a programming language of its own with complete programming language constructs such as conditional execution, loops, variables, functions and many more.

That is why the Unix/GNU Linux shell is more powerful compared to the Windows shell.

In this article, we shall take a look at some of the top most used open source shells on Unix/GNU Linux.

Bash Shell Bash stands for Bourne Again Shell and it is the default shell on many Linux distributions today.

What is shell scripting? designed to be run by the Unix shell, a command-line interpreter.[1] The various dialects of shell scripts are considered to be scripting languages. Typical operations performed by shell scripts include file manipulation, program execution, and printing text. A script which sets up the environment, runs the program, and does any necessary cleanup, logging, etc. is called a wrapper.

What is a package manager? Which one is used on RHEL? All modern linux distributions organize software in packages which contain applications binaries, files, metadata and information about the package dependencies, possible conflicts with other packages etc. The core Rhel package manager is called rpm itself, and it is the tool used also by dnf, the higher level package manager, which is capable to manage dependencies.

What is grep? grep searches the named input FILEs for lines containing a match to the given PATTERN. By default, grep prints the matching lines. it's a regex function.

man? It displays the docs for the function.

Explain basic Unix commands for file creation and directory navigation. touch mkdir cd ..

Where is the home directory located? Where is root? ~ the mount point of the drive.

How is the Linux filesystem structured and organized? What are some important directories?

/ – Root Every single file and directory starts from the root directory. Only root user has write privilege under this directory. Please note that /root is root user’s home directory, which is not same as /.
/bin – User Binaries Contains binary executables. Common linux commands you need to use in single-user modes are located under this directory. Commands used by all the users of the system are located here. For example: ps, ls, ping, grep, cp.
/sbin – System Binaries Just like /bin, /sbin also contains binary executables. But, the linux commands located under this directory are used typically by system aministrator, for system maintenance purpose. For example: iptables, reboot, fdisk, ifconfig, swapon
/etc – Configuration Files Contains configuration files required by all programs. This also contains startup and shutdown shell scripts used to start/stop individual programs. For example: /etc/resolv.conf, /etc/logrotate.conf
/dev – Device Files Contains device files. These include terminal devices, usb, or any device attached to the system. For example: /dev/tty1, /dev/usbmon0
/proc – Process Information Contains information about system process. This is a pseudo filesystem contains information about running process. For example: /proc/{pid} directory contains information about the process with that particular pid. This is a virtual filesystem with text information about system resources. For example: /proc/uptime
/var – Variable Files var stands for variable files. Content of the files that are expected to grow can be found under this directory. This includes — system log files (/var/log); packages and database files (/var/lib); emails (/var/mail); print queues (/var/spool); lock files (/var/lock); temp files needed across reboots (/var/tmp);
/tmp – Temporary Files Directory that contains temporary files created by system and users. Files under this directory are deleted when system is rebooted.
/usr – User Programs Contains binaries, libraries, documentation, and source-code for second level programs. /usr/bin contains binary files for user programs. If you can’t find a user binary under /bin, look under /usr/bin. For example: at, awk, cc, less, scp /usr/sbin contains binary files for system administrators. If you can’t find a system binary under /sbin, look under /usr/sbin. For example: atd, cron, sshd, useradd, userdel /usr/lib contains libraries for /usr/bin and /usr/sbin /usr/local contains users programs that you install from source. For example, when you install apache from source, it goes under /usr/local/apache2
/home – Home Directories Home directories for all users to store their personal files. For example: /home/john, /home/nikita
/boot – Boot Loader Files Contains boot loader related files. Kernel initrd, vmlinux, grub files are located under /boot For example: initrd.img-2.6.32-24-generic, vmlinuz-2.6.32-24-generic
/lib – System Libraries Contains library files that supports the binaries located under /bin and /sbin Library filenames are either ld or lib.so.* For example: ld-2.11.1.so, libncurses.so.5.7
/opt – Optional add-on Applications opt stands for optional. Contains add-on applications from individual vendors. add-on applications should be installed under either /opt/ or /opt/ sub-directory.
/mnt – Mount Directory Temporary mount directory where sysadmins can mount filesystems.
/media – Removable Media Devices Temporary mount directory for removable devices. For examples, /media/cdrom for CD-ROM; /media/floppy for floppy drives; /media/cdrecorder for CD writer
/srv – Service Data srv stands for service. Contains server specific services related data. For example, /srv/cvs contains CVS related data.

What is vim? Nano? Text editors that come on the server.

How do I change file permissions? chmod +rwx filename to add permissions. chmod -rwx directoryname to remove permissions. chmod +x filename to allow executable permissions. chmod -wx filename to take out write and executable permissions.

How do I create local and global environment variables?

Files that affect behaviour for every user: /etc/bash.bashrc : This file is read when ever an interactive shell is started (normal terminal) and all the commands specified in here are executed one by one. /etc/profile : This file is read every time a user logs in, Thus all the commands executed in here will execute only once at the time of user logging in. Use Cases:

To specify global/local environment variables.
Shell configuration, like colour profile etc. Files that affect behaviour for only a specific user: ~/.bashrc : This file behaves the same way /etc/bash.bashrc file works just that it is executed only for a specific user. If you want to create an environment for yourself go ahead modify or create this file in your home directory. ~/.profile : This file is same as /etc/profile difference comes the way it is executed, This file is executed only when a user in whose home directory this file exists, logs in.

What utilities can help you monitor a system’s processes? It’s filesystem? Top

df command. df is a Linux command line utility to monitor your Linux disk usage.

What is ssh? How can you use it to connect to a Linux system? The SSH protocol (also referred to as Secure Shell) is a method for secure remote login from one computer to another. It provides several alternative options for strong authentication, and it protects the communications security and integrity with strong encryption. It is a secure alternative to the non-protected login protocols

ssh user@ip

What is a private and public key?

Asymmetric Cryptography - Algorithms As with any encryption scheme, public key authentication is based on an algorithm. There are several well-researched, secure, and trustworthy algorithms out there - the most common being the likes of RSA and DSA. Unlike the commonly known (symmetric or secret-key) encryption algorithms the public key encryption algorithms work with two separate keys. These two keys form a pair that is specific to each user.

Key Pair - Public and Private In the SSH public key authentication use case, it is rather typical that the users create (i.e. provision) the key pair for themselves. SSH implementations include easily usable utilities for this (for more information see ssh-keygen and ssh-copy-id).

Each SSH key pair includes two keys:

A public key that is copied to the SSH server(s). Anyone with a copy of the public key can encrypt data which can then only be read by the person who holds the corresponding private key. Once an SSH server receives a public key from a user and considers the key trustworthy, the server marks the key as authorized in its authorized_keys file. Such keys are called authorized keys.

A private key that remains (only) with the user. The possession of this key is proof of the user's identity. Only a user in possession of a private key that corresponds to the public key at the server will be able to authenticate successfully. The private keys need to be stored and handled carefully, and no copies of the private key should be distributed. The private keys used for user authentication are called identity keys.

=====================================================================================================

Docker Commands: docker --help - displays the help file. docker attach - Attaches your local input/output/error stream to a running container. docker commit - Creates a new image from the current changed state of the container. docker exec- Runs a command in a container that is active or running. docker history- Displays the history of an image. docker info- Shows system-wide information. docker inspect- Finds system-level information about docker containers and images. docker login- Logins to local registry or Docker Hub. docker pull- Pulls an image or a repository from your local registry or Docker Hub. docker ps- Lists various properties of containers. docker restart- Stops and starts a container. docker rm- Remove containers. docker rmi- Remove images docker run- Runs a command in an isolated container. docker search- Searches the Docker Hub for images. docker start- Starts already stopped containers. docker stop- Stops running containers. docker version- Provides docker version information.

200413-java-spark / 200413-java-spark.github.io

Week 2 - Linux & SQL #4