Xiaohan-Alyssa-Zhang / CleverCogs-Group7

0 stars 3 forks source link

How to join two tables? #8

Open JuneJuneJie opened 2 years ago

MuyuDeng commented 2 years ago

You can join two tables via ExternalID. Lynda mentioned that ExternalID is the unique identifier for each row in three different tables, so we could regard it as the primary key. Different from the primary key, an index is not a constraint, but helps you organize the table or selected columns to retrieve rows faster. Both methods can be used to join two tables and obtain information from different tables.

For example, if you would like to join the table including activity data(ccdate), and another table including gender data(user), in order to analysis the gender distribution in different activities.

To reduce running memory, you can start by extracting the valuable columns of information in each of the two tables: cc_join = pd.DataFrame(ccdate, columns=['ExternalID','Activity']) user_join = pd.DataFrame(user, columns=['ExternalID','Gender'])

And you can create join index like this: table_join = cc_join.set_index('ExternalID').join(user_join.set_index('ExternalID'))

Finally information on both the reported activity and its corresponding user gender can be looked up in the table_join!

For more information, you can consult the official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html

Hope this helps!