PacktPublishing / Distributed-Data-Systems-with-Azure-Databricks

Distributed Data Systems with Azure Databricks, published by Packt
MIT License

Chapter 4: Pages 124 and 125 #2

Open tanthiamhuat opened 2 years ago

tanthiamhuat commented 2 years ago

Figure 4.2 shows no columns named "date", "type", "latitude", or "longitude". Why, then, does Figure 4.3 show the columns "date", "type", "latitude", and "longitude"?

And if there is no "date" column, we would not be able to run the groupBy("date") from your instructions on page 125:

from pyspark.sql.functions import count
display(covid_parquet.groupBy("date").agg(count("*").alias("TotalCount")).orderBy("TotalCount", ascending=False).limit(20))

Could you clarify, please? Thanks.

DataSpacon commented 1 year ago

I used "confirmed_date" instead of "date":

from pyspark.sql.functions import count
display(covid_parquet.groupBy("confirmed_date").agg(count("*").alias("TotalCount")).orderBy("TotalCount", ascending=False).limit(20))
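
If you are unsure which date column your copy of the data has, one option is to check the column names before grouping. The sketch below is only an illustration, not the book's method: the helper name pick_date_column and the example column list are hypothetical, and in a notebook you would pass covid_parquet.columns instead.

```python
def pick_date_column(columns, candidates=("date", "confirmed_date")):
    """Return the first candidate name that is present in `columns`."""
    for name in candidates:
        if name in columns:
            return name
    raise KeyError(f"none of {candidates} found in {list(columns)}")


# Hypothetical column names, standing in for covid_parquet.columns.
example_columns = ["patient_id", "sex", "confirmed_date", "state"]

date_col = pick_date_column(example_columns)
print(date_col)  # confirmed_date
```

In Databricks, the result could then replace the hard-coded name, for example covid_parquet.groupBy(date_col), with the rest of the query left unchanged.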