QuXiangjie / Study-Review

自己欠缺的还太多了,希望通过总结每天的学习内容,整理每天的思绪来丰富自己的知识库。我想成为一名优秀的金融数据分析师,并行发展技术与商业业务。博客内容为:数理统计、财务业务、Python(数据分析及可视化)、Excel(数据分析)、SQL、英文
0 stars 0 forks source link

Rank scores #4

Open QuXiangjie opened 5 months ago

QuXiangjie commented 5 months ago

Question

SQL solution

SELECT S.Score, COUNT(S2.Score) as `Rank`
FROM Scores S,(SELECT DISTINCT Score FROM Scores) S2
WHERE S.Score<=S2.Score
GROUP BY S.Id
ORDER BY S.Score DESC;

To understand this solution, we need to know the order of SQL Execution From where group by having select order by

So the logic of this solution is that

we need to have a order distinct first, so we will use (SELECT DISTINCT Score FROM Scores) S2 then we have two tables, we need to know how many each score beats in S2, we will use where S.Score<=S2.Score group by S.Id Ps: we have to use S.id, otherwise, Scores table will keep compare the <= operators, and the answer count would be the sum of counts, so if we want to have each score in the table, we need to group by

QuXiangjie commented 5 months ago

Pandas solution Below is my solution, which is not correct, I can't pass the test of no id, no values but there are some additional knowledge I should know:

  1. sort_values, if I use one of the attribute to execute the sort_values function, I will get a series, instead of DataFrame
  2. drop([''],axis=1), can drop column
    
    import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame: sorted_scores=scores['score'].sort_values( ascending=False ).drop_duplicates()

for i in scores['score']:
    num = 0
    for l in sorted_scores:
        if i <= l:
            num += 1
    scores.loc[scores['score'] == i, 'Rank'] = num
scores=scores.sort_values(
    by=['Rank'],
    ascending=True
).drop(['id'],axis=1)

return scores
> [Rank function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html)

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:

Use the rank method to assign ranks to the scores in descending order with no gaps

scores['rank'] = scores['score'].rank(method='dense', ascending=False)

# Drop id column & Sort the DataFrame by score in descending order
result_df = scores.drop('id',axis=1).sort_values(by='score', ascending=False)

return result_df

Easy and Simple, OMG!