a16z / nft-analyst-starter-pack

https://a16z.com/2022/03/18/nft-starter-pack-analyze-data-metadata-build-tools/
GNU Affero General Public License v3.0
461 stars, 92 forks

Results contain duplicate data #12

Closed: jinnzy closed this issue 2 years ago

jinnzy commented 2 years ago

Thank you for this open source project. While using it, I found duplicate rows in the results.


code:

https://github.com/a16z/nft-analyst-starter-pack/blob/e14844a4dfafaf20b62b271f0c61c63e7abb22d5/core/generate_sales_output.py#L98

The merge on this line produces duplicate rows whenever the right-hand DataFrame has more than one row for the same join key.

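test:

import pandas as pd

# Left table: one row per name
df1 = pd.DataFrame({"name": ["kate", "sally"],
                    "age": [25, 285]})
# Right table: "sally" appears twice
df2 = pd.DataFrame({"name": ["kate", "herz", "sally", "sally"],
                    "score": [70, 60, 11, 11],
                    "age": [23, 41, 285, 44]})
# Left merge on "name": each df1 row is repeated once per matching df2 row
print(pd.merge(df1, df2, on="name", how="left"))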

output:

    name  age_x  score  age_y
0   kate     25     70     23
1  sally    285     11    285
2  sally    285     11     44
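
A possible way to catch and avoid this, sketched on the same toy data rather than on the project's real tables: pandas can check key uniqueness during the merge via the `validate` argument, and the right-hand frame can be deduplicated on the join key first. Keeping the first row per key (`keep="first"`) is only an assumption made for illustration; which row is the correct one depends on the actual sales data, and this is not necessarily how the project resolved the issue.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["kate", "sally"], "age": [25, 285]})
df2 = pd.DataFrame({
    "name": ["kate", "herz", "sally", "sally"],
    "score": [70, 60, 11, 11],
    "age": [23, 41, 285, 44],
})

# Detect the problem: validate="many_to_one" makes pandas raise a
# MergeError when the join key is not unique in the right-hand frame.
try:
    pd.merge(df1, df2, on="name", how="left", validate="many_to_one")
    right_keys_unique = True
except pd.errors.MergeError:
    right_keys_unique = False

# One possible fix (assumption: the first row per key is the one to keep).
df2_unique = df2.drop_duplicates(subset="name", keep="first")

# With unique right-hand keys, the left merge returns one row per df1 row.
merged = pd.merge(df1, df2_unique, on="name", how="left", validate="many_to_one")
print(merged)
```

With the deduplicated right-hand frame, the merged result has the same number of rows as `df1`, and the second merge passes the `many_to_one` check.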