Open frbelotto opened 8 months ago
The query doesn't make much sense, you are computing nunique on one of the group columns which will always return 1
We should fix this anyway though
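To illustrate the point above, here is a minimal sketch in plain pandas (not dask). The data is made up for illustration; only the column names `mci` and `marca` come from the query in this thread. Within each `(mci, marca)` group the `marca` value is constant by construction, so `nunique` can only return 1.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "mci": [1, 1, 1, 2, 2, 3],
    "marca": ["A", "A", "B", "A", "C", "B"],
})

# Each group holds exactly one distinct 'marca' value, because 'marca'
# is itself one of the grouping keys.
per_group = df.groupby(["mci", "marca"])["marca"].nunique()
print(per_group)
```

Every entry of `per_group` is 1, so summing these values per `mci` amounts to counting the distinct brands for each client.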
LOL. This query is part of a larger one that extracts how many unique clients ("MCIs") have consumed from each brand. Maybe it could be written in a smarter way, but once we got the expected result, I stopped changing it LOL
base_consumo = gerado.loc[(gerado['produto'] == 'Afiliados') & (gerado['data'] >= datetime(2024,1,1))].groupby(['mci', 'marca'], dropna=False, observed=True)['marca'].nunique().to_frame()
base_consumo = base_consumo.groupby(['mci'], dropna=False, observed=True).aggregate({'marca' : 'sum'})
base_consumo = base_consumo.rename(columns={'marca' : 'qtde_marcas'}).reset_index()
base_consumo = base_consumo.groupby('qtde_marcas', dropna=False, observed=True)['mci'].nunique().to_frame()
base_consumo = base_consumo.compute()
base_consumo.to_excel(f'{pastaloja}\\Clientes_por_marcas_afiliados.xlsx', merge_cells=False)
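As a possible simplification, the same kind of result (how many clients consumed from 1, 2, 3, ... brands) could be obtained without grouping on `marca` at all. The sketch below uses plain pandas and made-up data, so it is an assumption that it carries over unchanged to the dask dataframe `gerado`; the filtering on `produto` and `data` is omitted.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "mci": [1, 1, 1, 2, 2, 3],
    "marca": ["A", "A", "B", "A", "C", "B"],
})

# Step 1: number of distinct brands per client.
brands_per_client = df.groupby("mci")["marca"].nunique().rename("qtde_marcas")

# Step 2: number of clients for each brand count.
clients_per_count = brands_per_client.value_counts().sort_index()
print(clients_per_count)
```

Because `marca` is no longer a grouping key here, this form also sidesteps the pattern discussed in this issue, though the underlying bug is still worth fixing.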
PRs to fix are welcome
Is there a newbie guide for where to start? My Python knowledge is average, but I have no experience in building, sharing, and maintaining a library. I don't even know how to read the source code of the unique method so I could try to better understand what is happening.
As for the bug example, it's interesting that the "marca" column raises an error while any other column seems to work. My first thought was that it was related to the category dtype (a very buggy dtype), but I tried changing it to string and the error persists.
The error happens because `marca` is part of your grouping keys; it's not dtype related.
But it does not happen if I use the `mci` column, which is also a grouping key.
Hello guys, take this CSV as an example dataframe. I'm sorry, but I couldn't build an example dataframe in code that reproduces the bug: teste.csv
Now, let's open it and run the query on the example dataframe under dask 2023.10.0:
Runs ok!
Now, let's do the same test under dask 2024.2.1:
Environment: