data-centric-computing / dcic-public

Repository for (for now) filing bug reports about DCIC.

[DCIC Book]: Confusing Pandas data cleaning example #68

Closed jjo32 closed 6 months ago

jjo32 commented 11 months ago

Which Web page has the problem?

https://dcic-world.org/2023-02-21/python-tables-Pandas.html#%28part._.Clearing_out_unknown_values%29

What's the problem?

We are told to use events.loc[mask,'discount'] = '' rather than events[mask]['discount'] = '' for a reason that will be explained later, but I don't think it is ever explained, at least not directly in this chapter.
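For context, here is a minimal sketch of what is probably the unexplained reason, assuming current pandas behavior: events[mask] builds a new table (a copy), so the chained assignment changes that temporary copy and never reaches events. The table contents below are hypothetical, made up only for illustration.

```python
import warnings

import pandas as pd

# Hypothetical events table (not from the book); only 'birthday' is a valid code here.
events = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "discount": ["birthday", "none", "STUDENT"],
})

codes = ["birthday", "student"]
mask = ~events["discount"].isin(codes)  # True for rows whose code is not recognized

# Chained indexing: events[mask] is a temporary copy, so this assignment is lost.
# pandas normally warns about this; the warning is silenced here to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    events[mask]["discount"] = ""

chained_result = events["discount"].tolist()
print(chained_result)  # the original table is unchanged
```

The unrecognized codes are still present afterwards, which is why the chained form is unsuitable for cleaning the column in place.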

Right after we are instructed to use events.loc[mask,'discount'] = '', the example given uses the other notation. See the text quoted from the book below:

Putting it all together, the entire program looks like:

    codes = ['birthday', 'student']
    mask = ~events['discount'].isin(codes)
    events[mask]['discount'] = ''
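A sketch of how the quoted program would presumably read with the .loc notation the chapter recommends. The events table here is hypothetical, since the book's data is not part of this report.

```python
import pandas as pd

# Hypothetical events table (not from the book), used only to exercise the three lines.
events = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "discount": ["birthday", "none", "STUDENT"],
})

codes = ['birthday', 'student']
mask = ~events['discount'].isin(codes)

# A single .loc call selects rows and column together, so the assignment
# is applied to events itself rather than to a temporary copy.
events.loc[mask, 'discount'] = ''

print(events['discount'].tolist())  # ['birthday', '', '']
```

Note that isin matches exactly, so 'STUDENT' in this made-up data is treated as an unknown code and cleared.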

kfisler commented 6 months ago

Code fixed, and the reference to a future discussion of mutation now points to the mutable-lists section. That section does NOT close the loop here, as it doesn't come back to talk about pandas tables. It needs to, so I'm putting that on the list for a future revision.

kfisler commented 6 months ago

commit e76746086da4874334a5e79e24637cc5b5660160