LTHTR-DST / hdruk_avoidable_admissions

HDRUK Data Science Collaboration on Avoidable Admissions in the NHS.
https://lthtr-dst.github.io/hdruk_avoidable_admissions/
MIT License

Is it necessary to create a copy of the dataframe prior to passing it to validate dataframe? #11

Closed: MattStammers closed this issue 1 year ago

MattStammers commented 1 year ago

The dataframe is copied anyway inside the validate_dataframe() function, so why do we create a copy of df_admcare before running the first validation? Isn't this just duplication?

# Create a copy of the data for fixing DQ issues, to avoid reloading from source every time
dfa = df_admcare.copy()

print("Raw dataframe contains %d rows and %d columns" % dfa.shape)

vvcb commented 1 year ago

Not required.

That said, if you load df_admcare from a network drive over a VPN, as we have to, and then make a mistake in a transformation step, working on a copy means you can start again from df_admcare in memory instead of reloading it from source.
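A minimal sketch of that "copy as a checkpoint" workflow, assuming plain pandas. The small frame below is a hypothetical stand-in for the slow network load, and the column name `age` is illustrative only, not the project's schema.

```python
import pandas as pd

# Hypothetical stand-in for df_admcare; in practice this is the slow load
# from a network drive over a VPN.
df_admcare = pd.DataFrame({"age": [34, -1, 71]})

# DataFrame.copy() is a deep copy by default, so changes to dfa
# do not propagate back to df_admcare.
dfa = df_admcare.copy()

# A mistaken transformation applied to the working copy.
dfa["age"] = dfa["age"] * 100

# The source frame is untouched, so recovering is a cheap in-memory copy
# rather than another reload from the network drive.
print(df_admcare["age"].tolist())  # [34, -1, 71]
dfa = df_admcare.copy()
print(dfa["age"].tolist())  # [34, -1, 71]
```

The copy costs memory equal to the frame's size, which is the trade-off against a slow reload.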

But no, you don't need to create a copy.
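To illustrate why the extra copy is not needed just to protect the input, here is a sketch of a validator that copies its argument before working on it. `validate_dataframe_sketch` is hypothetical: it is not the project's validate_dataframe() and only mimics the internal copy described in the question.

```python
import pandas as pd


def validate_dataframe_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical validator that copies its input before modifying it."""
    out = df.copy()
    # Example check (illustrative only): flag negative ages as invalid.
    out["age_valid"] = out["age"] >= 0
    return out


df_admcare = pd.DataFrame({"age": [34, -1, 71]})
result = validate_dataframe_sketch(df_admcare)

# The caller's frame gains no new column, because the function only
# modified its own internal copy.
print(list(df_admcare.columns))  # ['age']
print(result["age_valid"].tolist())  # [True, False, True]
```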

MattStammers commented 1 year ago

fair enough