Open kdpsingh opened 1 year ago
Yes, if you do not want to store it in metadata, then it is easier to just do logging. However, you may want to consider logging the changes in metadata as an opt-in, since some users might find it useful when doing lineage analysis.
This is a great point. I may consider adding this later. In my mental model, the logging is tied to operations rather than data frames. For example, a join is a single operation and it's not clear that either data frame would "own" that metadata.
I may first implement this in a logging style and then think through the implications of storing some or all of the results as metadata.
> a join is a single operation and it's not clear that either data frame would "own" that metadata.
I was thinking about it. The produced data frame "owns" the metadata as you need to know how it got created. Of course this is just food for thought for the future.
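A minimal sketch of that opt-in idea, in which the data frame produced by a join carries a note about how it was created. It assumes a DataFrames.jl version with the metadata API (`metadata!` and `metadata`, 1.4 or later); the key name `"lineage"` and the message format are hypothetical choices for illustration, not something proposed in this thread.

```julia
using DataFrames

left = DataFrame(id = [1, 2, 3], x = ["a", "b", "c"])
right = DataFrame(id = [2, 3, 4], y = [20, 30, 40])

# The join is a single operation; its result is the frame that "owns" the note.
joined = innerjoin(left, right, on = :id)

# Hypothetical key and message format, attached to the produced data frame only.
note = "innerjoin on id: $(nrow(left)) left rows, $(nrow(right)) right rows, $(nrow(joined)) result rows"
metadata!(joined, "lineage", note; style = :note)

# Read the note back later, e.g. during lineage analysis.
println(metadata(joined, "lineage"))
```

Neither input frame is modified here, which sidesteps the question of which input would own the metadata.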
Confirmed that tidylog is MIT-licensed: https://github.com/elbersb/tidylog/issues/61
Will aim for a mostly line-by-line translation of the R tidylog package into Julia.
While we could consider autodetecting changes in the data frames (and treating all verbs the same), I think the tidylog approach of customizing the output for each verb feels more natural and is probably more efficient.
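One way the per-verb customization could look in Julia is a method per verb, selected by dispatch on the function's type, with a generic fallback for everything else. This is only a sketch: `log_message` is a hypothetical name, and the message wording is illustrative rather than copied from tidylog.

```julia
using DataFrames

# Hypothetical helper: one method per verb, chosen by dispatch on the function type.
log_message(::typeof(filter), before, after) =
    "filter: removed $(nrow(before) - nrow(after)) rows, $(nrow(after)) remaining"

log_message(::typeof(select), before, after) =
    "select: dropped $(ncol(before) - ncol(after)) columns, $(ncol(after)) remaining"

# Generic fallback that treats every other verb the same way.
log_message(f, before, after) =
    "$(nameof(f)): $(nrow(before)) rows before, $(nrow(after)) rows after"

df = DataFrame(a = 1:4, b = 5:8)
println(log_message(filter, df, filter(:a => >(2), df)))
println(log_message(select, df, select(df, :a)))
println(log_message(sort, df, sort(df, :a)))
```

Each verb only computes the quantities it cares about (rows for `filter`, columns for `select`), which is where the efficiency argument comes from.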
R has a wonderful `tidylog` package that outputs a log of how an operation modified a dataframe (e.g., "filter: 300 rows were removed (10%) of the data, with 2,700 rows remaining.").

I would like to implement this capability. I don't think that using TableMetadataTools.jl is necessarily the approach I want to take because this metadata should be printed (using `@info` or `println`) but does not need to be permanently stored as part of the data frame.

This will probably be implemented either using `@aside` or simply by wrapping the DataFrames.jl functions with a `tidylog` function that captures the state of the data frame before and after the operation and prints out the difference.
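The wrapping approach could be sketched roughly as follows. The names `row_change_message` and `logged` are hypothetical and exist in no package; the sketch only compares row counts, whereas a real implementation would presumably track more of the data frame's state.

```julia
using DataFrames

# Hypothetical helper: build a tidylog-style message from row counts.
function row_change_message(verb::AbstractString, before::Integer, after::Integer)
    removed = before - after
    pct = before == 0 ? 0.0 : round(100 * removed / before; digits = 1)
    return "$verb: $removed rows were removed ($pct%), with $after rows remaining."
end

# Hypothetical wrapper: capture the state before and after `f`, log the
# difference with @info, and return the result unchanged (nothing is stored).
function logged(f, verb::AbstractString, df::AbstractDataFrame)
    before = nrow(df)
    result = f(df)
    @info row_change_message(verb, before, nrow(result))
    return result
end

df = DataFrame(id = 1:10, value = 10:10:100)
kept = logged(d -> filter(:value => >(30), d), "filter", df)
```

Because the message goes through `@info`, it can be silenced or redirected with the standard logging machinery, and the returned data frame carries no extra metadata.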