Removed redundant Pipe.sync_time property.
Use pipe.get_sync_time() instead.
Removed SQLConnector.get_pipe_backtrack_minutes().
Use pipe.get_backtrack_interval() instead.
Replaced pipe.parameters['chunk_time_interval'] with pipe.parameters['verify']['chunk_minutes'].
For better security and cohesiveness, the TimescaleDB chunk_time_interval value is now derived from the standard chunk_minutes value. This also means pipes with integer date axes will be created with a new default chunk interval of 1440 (was previously 100,000).
Moved choose_subaction() into meerschaum.actions.
This function is for internal use and as such should not affect any users.
Features
Added verify pipes and --verify.
The command mrsm verify pipes or mrsm sync pipes --verify will resync any of a pipe's chunks whose row counts differ, catching backfilled data.
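The row-count comparison can be pictured with a small standalone sketch. It does not call into Meerschaum; `chunks_to_resync` and the two count dictionaries are hypothetical names, and it assumes a chunk is flagged whenever the source and target counts disagree.

```python
from datetime import datetime


def chunks_to_resync(source_counts, target_counts):
    """Return the chunk start times whose row counts differ (hypothetical helper)."""
    return [
        chunk
        for chunk, count in sorted(source_counts.items())
        # A chunk missing from the target counts as zero rows.
        if target_counts.get(chunk, 0) != count
    ]


# Row counts per daily chunk, keyed by the chunk's start time.
source = {
    datetime(2023, 1, 1): 100,
    datetime(2023, 1, 2): 120,
    datetime(2023, 1, 3): 90,
}
target = {
    datetime(2023, 1, 1): 100,
    datetime(2023, 1, 2): 115,
}

stale = chunks_to_resync(source, target)
print(stale)
```

Only the chunks that disagree are returned, so chunks that already match would be skipped.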
Added deduplicate pipes and --deduplicate.
Running mrsm deduplicate pipes or mrsm sync pipes --deduplicate will iterate over pipes' entire intervals, chunking at the configured chunk interval (see pipe.get_chunk_interval() below), then clearing and resyncing chunks with duplicate rows.
If your instance connector implements deduplicate_pipe() (e.g. SQLConnector), then this method will override the default pipe.deduplicate().
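As a rough illustration of the per-chunk check, the sketch below detects and drops duplicate rows within a single chunk. It is not the library's pipe.deduplicate() implementation; `deduplicate_chunk` is a hypothetical helper, and it assumes rows are dictionaries with hashable values.

```python
def deduplicate_chunk(rows):
    """Return (unique_rows, had_duplicates) for one chunk, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Build an order-independent, hashable fingerprint of the row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(unique) != len(rows)


chunk = [
    {'id': 1, 'value': 'a'},
    {'id': 1, 'value': 'a'},
    {'id': 2, 'value': 'b'},
]
clean_rows, had_duplicates = deduplicate_chunk(chunk)
print(clean_rows, had_duplicates)
```

A chunk for which the second return value is False would be left alone rather than cleared and resynced.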
Added preliminary dask support.
For example, you may now return Dask DataFrames from your plugins or pass them into pipe.sync(), and pipe.get_data() now accepts the flag as_dask.
Added chunk_minutes to pipe.parameters['verify'].
Like pipe.parameters['fetch']['backtrack_minutes'], you may now specify the default chunk interval to use for verification syncs and iterating over the datetime axis.
Added --chunk-minutes, --chunk-hours, and --chunk-days.
You may override a pipe's chunk interval during a verification sync with --chunk-minutes (or --chunk-hours or --chunk-days).
mrsm verify pipes --chunk-days 3
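The three flags express the same quantity in different units. The sketch below shows one way such flags could be normalized to a single interval with argparse; the parser, the function name `parse_chunk_interval`, and the choice to make the flags mutually exclusive are all assumptions for illustration, not the actual mrsm argument parser.

```python
import argparse
from datetime import timedelta


def parse_chunk_interval(argv):
    """Parse hypothetical chunk flags and return a timedelta (or None if unset)."""
    parser = argparse.ArgumentParser(prog='verify-sketch')
    # Assumption: only one of the three flags is given at a time.
    group = parser.add_mutually_exclusive_group()
    group.add_argument('--chunk-minutes', type=int)
    group.add_argument('--chunk-hours', type=int)
    group.add_argument('--chunk-days', type=int)
    args = parser.parse_args(argv)

    if args.chunk_minutes is not None:
        return timedelta(minutes=args.chunk_minutes)
    if args.chunk_hours is not None:
        return timedelta(hours=args.chunk_hours)
    if args.chunk_days is not None:
        return timedelta(days=args.chunk_days)
    return None


interval = parse_chunk_interval(['--chunk-days', '3'])
print(interval)
```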
Added pipe.get_chunk_interval() and pipe.get_backtrack_interval().
Return the timedelta (or int for integer datetimes) from verify:chunk_minutes and fetch:backtrack_minutes, respectively.
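The timedelta-or-int behavior can be sketched without the library. The function below is a hypothetical stand-in for pipe.get_chunk_interval(): it reads verify:chunk_minutes from a parameters dictionary, assumes the default of 1440 mentioned under the breaking changes, and returns a plain integer when the datetime axis is an integer column.

```python
from datetime import timedelta

# Assumption: 1440 is the default taken from the breaking-changes note above.
DEFAULT_CHUNK_MINUTES = 1440


def chunk_interval(parameters, integer_axis=False):
    """Return the chunk interval as a timedelta, or as an int for integer axes."""
    minutes = parameters.get('verify', {}).get('chunk_minutes', DEFAULT_CHUNK_MINUTES)
    return minutes if integer_axis else timedelta(minutes=minutes)


print(chunk_interval({}))
print(chunk_interval({'verify': {'chunk_minutes': 60}}, integer_axis=True))
```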
Added --bounded to verification syncs.
By default, verify pipes is unbounded, meaning it will sync values beyond the existing minimum and maximum datetime values. Running a verification sync with --bounded will bound the search to the existing datetime axis.
mrsm sync pipes --verify --bounded
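To make the bounded and unbounded cases concrete, here is a minimal sketch that splits a datetime range into chunk bounds. The function `chunk_bounds` is hypothetical, and representing the unbounded case with open-ended `(None, begin)` and `(end, None)` pairs is an assumption about how values outside the existing range might be covered; the library's own behavior may differ.

```python
from datetime import datetime, timedelta


def chunk_bounds(begin, end, chunk_interval, bounded=False):
    """Split [begin, end) into consecutive (lower, upper) pairs."""
    if chunk_interval <= timedelta(0):
        raise ValueError("chunk_interval must be positive")

    bounds = []
    cursor = begin
    while cursor < end:
        # The final chunk is clipped so it never runs past `end`.
        upper = min(cursor + chunk_interval, end)
        bounds.append((cursor, upper))
        cursor = upper

    if not bounded:
        # Open-ended chunks reach values before the minimum and after the maximum.
        bounds = [(None, begin)] + bounds + [(end, None)]
    return bounds


begin = datetime(2023, 1, 1)
end = datetime(2023, 1, 3, 12)
bounded_chunks = chunk_bounds(begin, end, timedelta(days=1), bounded=True)
unbounded_chunks = chunk_bounds(begin, end, timedelta(days=1))
print(len(bounded_chunks), len(unbounded_chunks))
```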
Added pipe.get_num_workers().
Return the number of concurrent threads to be used with this pipe (with respect to its instance connector's thread safety).
Added select_columns and omit_columns to pipe.get_data().
In situations where not all columns are required, you can now specify which columns to include (select_columns), which columns to filter out (omit_columns), or both. You may pass a list of columns or a single column, and the value '*' for select_columns will be treated as None (i.e. SELECT *).
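import meerschaum as mrsm

pipe = mrsm.Pipe('a', 'b', 'c', instance='sql:local')
pipe.sync([{'a': 1, 'b': 2, 'c': 3}])

# The first positional argument is select_columns.
pipe.get_data(['a', 'b'])
#    a  b
# 0  1  2

# '*' selects everything; the second positional argument is omit_columns.
pipe.get_data('*', 'b')
#    a  c
# 0  1  3

pipe.get_data(None, ['a', 'c'])
#    b
# 0  2

pipe.get_data(omit_columns=['b', 'c'])
#    a
# 0  1

# Columns are returned in the order they were selected.
pipe.get_data(select_columns=['c', 'a'])
#    c  a
# 0  3  1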
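The selection rules shown in the example can be restated as a small pure-Python function. This is only a sketch of the described behavior, not the code behind pipe.get_data(); `resolve_columns` is a hypothetical name, and silently ignoring unknown column names is an assumption.

```python
def resolve_columns(available, select_columns=None, omit_columns=None):
    """Return the columns to fetch, honoring select_columns, then omit_columns."""
    # A single column may be passed as a string; '*' means "no restriction".
    if isinstance(select_columns, str):
        select_columns = None if select_columns == '*' else [select_columns]
    if isinstance(omit_columns, str):
        omit_columns = [omit_columns]

    if select_columns is None:
        selected = list(available)
    else:
        # Keep the caller's ordering and skip unknown columns (assumption).
        selected = [col for col in select_columns if col in available]

    omitted = set(omit_columns or [])
    return [col for col in selected if col not in omitted]


columns = ['a', 'b', 'c']
print(resolve_columns(columns, '*', 'b'))
```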
Replaced daemoniker with python-daemon.
python-daemon is a well-maintained and well-behaved daemon process library. However, this migration removes Windows support for background jobs (which was never fully supported anyway, so no harm there).
Added pause jobs.
In addition to start jobs and stop jobs, the command pause jobs will suspend a job's daemon. Jobs may be resumed with start jobs (i.e. Daemon.resume()).
Added job management to the UI.
Now that jobs and logs are much more robust, more job management features have been added to the web UI. Jobs may be started, stopped, paused, and resumed from the web console, and their logs are now available for download.
Logs now roll over and are preserved on job restarts.
Spin up long-running jobs with peace of mind now that logs are automatically rolled over, keeping five 500 KB files on disk at any moment (you can tweak these values with mrsm edit config jobs).
To facilitate this, meerschaum.utils.daemon.RotatingFile was added to provide a generic file-like object, complete with its own file descriptor.
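For readers unfamiliar with size-based rollover, the standard library offers a comparable mechanism. The sketch below uses logging.handlers.RotatingFileHandler as an analogue and is not Meerschaum's RotatingFile; treating 500 KB as 500,000 bytes and the logger name `job-sketch` are assumptions.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, 'job.log')

# One active file plus four backups gives five files on disk at most.
handler = RotatingFileHandler(log_path, maxBytes=500_000, backupCount=4)
logger = logging.getLogger('job-sketch')
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Write roughly 4 MB so the log rolls over several times.
for i in range(4000):
    logger.info('line %06d: %s', i, 'x' * 1000)

logger.removeHandler(handler)
handler.close()

log_files = sorted(os.listdir(log_dir))
print(log_files)
```

The oldest backup is discarded on each rollover, which is what keeps disk usage bounded for long-running jobs.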
Starting existing jobs with -d will not throw an exception if the arguments match.
Similarly, running without any arguments other than --name will run the existing job. This matches the behavior of start jobs.
Allow for colon-separated paths in MRSM_PLUGINS_DIR.
Just like PATH in bash, you may now specify your plugins' paths in a single variable, separated by colons. Unlike bash, however, a blank path will not be interpreted as the current directory.
export MRSM_PLUGINS_DIR='./plugins:/app/plugins'
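A minimal sketch of how such a value could be split follows. `parse_plugins_dirs` is a hypothetical helper, not the library's parser, and dropping blank entries outright is an assumption based on the note that blank paths are not treated as the current directory.

```python
def parse_plugins_dirs(value):
    """Split a colon-separated list of directories, skipping blank entries."""
    return [path for path in value.split(':') if path.strip()]


print(parse_plugins_dirs('./plugins:/app/plugins'))
print(parse_plugins_dirs('./plugins::/app/plugins:'))
```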
Added pipe.keys().
pipe.keys() returns the connector, metric, and location keys (i.e. pipe.meta without the instance).
Fixed backtracking being incorrectly applied to --begin.
Application of the backtracking interval has been consolidated into pipe.fetch().
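One reading of this fix is that an explicit --begin is used as given, while the backtracking interval is only subtracted from the last sync time. The sketch below encodes that reading; `effective_begin` is a hypothetical function and the rule itself is an assumption, not a description of pipe.fetch() internals.

```python
from datetime import datetime, timedelta


def effective_begin(sync_time, backtrack_interval, begin=None):
    """Return the lower bound for a fetch under the assumed rule."""
    if begin is not None:
        # An explicit begin is respected as-is (no backtracking applied).
        return begin
    if sync_time is None:
        # Nothing has been synced yet, so there is no lower bound.
        return None
    return sync_time - backtrack_interval


last_sync = datetime(2023, 1, 2)
print(effective_begin(last_sync, timedelta(minutes=1440)))
print(effective_begin(last_sync, timedelta(minutes=1440), begin=datetime(2022, 12, 1)))
```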
Improved data type enforcement for SQL pipes.
A pipe's data types are now passed to SQLConnector.read() when fetching its data.
Added meerschaum.utils.sql.get_db_version() and SQLConnector.db_version.
Moved print_options() from meerschaum.utils.misc into meerschaum.utils.formatting.
This places print_options() next to print_tuple and pprint. A placeholder function is still present in meerschaum.utils.misc to preserve existing behavior.
mrsm.pprint() will now pretty-print SuccessTuples.
Added calm to print_tuple().
Printing a SuccessTuple with calm=True will use a more muted color scheme and emoji.
Removed round_down from get_sync_time() for instance connectors.
To avoid confusion, sync times are no longer truncated by default. round_down is still an optional keyword argument on pipe.get_sync_time().
Created meerschaum.utils.dtypes.
Added are_dtypes_equal() to meerschaum.utils.dtypes.
Added get_db_type_from_pd_type() to meerschaum.utils.dtypes.sql.
Added get_pd_type_from_db_type() to meerschaum.utils.dtypes.sql.
Moved to_pandas_dtype() from meerschaum.utils.misc into meerschaum.utils.dtypes.
Created meerschaum.utils.dataframe.
Added chunksize_to_npartitions() to meerschaum.utils.dataframe.
Added get_first_valid_dask_partition() to meerschaum.utils.dataframe.
Moved filter_unseen_df() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved add_missing_cols_to_df() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved parse_df_datetimes() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved df_from_literal() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved get_json_cols() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved get_unhashable_cols() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved enforce_dtypes() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved get_datetime_bound_from_df() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Moved df_is_chunk_generator() from meerschaum.utils.misc into meerschaum.utils.dataframe.
Refactored SQL utilities.
Added format_cte_subquery() to meerschaum.utils.sql.
Added get_create_table_query() to meerschaum.utils.sql.
Added get_db_version() to meerschaum.utils.sql.
Added get_rename_table_queries() to meerschaum.utils.sql.
Moved choices_docstring() from meerschaum.utils.misc into meerschaum.actions.
v2.0.0
Breaking Changes
Features
Added pyarrow support.
The dtypes enforcement system was overhauled to add support for pyarrow data types.
Added bool support.
Pipes may now sync DataFrames with booleans (even on Oracle and MySQL).
Added pipe.get_chunk_bounds().
Return a list of begin and end values to use when iterating over a pipe's datetime axis.
Pipes are now indexable.
Indexing a pipe directly is the same as accessing pipe.attributes.
Other changes
Fixed handling backslashes for stack on Windows.