Closed: elementc closed this issue 3 years ago
Hey Element, thanks for opening the issue, sorry I just now noticed! I've been hesitant to write the README but it's about time I clean it up as we're nearing a beta release.
I'm having trouble clearly conveying the use cases for Meerschaum in a single README. But minimal usage would look something like the following:
```shell
### install Meerschaum with its full-feature dependencies
$ pip install --upgrade meerschaum[full]

### launch the Meerschaum stack
### (`mrsm stack` is an alias of docker-compose;
###  arguments in [brackets] are passed to the subprocess and not Meerschaum itself)
$ mrsm stack [-d]

### launch the Meerschaum shell, where you can manage connectors, pipes, etc.
### (actions done in the shell can be done on the command line instead)
$ mrsm
Meerschaum vX.X.X
mrsm ➤
```
I think I need an FAQ for the available actions. The usage and options can be seen in the CLI with `help [action]`.
I'm planning on having a dashboard that interacts with the API like the CLI does, but frontend development is not my specialty.
Maybe I should make tutorials? Documentation can be overwhelming.
No worries, feel free to call me Casey; we spoke this past Sunday at the LUP LUG.
I think a minimal example from zero to your first pipe would be useful. I tried to follow your instructions and what's in the readme, and I was unable to get past starting the stack:
```shell
(mschm) casey@IRONFIRE:~$ mrsm stack [-d]
NOTE: Configuration file is missing. Falling back to default configuration.
You can edit the configuration with `edit config` or replace the file /home/casey/.config/meerschaum/config.yaml
Missing file /home/casey/.config/meerschaum/stack/resources/docker-compose.yaml.
Bootstrap stack configuration?
NOTE: The following files will be overwritten: [PosixPath('/home/casey/.config/meerschaum/stack/resources/docker-compose.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/datasources/datasource.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/dashboards/dashboard.yaml')] [Y/n] Y
ERROR:
        Can't find a suitable configuration file in this directory or any
        parent. Are you in the right directory?

        Supported filenames: docker-compose.yml, docker-compose.yaml

(mschm) casey@IRONFIRE:~$ mrsm stack [-d]
Missing file /home/casey/.config/meerschaum/stack/resources/docker-compose.yaml.
Bootstrap stack configuration?
NOTE: The following files will be overwritten: [PosixPath('/home/casey/.config/meerschaum/stack/resources/docker-compose.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/datasources/datasource.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/dashboards/dashboard.yaml')] [Y/n] Y
ERROR: Version in "./docker-compose.yaml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
(mschm) casey@IRONFIRE:~$ mrsm stack [-d]
ERROR: Version in "./docker-compose.yaml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
(mschm) casey@IRONFIRE:~$ docker-compose --version
docker-compose version 1.25.0, build unknown
```
Any info I can provide to help debug this failure? I'm using the docker-compose and docker install that came with Pop!_OS 20.04. Your README suggests installing docker-compose from pip, but I have not done so. If you have a strict dependency on a particular docker-compose version, you should maybe keep it as a pip dependency, no?
Moreover, I think it's really important that the README give us a clear example of the usefulness of this tool. As written, it says the following:
> Meerschaum is a platform for quickly creating and managing time-series data streams called Pipes.
Cool. Where can these time-series data streams come from? Where can they go to? I see some code for a SQL connector and you mentioned this was an ETL tool in our talk on Sunday, but are there already other existing sources and sinks?
I think the really critical thing is that, cool shell UI or cool web UI aside (I'd hold off on the web UI), once I have your package installed I have zero idea how to use it to extract, transform, or load data. Even the toyest of the toy examples would go a long way to showing how this tool is unique or useful.
Let's suppose I have... a sqlite3 db with a table full of student names, submission datetimes, and grades, and I'd like to extract it, apply some hashing function to the student name column, and dump out a csv of "anonymized" data. How?
(Is my supposition dumb? Is there a better scenario where this tool shines?)
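For what it's worth, here's the plain-Python version of the task I'm imagining, so you can see its shape (the table and column names are made up by me, not taken from your project):

```python
import csv
import hashlib
import sqlite3

# Build a toy students database in memory (stand-in for a real sqlite file).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignments (student_name TEXT, submission_datetime TEXT, grade REAL)"
)
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?, ?)",
    [
        ("Ada Lovelace", "2020-11-01 10:00:00", 95.0),
        ("Alan Turing", "2020-11-01 11:30:00", 88.5),
    ],
)

def anonymize(name: str) -> str:
    """Replace a student name with a short, stable hash."""
    return hashlib.sha256(name.encode()).hexdigest()[:12]

# Extract the rows, transform (hash the name column), and load into a CSV.
with open("anonymized.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["student_hash", "submission_datetime", "grade"])
    for name, dt, grade in conn.execute("SELECT * FROM assignments"):
        writer.writerow([anonymize(name), dt, grade])
```

I'm curious how (or whether) Meerschaum shortens something like this.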
Hey Casey, thanks for the detailed post. I can't believe it took this long to find out, but the code for taking input to yes/no questions was just broken. I hadn't tested it properly, so answering `Y` still returned `False` 🤦. I guess this is a good example of why unit tests are important, and I need to get around to implementing them. It's definitely still an alpha release.
I've also changed the default version in the docker-compose.yaml file from `3.8` to `3` for better backwards compatibility.
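For reference, the change is just the top line of the generated docker-compose.yaml (service definitions omitted here):

```yaml
### docker-compose.yaml
version: "3"   ### was "3.8"; older docker-compose releases like 1.25.0 don't parse 3.8
services:
  ### ... service definitions unchanged ...
```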
Try upgrading to version 0.0.39, which should address these issues:

```shell
$ pip install --upgrade meerschaum
$ mrsm bootstrap config   ### if this doesn't work, just rm -rf ~/.config/meerschaum
$ mrsm stack
```
To address the README question, the current use case for Meerschaum is helping non-system engineers spin up a pre-configured Grafana/TimescaleDB stack and migrate outside data into TimescaleDB (particularly in the case of utilities data).
The plan is to implement the `bootstrap pipes` action so that all parameters can be easily set from one command, but for the time being, the following steps need to be taken.
In your example students DB (say you have a table `assignments` with columns `submission_datetime`, `student_id`, `grade`), you would take the following steps to migrate into the TimescaleDB / Grafana stack:
First, run `mrsm edit config`. For sqlite, your config would look something like this:
```yaml
meerschaum:
  connectors:
    sql:
      studentdb:   ### our label for the connection
        flavor: sqlite
        database: /path/to/students.sqlite
```
Next, register the Pipe. Pipes are identified by three keys: (1) the connector (`connector_keys`), (2) metric (`metric_key`), and (3) location (`location_key`). The location may be omitted, however, and will be for this example.

The `connector_keys` (`-C`) are the type and label of the connector defined in step 1, so in our case the `connector_keys` will be `sql:studentdb`.

The `metric` (`-M`) is a label we give to identify the contents of the Pipe (think power, energy, CO2, temperature, etc.). In this case I'll use the original table name, `assignments`.

The `location` is a label to describe a Pipe's location; if omitted, it's None/NULL. This is a way to further partition data streams, among buildings for example, where the parent database may contain an entire campus's worth of data but we want to partition at the building level. The plan is to derive Pipes with a `location` as TimescaleDB continuous / real-time views of the parent metric Pipe. E.g. `sql_studentdb_assignments` would be the parent for `sql_studentdb_assignments_thirdblock` (the `:` in the connector keys is converted to `_` for convenience).
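The key-to-table-name convention above can be sketched as a small function (my paraphrase of the naming scheme, not Meerschaum's actual internal code):

```python
from typing import Optional

def pipe_table_name(connector_keys: str, metric: str, location: Optional[str] = None) -> str:
    """Join a Pipe's keys into a table name, converting ':' to '_'."""
    parts = [connector_keys.replace(":", "_"), metric]
    if location is not None:
        parts.append(location)
    return "_".join(parts)

print(pipe_table_name("sql:studentdb", "assignments"))
# sql_studentdb_assignments
print(pipe_table_name("sql:studentdb", "assignments", "thirdblock"))
# sql_studentdb_assignments_thirdblock
```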
```shell
$ mrsm register pipes -C sql:studentdb -M assignments
```
Then run `mrsm edit pipes`:

```shell
$ mrsm edit pipes -C sql:studentdb -M assignments
```
This will open your editor and allow you to edit the parameters of the Pipe. Your parameters would look something like this:
```yaml
columns:
  datetime: submission_datetime
  id: student_id
fetch:
  ### This is the query which is executed on the remote host
  definition: SELECT * FROM assignments
  ### How many minutes into the past to look for backlogged data (optional)
  backtrack_minutes: 0
```
Here's one way to register a Pipe from a script instead (does the same as the above):

```python
>>> import meerschaum as mrsm
>>> pipe = mrsm.Pipe('sql:studentdb', 'assignments')
>>> pipe.parameters = {
...     'columns': {
...         'datetime': 'submission_datetime',
...         'id': 'student_id',
...     },
...     'fetch': {
...         'definition': 'SELECT * FROM assignments',
...     },
... }
>>> pipe.register()
```
Finally, sync the Pipe. The SQL connector implements the `sync` logic, and you can see how the syncing works by inspecting the `pipe.fetch()` method. In essence, it builds a SQL query to grab the latest data from the remote host, diffs it against its own recent data, and updates its target table.
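As a rough illustration of that fetch-and-diff idea (pure Python with two sqlite databases, not Meerschaum's actual implementation):

```python
import sqlite3

# 'remote' stands in for the source database, 'local' for the Pipe's target table.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE assignments (submission_datetime TEXT, student_id INTEGER, grade REAL)")
remote.executemany(
    "INSERT INTO assignments VALUES (?, ?, ?)",
    [
        ("2020-11-01 10:00:00", 1, 95.0),
        ("2020-11-02 09:15:00", 2, 88.5),
    ],
)

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE assignments (submission_datetime TEXT, student_id INTEGER, grade REAL)")

def sync(remote, local):
    """Fetch only rows newer than the latest datetime we already have, then insert them."""
    latest = local.execute("SELECT MAX(submission_datetime) FROM assignments").fetchone()[0]
    query, params = "SELECT * FROM assignments", ()
    if latest is not None:
        query += " WHERE submission_datetime > ?"
        params = (latest,)
    new_rows = remote.execute(query, params).fetchall()
    local.executemany("INSERT INTO assignments VALUES (?, ?, ?)", new_rows)
    return len(new_rows)

print(sync(remote, local))  # first sync pulls everything
print(sync(remote, local))  # second sync finds nothing new
```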
```shell
### this will sync all Pipes from the sql:studentdb connection
$ mrsm sync pipes -C sql:studentdb
```
I hope this wasn't too complicated. I think of it as a framework for non-system engineers who need to build visualizations for large sets of time-series data. There are a lot of pieces to get into, like how the API works and how Connectors work (SQLConnector vs APIConnector vs other types to come), but overall Pipes are a way to simplify many different connection types and organize data.
Thank you for taking interest in Meerschaum! It's exciting to answer GitHub issues and I hope I get more feedback in the future. If you have any further issues (and be warned, you'll probably run into a few bugs), feel free to reach out here or at my email bennett.meares@gmail.com or on Discord (bennett#0708). I'm still getting used to Mumble and Matrix, but I hope to join in on more LUP LUGs. Everyone was so nice, and I look forward to meeting more Linux nerds like me.
Hi Bennett! I see on your dev branch that you have a header for a quick start, but no content there. Want some help writing something here? What's a minimal usage of this package look like? :)