YAML configs for importing Canvas Data with Embulk
These configs are provided as a starting point for your own workflow, so you can manage Canvas Data imports in YAML instead of in code.
Visit Managing Canvas Data with Embulk on the CanvasLMS Community for discussions and workflow ideas.
Embulk is an open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services. https://www.embulk.org/docs/
with support for
and features useful for Canvas Data
For more details see the Wiki docs
Embulk uses one YAML config file per task. For Canvas Data, this means each input source (table files) and its output destination (database table) is one file. This includes differences between staging, test, and production destinations. Managing 100-plus config files may seem like an odd workflow at first, but it is far less work than generating DDLs from schema.json, and plugins are much easier than coding custom sorting and filtering tasks.
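As a rough illustration of the one-file-per-table layout, a config might look like the sketch below. It assumes Embulk's built-in file input with the csv parser and the embulk-output-sqlserver plugin. The file name, path, connection values, and the shortened column list are placeholders for illustration, not values taken from this repository.

```yaml
# account_dim.yml: hypothetical, abbreviated sketch of one table's config
in:
  type: file
  path_prefix: /path/to/canvas-data/account_dim/   # placeholder path
  decoders:
    - {type: gzip}            # assumes the downloaded files are gzipped
  parser:
    type: csv
    delimiter: "\t"           # assumes tab-delimited flat files
    null_string: '\N'         # assumes \N marks NULL values
    skip_header_lines: 0
    columns:                  # abbreviated; a real config lists every column
      - {name: id, type: long}
      - {name: canvas_id, type: long}
      - {name: name, type: string}
out:
  type: sqlserver
  host: localhost             # placeholder connection values
  user: canvas_user
  password: "change-me"
  database: canvas_data_staging
  table: account_dim
  mode: replace               # rebuild the table on each run
```

A file like this would typically be checked with `embulk preview account_dim.yml` and loaded with `embulk run account_dim.yml`. Staging, test, and production variants would differ mainly in the `out:` section.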
Embulk can recreate the whole table each time the config is run. This means editing the config file is your only edit, leaving
I will attempt to keep these configs up to date and tagged with each schema version so you can use them in your own workflow. However, it's unlikely I'll be able to maintain and test the configs for 4 databases regularly, nor can I see the data produced by each institution's use of CanvasLMS. You may see scenarios, data, and values others have not. I'm currently using MS SQL Server, and would appreciate help from anyone using these configs to maintain them as Canvas Data changes.
If you use this repository, please consider submitting a Pull Request or Issue for the following:
- The Oracle configs are currently only set up for `insert_method: normal`, not `oci`. OCI greatly improves the import speed. If you can help support this, please consider testing and documenting. https://github.com/embulk/embulk-output-jdbc/tree/master/embulk-output-oracle#insert-methods
- Oracle has some compatibility and identifier length issues between versions, and I am currently unable to get the `after_load: indexes` working with my dev version, although they do work when run directly in a SQL editor.
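For anyone wanting to help test the Oracle items above, the relevant settings sit in the `out:` section. The fragment below is only a sketch, assuming the embulk-output-oracle plugin. The driver path, connection values, index name, and indexed column are hypothetical placeholders, not taken from this repository's configs.

```yaml
# Hypothetical Oracle output fragment; all values are placeholders
out:
  type: oracle
  driver_path: /path/to/ojdbc.jar   # placeholder path to the Oracle JDBC driver
  host: localhost
  user: canvas_user
  password: "change-me"
  database: XE
  table: account_dim
  mode: replace
  insert_method: normal             # current setting; oci is the untested alternative
  # Illustrative index statement; the name is kept short because of the
  # identifier length limits in older Oracle versions
  after_load: "CREATE INDEX idx_account_dim_canvas_id ON account_dim (canvas_id)"
```

Switching `insert_method` to `oci` and comparing load times on the same table would be one way to document the speed difference.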