ccsd / canvas-data-embulk-configs

YAML configs for importing Canvas Data with Embulk

canvas-data-embulk-configs

These configs are provided as a starting point for your own workflow, so you can manage Canvas Data imports in YAML instead of in code.

Visit Managing Canvas Data with Embulk on the CanvasLMS Community for discussions and workflow ideas.

Canvas Data schema v4.2.5

Embulk

Embulk is an open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services. https://www.embulk.org/docs/

with support for

and features useful for Canvas Data

For more details, see the Wiki docs.

Config files?

Embulk uses one YAML config file per task. For Canvas Data, this means each input source (a table's files) and its output destination (a database table) share one config file, and separate files cover the differences between staging, test and production destinations. More than 100 config files may seem like an odd workflow at first, but it's much less work to manage than generating DDLs from schema.json, and plugins are much easier than coding custom sorting and filtering tasks.
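
As a rough illustration of the one-file-per-table layout, here is a minimal sketch of a single config. It assumes the Canvas Data files for a table are gzipped, tab-separated, have no header row and use \N for NULL, and that the destination is MS SQL Server via the embulk-output-sqlserver plugin. The file name, paths, connection values, timestamp format and the abbreviated user_dim column list are all hypothetical; take the real columns and types from schema.json. mode: replace is the embulk-output-jdbc setting that rebuilds the destination table on every run.

```yaml
# user_dim.yml: hypothetical config for ONE Canvas Data table.
# Column list is abbreviated for illustration; use schema.json for the real one.
in:
  type: file
  path_prefix: ./dataFiles/user_dim/    # hypothetical download location
  decoders:
    - {type: gzip}                      # assumed: files are gzipped
  parser:
    type: csv
    delimiter: "\t"                     # assumed: tab-separated values
    skip_header_lines: 0                # assumed: no header row
    null_string: '\N'                   # assumed NULL marker
    columns:
      - {name: id, type: long}
      - {name: canvas_id, type: long}
      - {name: name, type: string}
      # assumed timestamp layout; verify against your files
      - {name: created_at, type: timestamp, format: '%Y-%m-%d %H:%M:%S.%L'}
out:
  type: sqlserver                       # requires the embulk-output-sqlserver plugin
  host: localhost
  port: 1433
  database: canvas_data_staging         # hypothetical staging database
  user: embulk
  password: CHANGE_ME
  table: user_dim
  mode: replace                         # recreate the table on each run
```

Running embulk preview user_dim.yml shows the parsed rows without writing anything, and embulk run user_dim.yml performs the load. Staging, test and production configs for the same table would differ only in their out: section.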

Embulk can recreate the whole table each time the config is run. This means the config file is the only thing you need to edit, leaving

Maintainers

I will attempt to keep these configs up to date and tagged with each schema version, so you can use them in your own workflow. However, I'm unlikely to be able to maintain and test the configs for all 4 databases regularly, and I can't see how each institution's use of Canvas LMS affects the data. You may see scenarios, data, and values others have not. I'm currently using MS SQL Server, and would appreciate help from anyone using these configs to maintain them as Canvas Data changes.

Contributing

If you use this repository, please consider submitting a Pull Request or Issue for the following:

  • The Oracle configs are currently set up only for insert_method: normal, not oci. OCI greatly improves import speed. If you can help support it, please consider testing and documenting it. https://github.com/embulk/embulk-output-jdbc/tree/master/embulk-output-oracle#insert-methods
  • Oracle has some compatibility and identifier length issues between versions. I am currently unable to get the after_load: indexes working with my dev version, although they do work when run directly in a SQL editor.
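
For the two Oracle items above, a hypothetical out: section might look like the sketch below. Connection values, the table and the index name are placeholders, and insert_method: oci is left commented out because it needs the additional Oracle client setup described in the linked embulk-output-oracle docs (untested here). When an after_load statement works in a SQL editor but fails through Embulk, a few things may be worth checking: Oracle's JDBC driver rejects a trailing semicolon (ORA-00911: invalid character) even though SQL editors accept it, Oracle versions before 12.2 limit identifiers such as index names to 30 bytes, and column-name case may not match what you typed in the editor, depending on how the table was created.

```yaml
# Hypothetical Oracle destination for one Canvas Data table (out: section only).
out:
  type: oracle                 # requires the embulk-output-oracle plugin
  host: localhost
  port: 1521
  database: ORCL               # placeholder SID
  user: embulk
  password: CHANGE_ME
  table: USER_DIM
  mode: replace
  insert_method: normal        # what the current configs use
  # insert_method: oci         # faster, but needs an Oracle client (see linked docs)
  # One statement, no trailing semicolon; index name kept under 30 bytes.
  # Match the column name's case to how the table was actually created.
  after_load: "CREATE INDEX USER_DIM_CANVAS_ID_IX ON USER_DIM (canvas_id)"
```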