datamade / how-to

📚 Doing all sorts of things, the DataMade way

Research options for reusable logic to load Django fixtures in bulk #131

Open hancush opened 3 years ago

hancush commented 3 years ago

Background

Django offers a nice loaddata management command, but one downside is that it saves fixture objects one at a time rather than creating them in bulk, so it can be quite slow for large fixtures.

We've implemented a few data pipelines now that do some light extraction and transformation, then use the bulk_create method to insert the fixture data into the database.
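To make the pattern concrete, here is a minimal sketch of the transform-then-insert loop, not the code from any of our pipelines. The names `batched` and `load_in_bulk` are hypothetical, and the insert step is passed in as a callable so the sketch runs without a Django project; in an app it would typically be a model manager's `bulk_create`.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_bulk(
    rows: Iterable[T],
    transform: Callable[[T], R],
    bulk_create: Callable[[List[R]], object],
    batch_size: int = 1000,
) -> int:
    """Transform rows and hand them to `bulk_create` one batch at a time.

    `bulk_create` is an assumption of this sketch: any callable that accepts
    a list of objects, e.g. `SomeModel.objects.bulk_create` in Django.
    Returns the number of objects passed to `bulk_create`.
    """
    total = 0
    for batch in batched(rows, batch_size):
        objects = [transform(row) for row in batch]
        bulk_create(objects)
        total += len(objects)
    return total
```

With a hypothetical `Person` model, a call might look like `load_in_bulk(csv.DictReader(f), lambda row: Person(name=row["name"].strip()), Person.objects.bulk_create)`. Batching in the caller keeps memory bounded when the source is a large file, since only one batch of model instances exists at a time.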

This is a nice pattern that we might consider making reusable. I'm curious, too, whether there are any Django plugins that facilitate bulk model instance creation.
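One possible shape for the reusable piece, sketched under the assumption that the input is a standard Django JSON fixture (a list of records with `model`, optional `pk`, and `fields` keys): group the records by model label so that each group can be inserted in a single bulk operation. The function name `group_fixture_by_model` is hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def group_fixture_by_model(fixture_text: str) -> Dict[str, List[Dict[str, Any]]]:
    """Group the records of a Django JSON fixture by their model label.

    Each returned record is a copy of the fixture's "fields" mapping,
    with "pk" added when the fixture record provides one.
    """
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in json.loads(fixture_text):
        values = dict(record["fields"])
        if record.get("pk") is not None:
            values["pk"] = record["pk"]
        grouped[record["model"]].append(values)
    return dict(grouped)
```

In a Django project, each group could then be loaded by resolving the label with `apps.get_model(label)` and passing `[Model(**values) for values in records]` to `Model.objects.bulk_create`. That only works as written when the fields are plain column values: foreign keys in fixtures are raw primary keys and would need to be mapped to the `<field>_id` attribute, and many-to-many fields have to be handled separately. Note also that `bulk_create` does not call `save()` or send `pre_save`/`post_save` signals, which is part of what a reusable tool would need to document.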

Proposal

Deliverables

This R&D effort will yield a better understanding of data loading considerations, as well as reusable code to facilitate bulk loading of data into our applications (though that code could take one of several forms, listed in the third point above).

Timeline

I'd expect research to take about half a day, and then adopting an existing tool or writing fresh code to take between one and three days, depending on which avenue we take.

FWIW, I'd also say this is low priority; I just wanted to capture a conversation I had with @jeancochrane before they leave us.