Closed seeholza closed 9 months ago
π€
It's great to see your interest in improving Cosmos! Your observation about using the target-dir
with the manifest.json
in the command execution to potentially skip the re-parsing of the project for each task sounds like a promising approach.
I would encourage you to go ahead and create a pull request with your proposed solution. The maintainers will review it and provide feedback. Your contribution will not only benefit the community but also help enhance the functionality of Cosmos for everyone. Good luck, and thank you for considering to contribute!
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
I think this is essentially the same as #785; I think the runtime only uses the manifest if it detects no code changes, which it can only do with the msgpack
.
Let me know if you'd like to coordinate adding this change as I was going to get to it this weekend, but we could work together on it. The solution is if an existing target dir is found that the whole target dir should be moved into the tmpdir for the runtime.
Hi @dwreeves ,
Next to the job I don't have much time to work on this, but i am happy to help if i manage.
I would probably start by adding support for supplying the msgpack in dbt-manifest
mode, with a more generic API change in mind after. But we would need agree on the final design of how msgpack/manifest is copied to the tmp directories.
Do you have a branch already? I will close this issue in favor of #785 and we continue discussing there then.
Hi @flinz,
I opened a PR, yes: #800.
The way I implemented it is entirely agnostic to LoadMode.DBT_MANIFEST
. Basically, by default, it will always check for the existence of the partial parse file before parsing when running a command locally. So any user who is running dbt compile
in their CICD, whether they're using manifest or LS, will be fine. (a small subset of DBT_LS
users may mysteriously see their performance improve in the next release after this is implemented!).
Also, for what it is worth, I did test that dbt never uses manifest.json
for input purposes, only outputs. So when we say "use manifest for task execution" it's never manifest.json
, only ever partial_parse.msgpack
.
You raise a good point that #785 is actually two issues in one (one is the msgpack thing, the other is about execution of projects using artifacts from cloud storage). I made a note of that in #800 and I do believe that these should be separated.
I am using DBT_MANIFEST parsing mode, and seeing quite a bit of time being spent in each task on parsing the whole project, spending around 1/3 of the task total time on a full parse:
I see the command being executed for each task is along the lines of
I believe that providing the
target-dir
with themanifest.json
in the command execution would probably allow skipping the re-parsing of the project for each task.Related to this:
I am fairly new to cosmos, so I would be grateful for some pointers in the right direction.