etsy / boundary-layer

Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform
Apache License 2.0
262 stars 58 forks source link

Migrate to Marshmallow 3 #57

Open aksswami opened 4 years ago

aksswami commented 4 years ago

Migration to marshmallow 3, specifically 3.6.1 Also removed python27 env from tox.

I am able to run all test successfully except below two test related to Oozie workflow. As I have very little understanding of Oozie workflow, not able to resolve these two. Any help would be really appreciated.

test/oozier/test_oozie_converter.py::test_workflow_parser FAILED
test/oozier/test_oozie_converter.py::test_workflow_parser_with_prune_forks FAILED

As boundary-layer master is for Python 2.7, and if you don't have plan to move to Python 3 as of now maybe we can create another branch for python3 changes and merge these changes to that.

mchalek commented 4 years ago

wow, thanks @aksswami for this contribution! I will try to get around to taking a look at it within the next few days. Just wanted to let you know we haven't missed and/or forgotten about this :)

aksswami commented 4 years ago

Great! It will be good to update the dependencies to latest stable. Let me know if there any other. task, I can help with.

mchalek commented 4 years ago

OK @aksswami I finally got around to looking at this 😅 . The Oozie-related test errors you experienced were due to the fact that marshmallow 2.x permits unrecognized fields by default, whereas 3.x rejects unrecognized fields by default.

If you add the following to the OozieBaseSchema class, it should fix the tests:

    class Meta:
        unknown = ma.INCLUDE

I think it's a great idea to upgrade to ma3 and I really appreciate you doing all of the heavy lifting here! Personally, I am fine with cutting off python 2 support --- we can start a new minor version for this, and we can always back-port any additional fixes as necessary to the 1.7 trunk, which we can keep around on a separate branch somewhere. But as you know there is at least a little risk in doing that, and in addition, with these updates there are some other code cleanups we can do: for example, we can eliminate the StrictSchema which we use as the base class for the non-oozie-related portions of the code, because this is now the default in marshmallow. We can presumably also address the deprecation warnings around the use of abc, and other python3-specific improvements.

So, how about this: If you could make the change above so that the tests pass, we can then merge this PR into a branch to represent the start of boundary-layer version 1.8. Then maybe we can do any other cleanup we would like to do, we can verify the new version internally at Etsy against more workflows (to increase chances of hitting edge cases or whatever), and cut the 1.8.0 release sometime soon?

aksswami commented 4 years ago

@mchalek Thanks for detailed explanation. I will cleanup this PR for Ozzie related error. And will try to create a new PR for other changes.

Yes, moving to Python3 will require more changes. So this PR can be start of those changes. Thanks again.

aksswami commented 4 years ago

Resolved the OzzieBaseSchema test issue.

For reference:- https://marshmallow.readthedocs.io/en/3.0/quickstart.html#handling-unknown-fields

aksswami commented 4 years ago

Do we need to disable 2.7 CI test? Can you help me resolve the failing checks? @mchalek

mchalek commented 4 years ago

Thanks @aksswami for making the requested changes! Yes, let's disable the 2.7 check. You should be able to do that by editing line 16 of the file .github/workflows/python-app.yml. If you wouldn't mind doing that, hopefully that's all you'll need! I will change this PR now to target a new branch for boundary-layer v1.8.

aksswami commented 4 years ago

@mchalek Removed python 2.7 run from the test pipeline workflow.

aksswami commented 4 years ago

@mchalek Is there anything else required for this PR to be merge ready? Please let me know.

dossett commented 4 years ago

Thanks @aksswami! Kevin and I talked offline and I will merge this PR soon (perhaps directly to the main branch) based on how some final py2 -> py3 migration work goes internally for us.

Thanks so much for this contribution!

kconvey commented 4 years ago

Is it possible to know what the status is of this PR?

dossett commented 4 years ago

Hi @kconvey (and everyone else following this PR). The good news is that we (Etsy) have done everything we need to do to cut loose of python2. The bad news is that my kids are back in school (distance learning in my city) so my capacity to drive some work like this is pretty limited. I hope to be able to work on it in the next couple of weeks. (The work that's outside of this PR is to make sure that we can upgrade marshmallow the same way in some internal codebases that would otherwise conflict with this change.)

kconvey commented 4 years ago

Thanks for the update! Glad to hear that this hasn't been forgotten about :)

hanli-0612 commented 2 years ago

Is it possible to get some updates on this PR? It'd be great if we could support marshmallow 3.