dssg / triage

General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems
Other
187 stars 61 forks source link

Dependency updates #835

Closed shaycrk closed 3 years ago

shaycrk commented 3 years ago

Updates PyYAML and mkdocs-material to satisfy safety-ci

However, note that for loading matrix metadata I'm switching from yaml.full_load(fd) to yaml.load(fd, Loader=yaml.Loader) which isn't ideal. Based on the errors I was getting, it seems that in order to stick with full_load, we would need to define representers and constructors specifically for the AsOfTimeList and FeatureNameList (see the example here). Note that doing so would lose the ability to load matrix metadata from previous versions of triage (though a utility to upgrade old metadata files probably wouldn't be too difficult to write).

@thcrock -- would appreciate your thoughts here.

thcrock commented 3 years ago

@shaycrk What is the cost of switching from full_load to yaml.load? What you said about representers and constructors makes sense to do to mark it as 'trusted' YAML, but I'm curious what state this PR is in. Does everything work? If so, what makes it worse than full_load?

shaycrk commented 3 years ago

Thanks @thcrock -- Everything should work here so long as we're fine with moving from full_load to load. As I understand it, load is basically an alias for unsafe_load and the main cost in making the move is just about security (specifically, that the format can be used to run arbitrary python code, but note that's also the case with full_load in 5.3, which is why they're recommending the update). On the one hand, the risk seems very low with our use-case since we don't, say, deploy triage as a web-facing app that runs arbitrary user input, though on the other hand it's probably good practice to plug the potential hole eventually in case someone takes triage and uses it in ways we don't expect down the road.

The cost of keeping full_load is somewhat more acute both in terms of writing the representers/constructors for pyYAML to recognize our custom objects and, probably more importantly, losing the ability to read older/existing matrices. I think given that (and what seems to be low risk), my inclination is towards switching to load here, but definitely wanted to gut-check that before merging!

shaycrk commented 3 years ago

Sounds good -- I'll go ahead and merge and create an issue to consider coming back to the load vs full_load question later.