IACR / latex-submit

Web server to receive uploaded LaTeX and execute it in a docker container.
GNU Affero General Public License v3.0

Migration to pydantic 2.0 #43

Closed kmccurley closed 1 year ago

kmccurley commented 1 year ago

The existing code depends on the pydantic Python package for JSON serialization and validation of models (specifically the compilation). Unfortunately the pydantic maintainers introduced incompatibilities with the old 1.x API in version 2.0, so that things like parse_raw no longer work. This is mostly just an annoyance, but it means we are tied to a given minimum version of Python and of the pydantic package. The only thing we really use is serialization to and from JSON, so I'm tempted to remove the pydantic stuff and instead just use a JSON serializer/deserializer. The options seem to be:

  1. freeze at pydantic 1.10
  2. update to pydantic 2.0
  3. switch to dataclasses with JSON serialization for StrEnum and datetime.
  4. switch to attrs+cattrs
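
For illustration, option 3 might look roughly like the sketch below. It uses only the standard library and is not the project's actual code: the field names `submitted` and `author`, the `Status` values, and the helpers `to_json`/`from_json` are all hypothetical, and `class Status(str, Enum)` stands in for `enum.StrEnum`, which requires Python 3.11.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(str, Enum):  # stand-in for enum.StrEnum (Python 3.11+)
    PENDING = "pending"
    COMPILED = "compiled"


@dataclass
class Author:
    name: str
    email: Optional[str] = None  # not every author has an email


@dataclass
class Compilation:
    status: Status
    submitted: datetime  # hypothetical field, for the datetime case
    author: Author       # hypothetical field, for nesting


def _encode(obj):
    """Fallback for values the json module cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def to_json(comp: Compilation) -> str:
    # str-based enum members are written as plain JSON strings;
    # datetime values go through _encode.
    return json.dumps(asdict(comp), default=_encode)


def from_json(text: str) -> Compilation:
    # Every nested type has to be rebuilt by hand here.
    raw = json.loads(text)
    return Compilation(
        status=Status(raw["status"]),
        submitted=datetime.fromisoformat(raw["submitted"]),
        author=Author(**raw["author"]),
    )
```

The cost of this approach shows up in `from_json`: each enum, datetime, and nested class needs explicit reconstruction, and a missing key surfaces as a bare `KeyError` rather than a structured validation error.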

kmccurley commented 1 year ago

This is turning out to be extremely complicated. If we freeze at pydantic 1.10 then we will eventually regret it: developers always want you to move to new versions, so we would just be postponing the problem for a year. If we update to pydantic 2.0 then we are buying into a very complicated dependency, but at least it has a great many users (FastAPI is built on it). I tried to switch to attrs+cattrs, but that project seems to have its own difficulties.

One fundamental question is: what are we using pydantic for?

  1. Using classes rather than just dicts. It's nicer to say author.name instead of author['name']. Plain classes require boilerplate such as writing an __init__ method for each class. dataclasses can remove this boilerplate, as can attrs and pydantic.
  2. Serialization of a Python object to JSON so we can store it in files and databases. There are various ways to do this, such as dataclasses-json for dataclasses. pydantic gives this for free, though the API changed in v2.0.
  3. Validation of the Compilation and Meta classes, to make sure we don't create one that can't be used. For example, if we ever created a Compilation without status, we would have to check for the existence of the status field wherever we used it. That would be a pain, and there is no reasonable excuse to have a Compilation without a status. Similarly, we need name in Author, but we don't need email for every author.
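
As a rough sketch of how these three uses look with the pydantic 2.x API (assuming pydantic 2 is installed): the models below are simplified stand-ins rather than the project's real Compilation and Author classes, and the `Status` values and the `authors` field are hypothetical. `model_dump_json` and `model_validate_json` are the v2 methods for writing and reading JSON.

```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Status(str, Enum):  # hypothetical status values
    PENDING = "pending"
    FINISHED = "finished"


class Author(BaseModel):
    name: str                    # required
    email: Optional[str] = None  # optional


class Compilation(BaseModel):
    status: Status               # required: no Compilation without a status
    authors: List[Author] = []   # hypothetical field


# 1. Attribute access instead of dict lookups.
comp = Compilation(status=Status.PENDING, authors=[Author(name="Alice")])
first_name = comp.authors[0].name

# 2. Round trip through JSON.
text = comp.model_dump_json()
restored = Compilation.model_validate_json(text)

# 3. Validation: JSON without a status is rejected.
try:
    Compilation.model_validate_json('{"authors": [{"name": "Alice"}]}')
except ValidationError as err:
    missing = [e["loc"] for e in err.errors()]  # locations of failing fields
```

Because `status` has no default, code that receives a Compilation never needs to check whether the field exists.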

kmccurley commented 1 year ago

I managed to convert to pydantic 2.0, but it was painful. We may revisit removing pydantic in the future.