JupiterBroadcasting / show-scraper

Scraper written in Python that converts episodes hosted on Fireside or jupiterbroadcasting.com into Hugo Markdown files

Very strange behavior from pydantic_yaml... #21

Open elreydetoda opened 2 years ago

elreydetoda commented 2 years ago

So, this is so strange... I've gotten things to work with loading the people & sponsors as markdown files, but it's working in the weirdest way.

So, whenever I load the pydantic_yaml library alongside another library, frontmatter (which is really cool and complements the show scraper well since we're creating posts), everything works. But I'm not actually using pydantic_yaml at all, I'm just importing one of its models...that's it. It's like importing it transparently modifies the behavior of pydantic types and makes them interpretable by frontmatter's YAML parser (essentially converting them to strings instead of HttpUrl types), even though pydantic_yaml isn't used anywhere else in our project...

I'm guessing that it's happening because of something in this line since it's called hacky_stuff, and it's really cool but very odd...

I've recorded my screen to show how it works and then doesn't work, with the only difference being me uncommenting that line...very odd stuff, but it's working :sweat_smile:

asciicast

kbondarev commented 2 years ago

How about not using pydantic-yaml at all? It seems like a hacky library and might not actually be necessary. I know I was the one to suggest it originally, but I had never used it before. And now that I think about it, it's a completely unnecessary dependency...

Instead, we could simply convert the Pydantic model into a dictionary first, and then use PyYAML to convert that to a YAML string.

Here's a YAMLBaseModel class that models can extend to get a yaml() method:

# yaml_basemodel.py
from typing import TYPE_CHECKING, Optional, Union

import yaml
from pydantic import BaseModel

if TYPE_CHECKING:
    # These aliases live in pydantic.typing (pydantic v1), not the top-level package.
    from pydantic.typing import AbstractSetIntStr, MappingIntStrAny

class YAMLBaseModel(BaseModel):
    def yaml(
        self,
        *,
        include: Optional[Union['AbstractSetIntStr', 'MappingIntStrAny']] = None,
        exclude: Optional[Union['AbstractSetIntStr', 'MappingIntStrAny']] = None,
        by_alias: bool = False,
        skip_defaults: Optional[bool] = None,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        default_flow_style: bool = False,
    ) -> str:
        """
        Generate a yaml representation of the model, optionally specifying which fields to include or exclude.

        This uses the underlying `self.dict()` method and `yaml.dump()` from `PyYAML`
        """

        _dict = self.dict(
            include=include,
            exclude=exclude,
            by_alias=by_alias,
            skip_defaults=skip_defaults,
            exclude_unset=exclude_unset,
            exclude_defaults=exclude_defaults,
            exclude_none=exclude_none
        )

        return yaml.dump(_dict, default_flow_style=default_flow_style)

Then just use it for all the models instead of the original pydantic BaseModel:

-class Episode(BaseModel):
+class Episode(YAMLBaseModel):

And just do this to get the yaml string:

episode = Episode(...)

episode.yaml()   # :rocket:

elreydetoda commented 2 years ago

I imagine we might be able to, but I'm sure there is a reason why they do it like that (maybe @NowanIlfideme could shed some light on why they have that hacky_stuff line?). I can try that sometime tonight though and see how it goes.

The other thing is that we'd end up losing the ability to use the Post object from the python-frontmatter library, so we'd have to handle the conversion for that as well.

Also, it seems like the YAML dumper (dump(s)) is the reason it's failing in the first place (from my video, looks to be around this time), because converting the model to a dict() doesn't inherently convert the HttpUrl type to a string, whereas pydantic-yaml does coerce it to a string when it converts.
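
For illustration, here's a rough sketch of one way the coercion could be done by hand, without pydantic-yaml. It assumes the values in question are str subclasses (pydantic v1's HttpUrl is one), and it uses only the standard library. FakeUrl and to_plain are hypothetical names made up for this example, not part of pydantic, PyYAML, or frontmatter:

```python
from typing import Any


class FakeUrl(str):
    """Hypothetical stand-in for a str subclass such as pydantic v1's HttpUrl."""


def to_plain(value: Any) -> Any:
    """Recursively replace str subclasses with plain str inside dicts, lists and tuples."""
    if isinstance(value, str):
        # str() on a str subclass instance returns an exact `str` object.
        return str(value)
    if isinstance(value, dict):
        return {to_plain(key): to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    # Anything else (int, bool, None, ...) is passed through unchanged.
    return value


# Example data shaped like the output of a model's dict() call (made up for this sketch).
episode_dict = {
    "title": "Example episode",
    "link": FakeUrl("https://example.com/episode/1"),
    "sponsors": [{"name": "Example sponsor", "url": FakeUrl("https://example.com")}],
}

plain = to_plain(episode_dict)
print(type(episode_dict["link"]).__name__, "->", type(plain["link"]).__name__)
```

If that holds up, the resulting plain dict should be something a safe YAML dumper can handle, so it could in principle be passed to yaml.safe_dump or used as the metadata for a frontmatter Post, though I haven't verified that against the project's actual models.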