SocialPass / socialpass

Hosting the next generation of events
https://registry.socialpass.io
Other
1 stars 0 forks source link

General issues with the codebase #979

Closed halfmoonui closed 7 months ago

halfmoonui commented 8 months ago

[!NOTE] This is only a discussion for now, nothing actionable as these things are not very pressing at this moment. We are not facing any performance bottlenecks on SocialPass. This is also mostly my opinion, so any feedback/comments/critique is appreciated. I could be wrong on any of these issues, and I'm happy to have my mind changed.

At this stage, SocialPass is a stable, mature product. Its list of features is close to the chief competitors on the market, and we have been able to successfully host quite a few events without any big hiccups. I think this is a good time to at least start a discussion about issues we've noticed in the codebase, with the goal of documenting and solving them for the following benefits:

Issue 1: Too many useless abstractions

This is obviously a weird one because we all know the importance of DRY in programming. However, in the SocialPass codebase, there are quite a few places where we abstract things in a way that brings no benefit to the readability or performance. In fact, I strongly feel it actively hurts the readability of the codebase. For example on the Event model:

class Event(DBModel):
    ...

    def clean_handle_google_event_class(self, *args, **kwargs):
        """
        clean the handling of the Google event class
        """
        # we do this only for LIVE events to reduce the number of API requests
        if self.state == Event.StateStatus.LIVE:
            google_event_class_id = self.handle_google_event_class()
            if not google_event_class_id:
                raise GoogleWalletAPIRequestError(
                    "Something went wrong while handling the Google event class."
                )

    def clean(self, *args, **kwargs):
        """
        clean method
        runs all clean_* methods
        """
        self.clean_handle_google_event_class()
        return super().clean(*args, **kwargs)

The clean_handle_google_event_class() method is called exactly in one place in the entire codebase (in the clean()) method. This is not really an abstraction. All it does is just make that block of code into a function and call it once. We would be better served with inlining that entire function in the clean() method and just add a line of comment on top of the block to signify what is happening. Why? Because inlined code is genuinely easier to read in most practical scenarios. This example is easy to read because the function is defined right on top of its calling block, but imagine this being defined in a separate file in a separate app. That's rough, and we have a lot of that too: functions that are called exactly in one place defined in a separate file.

I know a lot of this came from one particular engineer at Bix, and maybe this a valid design pattern. But now that we have a stability of product and features, we should make necessary changes to make the codebase more readable. I really like the Rule of Three where we only abstract something if it gets repeated thrice: https://lostechies.com/derickbailey/2012/10/31/abstraction-the-rule-of-three/

Issue 2: Unoptimized database models and querying

Take this TicketTier model and related objects as an example:

class TicketTier(DBModel):
    ...

    tier_fiat = models.OneToOneField(
        "TierFiat",
        on_delete=models.SET_NULL,
        blank=True,
        null=True,
    )
    tier_blockchain = models.OneToOneField(
        "TierBlockchain",
        on_delete=models.SET_NULL,
        blank=True,
        null=True,
    )
    tier_asset_ownership = models.OneToOneField(
        "TierAssetOwnership",
        on_delete=models.SET_NULL,
        blank=True,
        null=True,
    )
    tier_free = models.OneToOneField(
        "TierFree",
        on_delete=models.SET_NULL,
        blank=True,
        null=True,
    )

    ...

    @cached_property
    def tx_type(self):
        if self.tier_fiat:
            return CheckoutSession.TransactionType.FIAT
        elif self.tier_blockchain:
            return CheckoutSession.TransactionType.BLOCKCHAIN
        elif self.tier_asset_ownership:
            return CheckoutSession.TransactionType.ASSET_OWNERSHIP
        elif self.tier_free:
            return CheckoutSession.TransactionType.FREE

class TierFiat(DBModel):
    price_per_ticket = models.DecimalField(
        max_digits=19,
        decimal_places=2,
        validators=[MinValueValidator(0)],
        help_text="Price of one ticket for this tier.",
        blank=False,
        null=False,
    )

class TierBlockchain(DBModel):
    # Emtpy model

class TierAssetOwnership(DBModel):
    ...

    blockchain = models.CharField(
        max_length=50,
        choices=BlockchainChoices.choices,
        default=BlockchainChoices.ETH,
        blank=False,
    )
    network = models.IntegerField(
        choices=NetworkChoices.choices,
        default=NetworkChoices.ETH,
        blank=False,
        help_text="Which blockchain is your NFT collection on?",
    )
    asset_type = models.CharField(
        max_length=50,
        choices=AssetChoices.choices,
        default=AssetChoices.NFT,
        blank=False,
    )
    balance_required = models.IntegerField(
        default=1,
        blank=False,
        null=False,
        help_text="The number of NFTs required to claim your ticket tier.",
    )
    token_address = models.CharField(
        max_length=42,
        blank=False,
        default="",
        help_text="What is the contract address of your NFT collection?",
    )
    token_id = ArrayField(
        models.IntegerField(),
        null=True,
        blank=True,
        help_text="Which specific token IDs of the NFT collection are required?",
    )
    deprecated_issued_token_id = ArrayField(
        models.IntegerField(), blank=True, default=list
    )

class TierFree(DBModel):
    deprecated_issued_emails = ArrayField(models.EmailField(), blank=True, default=list)

The TierFree and TierBlockchain models are entirely useless (no fields or only deprecated fields). They are only used as Boolean checks to find out what type of TicketTier the parent object is. The TierFiat only stores a price. The main issue I have with setups like these is that in order to get anything useful from objects, we need to query an extra 2-3 objects every time. This is evident from our views, where we use select_related and prefetch_related everywhere. For example, on the tx_type property, you will notice that we need to query for the 4 extra objects in order to get the type of TicketTier.

Some of this is obviously unavoidable: we need a robust database table structure to make everything possible. However, we should also put in some thoughts into some easy optimizations. For example:

On the TicketTier model, when an object is created, why not save the tx_type in a field? During create, we run a pre-save signal to find out the type and store it in a field. This type is never ever going to change. So the next time we need the tx_type of a TicketTier, we don't need to query for 4 other Tier* tables.

Another example:

class CheckoutSession(DBModel):

    ...

    @property
    def total_price(self):
        if self.tx_fiat:
            total_price = 0
            checkout_items = CheckoutItem.objects.select_related(
                "ticket_tier", "ticket_tier__tier_fiat"
            ).filter(checkout_session=self)
            for item in checkout_items:
                tier_price = item.ticket_tier.tier_fiat.price_per_ticket
                total_price += item.quantity * tier_price
            return total_price
        else:
            return "N/A"

We should possibly save the total_price in a field during create as it's not going to change. We don't allow customers to change checkout sessions (which expire anyway). We let them go back and create a new one.


[!IMPORTANT] My main point here is that when I'm developing a new feature for SocialPass, one of the most common complaint is that I don't have immediate access to the data fields that I need, especially ones I need on the UI. I have to go back and create more properties, or add more objects to my query set which always makes me feel a little uneasy. Some of this is unavoidable, but some could be solved with more thought out database models.

The great thing is that I hope that with Django 5, generated fields and #975 will solve a lot of these issues. However, we should also put in some thought to solving the easy performance wins.

crypto-rizzo commented 8 months ago

I think #975 (Where applicable) should be covered in this issue as well.. Here are some general notes going through the codebase

Migrate from Properties to GeneratedField

(https://docs.djangoproject.com/en/5.0/releases/5.0/#database-generated-model-field)

Minor

Also some good resources on 5.0

crypto-rizzo commented 8 months ago

In general this issue should focus less on optimizations and moreso on DX