Repository suggestions - Githubissues

gentrexha commented 2 years ago

Proposing some new suggestions to the dune-bridge, just things I think could be added/removed/done differently which we could discuss.

Discussion points:

Start using absolute imports instead of relative ones (Ref: https://realpython.com/absolute-vs-relative-python-imports/#pros-and-cons-of-relative-imports , https://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time)
Encapsulate code inside of the main() function (Ref: https://docs.python.org/3/library/__main__.html#idiomatic-usage)
Parsing args in if __name__ == '__main__': instead ofmain`. I didn't find any specific documentation which is the suggested way, except some Stackoverflow questions (Ref: https://stackoverflow.com/a/14500228/3841083 , https://stackoverflow.com/a/42747728/3841083)
Replaced print() statements with logging statements.
More unit tests?

bh2smith commented 2 years ago

Hey @gentrexha this is great stuff! Here are my thoughts on each item:

Start using absolute imports instead of relative ones

Yes absolutely! I think the reason this happened is because, for some reason, my IDEs auto-fill imports is always using relative and I didn't take the time to look back and check. Would love to configure that differently.

Encapsulate code inside of the main() function

This is also totally fine. Generally the approach I have been taking is that, the script logic stays under __name__ == '__main__' until the content of this script becomes something that is used elsewhere (i.e. part of a bigger program). I didn't really see the need to move this logic into its own function until it is used elsewhere.

However, I am not fond of the function name main(since then we will have multiple occurrences of the same function names all over the project). In the past I have gone with a name more related to what the purpose of the file is all about. Although this has its downsides to (e.g. file called price_fetcher.pyand main method called get_pricesthen our import would look like from price_fetcher import get_prices -- I guess thats not so bad).

Parsing args in if name == 'main': instead of main`.

I do very much prefer the idea of parsing the args in __name__ == '__main__' because then we can give typed parameter declaration to the main method with the parsed arguments (rather than untyped kwargs).

Replaced print() statements with logging statements.

This is also an excellent idea! I think we just never got around to this. Also my approach to this generally has been the following:

If the program is something which runs continuously then it should be log statements (so we have timestamps), however, if it is just a script that is run once in a while and the text is purely informative then it doesn't need to be a log.

For example: a command prompt "do you want to continue? (Y/n)" doesn't really need to be a log. In any case, this particular project is something which runs in our staging and production environment and is not a stand alone script whose print statements are purely intended for the user, so our prints should be logs!

More unit tests?

For the most part I am 100% all in favour of more unit tests. However, in this particular project we are merely making use of duneapi calls from a library which is already quite well tested. There are very few places here where we actually do much more than fetch the results of a query and write them to file. If/when the logic here does get more involved, I think introducing a suite of tests would be great. There is an open issue about end to end testing https://github.com/cowprotocol/dune-bridge/issues/9.

Additional Suggestions

An improvement that was not mentioned here, but I think would be a great idea is the following:

Update the file format of user_data and distinct_app_data (json file output). Currently these files are extremely and unnecessarily redundant. For example

{
    "app_data": [
        {
            "appdata": "\"0x74a421fda6c8534f5641b412eba54bec2cf1764e845c324036edca640baf7c3c\""
        },
        {
            "appdata": "\"0xcb29f9a9c4a32b88f860b55af807040732d8c3730e400648a208c06d63695b97\""
        },
        {
            "appdata": "\"0xaae5b03329a392dac117a824a36dd9bbf2e26f869ee35c1ff5b92510cf2794ab\""
        }
    ],
    "time_of_download": 1657620250
}

Inside the app_data is a list of JSON with redundant keys. Furthermore, the appdata inside has a bunch of extra wonky characters "\" at the start and \"" at the end.

Since time_of_download is an additional key, we can't make this CSV, but we could turn this into something more compact like:

{
    "app_data": [
        "0x74a421fda6c8534f5641b412eba54bec2cf1764e845c324036edca640baf7c3c",
        "0xcb29f9a9c4a32b88f860b55af807040732d8c3730e400648a208c06d63695b97",
        "0xaae5b03329a392dac117a824a36dd9bbf2e26f869ee35c1ff5b92510cf2794ab"
    ],
    "time_of_download": 1657620250
}

To take this even further, we could encode the time_of_download in the filename and just use a CSV here!

Same issue with the user_data (whose filename already contains the time of download (this is an inconsistency).

{
    "user_data": [
        {
            "data": {
                "cowswap_usd_volume": 130520.55826558582,
                "day": "2022-04-10T00:00:00+00:00",
                "number_of_trades": 1,
                "owner": "0x00551b73fd95c6d01db4ec1b158a6e7b27bdc045",
                "referrals": [],
                "total_referred_volume": null,
                "usd_volume_all_exchanges": 0
            },
            "__typename": "get_result_template"
        },
        {
            "data": {
                "cowswap_usd_volume": 3477.7184040204756,
                "day": "2022-04-10T00:00:00+00:00",
                "number_of_trades": 1,
                "owner": "0x008f2efbaef8adf0af43ea009350ba6b572cf585",
                "referrals": [],
                "total_referred_volume": null,
                "usd_volume_all_exchanges": 0
            },
            "__typename": "get_result_template"
        },
        ...
    ],
    "time_of_download": SOME_UNIX_TIMESTAMP
}

Both of these would be a lot more compact if they were just CSVs.

Notice here, that there is a redundant field "__typename": "get_result_template" which is due to lazy parsing of dune response data. At the very least we could pull the data field content one level up.

bh2smith commented 2 years ago

Also, I am not super fond of the "default" logging configuration, perhaps we could use something more uniform across our other projects like here:

https://github.com/cowprotocol/solver-rewards/blob/main/logging.conf

gentrexha commented 2 years ago

I think that's a great idea!

bh2smith commented 2 years ago

Hey @gentrexha, this is becoming stale. It's been approved, you can merge any time (you'll probably have to update the branch).

gentrexha commented 2 years ago

Hey @gentrexha, this is becoming stale. It's been approved, you can merge any time (you'll probably have to update the branch).

Thanks for the reminder. Will do so today.

cowprotocol / dune-bridge

Repository suggestions #20

Additional Suggestions