esnet-security / SCRAM

Security Catch and Release Automation Manager
Other
5 stars 0 forks source link

Topic/soehlert/configurable payloads #33

Closed samoehlert closed 2 months ago

samoehlert commented 4 months ago

Passing along the information from websocket element to translator to allow for overriding ASN and BGP community. A few things I still want to work through:

samoehlert commented 4 months ago

Update: behave tests failing seems to be narrowed down to scram/route_manager/tests/acceptance/steps/translator.py as that's where it always fails with timeouts. The tests all pass locally on both my machine and @dopheide-esnet's which suggests to me maybe this is a resource issue with the gitlab runners? This might be the kick I need to move our CI testing jobs over to github actions finally.

Also, the pytests always seem to pass fine in GitlabCI which is extra confusing, haven't really made progress there.

grigorescu commented 3 months ago

I don't know if I love how this falls back to ESnet values that were previously hard coded

Suggest that you remove the ESnet-specific values entirely, set the default in the code to an ASN that's reserved for documentation use (64496 - 64511 per rfc5398), and override the correct value for ESnet. That would also help to ensure that your usage of SCRAM (i.e. overriding the value) is the same as any non-ESnet user's.

grigorescu commented 3 months ago

Update: behave tests failing seems to be narrowed down to scram/route_manager/tests/acceptance/steps/translator.py as that's where it always fails with timeouts. The tests all pass locally on both my machine and @dopheide-esnet's which suggests to me maybe this is a resource issue with the gitlab runners?

I think I was seeing something similar but never fully tracked it down. I think that GitLab isn't fully resetting the environment between runs.

grigorescu commented 3 months ago

The test failures are weird. Recommend splitting this up into two PRs:

1) pass event_data, but don't actually use it. That is, change translator.py as you did, but only change gobgp.py such that add_path and del_path take event_data as a parameter, and they log the contents, but keep the actual logic the same as what we have working in main today. 2) update the logic so that the event data is actually being used.

This way, the changes in the first PR should not affect the tests in any way and if we're still seeing failures, that's a problem with the testing rather than with this change.

soehlert commented 3 months ago

Took care of the ESnet specific variables and shortened the variable code.

As for the tests, I actually checked out the translator.py and gobgp.py files from main to test this and still saw the same failures. the only changes at that point in time vs main were the documentation files.

dopheide-esnet commented 3 months ago

Today I went back and started adding in these changes a few at a time in topic/dopheide/testing-ci. I added some fake event_data to translator/tests/acceptance/steps/actions.py to avoid a few failures, but now it appears I'm stuck in the same timeout place. This part fails...

    comm_id = (asn << 16) + community

But if i explicitly set the asn/community to simply integers, that's fine. So I actually think the problem lies in:

    asn = event_data.get("asn", 64500)
    community = event_data.get("community", 666)

Some kind of data type issue I'm not sure how to test through gitlab CI just yet.

grigorescu commented 3 months ago

I'm really confused by GoBGP's proto. It's a uint32:

https://github.com/osrg/gobgp/blob/7ef2f0bb8283d056149aea2cae407613167870eb/api/attribute.proto#L57

But if we use a 32-bit ASN, and add the community to it, it can't fit in a uint32?

grigorescu commented 3 months ago

I think maybe we're supposed to use an ExtendedCommunities attr instead:

And then either TwoOctetAsSpecificExtended or FourOctetAsSpecificExtended. But I'm confused because they seem to be the same?

Wonder if FourOctetAsSpecificExtended would just work with 16-bit ASNs as well, so we just move to that for both cases.

crankynetman commented 3 months ago

It seems like the only downside of using FourOctetAsSpecificExtended would just be that it might be a little confusing in the code to see it being referred to as a Four Octet (Four Tet, if you will) ASN when it is supposed to be a 2 octet ASN. Pretty minor nit overall though, otherwise I agree with your assessment that they're functionally identical aside from the type name.

grigorescu commented 3 months ago

Found someone that is doing it the "right" way, and has lovely comments too: https://github.com/Jamesits/pilot/blob/9044bf0cf807c5869a6eb631f7c61a8005f21523/pilot/gobgp_web/action.py#L70

samoehlert commented 2 months ago

Ok I do still see some test failures that are unrelated to the timeouts. Need to update the tests to accept/pass event_data

samoehlert commented 2 months ago

Test failures fixed as well. Should be good to review.