Closed samoehlert closed 2 months ago
Update: behave tests failing seems to be narrowed down to scram/route_manager/tests/acceptance/steps/translator.py
as that's where it always fails with timeouts. The tests all pass locally on both my machine and @dopheide-esnet's which suggests to me maybe this is a resource issue with the gitlab runners? This might be the kick I need to move our CI testing jobs over to github actions finally.
Also, the pytests always seem to pass fine in GitlabCI which is extra confusing, haven't really made progress there.
I don't know if I love how this falls back to ESnet values that were previously hard coded
Suggest that you remove the ESnet-specific values entirely, set the default in the code to an ASN that's reserved for documentation use (64496 - 64511 per rfc5398), and override the correct value for ESnet. That would also help to ensure that your usage of SCRAM (i.e. overriding the value) is the same as any non-ESnet user's.
Update: behave tests failing seems to be narrowed down to
scram/route_manager/tests/acceptance/steps/translator.py
as that's where it always fails with timeouts. The tests all pass locally on both my machine and @dopheide-esnet's which suggests to me maybe this is a resource issue with the gitlab runners?
I think I was seeing something similar but never fully tracked it down. I think that GitLab isn't fully resetting the environment between runs.
The test failures are weird. Recommend splitting this up into two PRs:
1) pass event_data
, but don't actually use it. That is, change translator.py as you did, but only change gobgp.py such that add_path
and del_path
take event_data
as a parameter, and they log the contents, but keep the actual logic the same as what we have working in main today.
2) update the logic so that the event data is actually being used.
This way, the changes in the first PR should not affect the tests in any way and if we're still seeing failures, that's a problem with the testing rather than with this change.
Took care of the ESnet specific variables and shortened the variable code.
As for the tests, I actually checked out the translator.py and gobgp.py files from main to test this and still saw the same failures. the only changes at that point in time vs main were the documentation files.
Today I went back and started adding in these changes a few at a time in topic/dopheide/testing-ci. I added some fake event_data to translator/tests/acceptance/steps/actions.py to avoid a few failures, but now it appears I'm stuck in the same timeout place. This part fails...
comm_id = (asn << 16) + community
But if i explicitly set the asn/community to simply integers, that's fine. So I actually think the problem lies in:
asn = event_data.get("asn", 64500)
community = event_data.get("community", 666)
Some kind of data type issue I'm not sure how to test through gitlab CI just yet.
I'm really confused by GoBGP's proto. It's a uint32:
https://github.com/osrg/gobgp/blob/7ef2f0bb8283d056149aea2cae407613167870eb/api/attribute.proto#L57
But if we use a 32-bit ASN, and add the community to it, it can't fit in a uint32?
I think maybe we're supposed to use an ExtendedCommunities attr instead:
And then either TwoOctetAsSpecificExtended or FourOctetAsSpecificExtended. But I'm confused because they seem to be the same?
Wonder if FourOctetAsSpecificExtended
would just work with 16-bit ASNs as well, so we just move to that for both cases.
It seems like the only downside of using FourOctetAsSpecificExtended
would just be that it might be a little confusing in the code to see it being referred to as a Four Octet (Four Tet, if you will) ASN when it is supposed to be a 2 octet ASN. Pretty minor nit overall though, otherwise I agree with your assessment that they're functionally identical aside from the type name.
Found someone that is doing it the "right" way, and has lovely comments too: https://github.com/Jamesits/pilot/blob/9044bf0cf807c5869a6eb631f7c61a8005f21523/pilot/gobgp_web/action.py#L70
Ok I do still see some test failures that are unrelated to the timeouts. Need to update the tests to accept/pass event_data
Test failures fixed as well. Should be good to review.
Passing along the information from websocket element to translator to allow for overriding ASN and BGP community. A few things I still want to work through: