Add -o option to only use a specific source for an AS-SET or Route Set

Main Feature

This PR adds a new CLI option (-o) to specify that only specific data sources should be used for AS-SET and Route-Set expansion. Those data sources are specified using the "::" notation e.g., "RADB::AS-GOOGLE". This is to resolve #5, specifically this implements "Part 1" of the idea mentioned in this comment (a separate PR could follow at a later date for "Part 2").

A user may already specify one or more data sources using the -S option. In the case that -S is used, the given data sources are saved as the default data source. If -S isn't used, the irrd server is queried for it's list of sources and these are saved as the default data source.

When using the new -o option e.g., bgpq4 -o RIPE::AS-MCKAY with no -S option, RIPE is set to the data source in this example, then a query is sent to irrd to perform a non-recursive expansion of AS-MCKAY, and then for each member of the AS-SET, one of two things happens;

The member is in the form RADB::AS-GOOGLE, in this case bgpq4 sets the irrd data source to RADB then bgpq4 requests irrd to make a non-recursive expansion of AS-GOOGLE
The member is in the form AS-VOSTRON without a specific data source, in this case bgpq4 sets the irrd data source to the contents of the default source list mentioned above, and then bgpq4 requests irrd to make a non-recursive expansion of AS-VOSTRON

This continues down the tree, honouring any sources found along the way using the "::" syntax, and using the default data sources when no specific source is given.

Other stuff

There are quite a few things that could be improve but opening a massive PR which changes lots of things isn't helpful. These are the other things this PR fixes "along the way" in order to get the main feature working nicely:

There are missing close() calls for file descriptors used, I have added some close(fd) calls.
In the following places the program stops what it is doing and returns or exits without any error message:

https://github.com/bgp/bgpq4/blob/main/expander.c#L200 https://github.com/bgp/bgpq4/blob/main/expander.c#L203 https://github.com/bgp/bgpq4/blob/main/expander.c#L734 https://github.com/bgp/bgpq4/blob/main/expander.c#L844 https://github.com/bgp/bgpq4/blob/main/expander.c#L850

I have added error messages to these locations so that they aren't silent errors.

In some places bgpq4 calls err() when there is an error or issue, in others it calls exit(), and in other places it calls sx_report(SX_FATAL, ...) and then exit(1) which is confusing because sx_report(SX_FATAL, ...) calls exit(-1) so the extra exit with a different return code is confusing to debug:

https://github.com/bgp/bgpq4/blob/main/expander.c#L402 https://github.com/bgp/bgpq4/blob/main/expander.c#L745 https://github.com/bgp/bgpq4/blob/main/expander.c#L811 https://github.com/bgp/bgpq4/blob/main/expander.c#L814

I have removed exit() calls after a fatal error.

Aligned the return values between calls.

Inside expander.c the following functions return 1 on success and 0 on failure/error:

int bgpq_expander_init()
int bgpq_expander_add_asset()
int bgpq_expander_add_rset()
static int bgpq_expander_add_already()
int bgpq_expander_add_stop()
int bgpq_expander_add_as()
int bgpq_expander_add_prefix()
static int bgpq_expanded_macro()
static int bgpq_expanded_macro_limit()
static int bgpq_expanded_prefix()
static int bgpq_expanded_v6prefix()

These two functions return 0 on success and failure:

static int bgpq_read()
int bgpq_expand_irrd()

The are the two worker functions in expander.c, the former for pipeline mode and later for non-pipeline mode. I have changed these last two functions to return 1 on success and 0 on failure/error when there is an error inside these function, so that there is consistent success/error status codes across all functions in expander.c.

At no point where any of the above functions are called, is the return status actually checked. Also the status of the callback functions used by bgpq_read() and bgpq_expand_irrd() isn't checked. Also, the "main" function in expander.c called int bgpq_expand() is hard coded to return 1. bgpq4 checks the return status of this function in main(), but because the function returns 1 (and for this function 1 means "success"), bgpq4 always exits with exit-status 0 even though there are in fact errors occurring (unless the error is right at the start like parsing the CLI args/opts).

I have added checks to the calls used by the new -o option in expander.c, so that the return status is passed through each function to the final exit status of bgpq4 (from callback function -> to bgpq_read()/bgpq_expand_irrd() -> to bgpq_expand() -> to main().

The combination of points 2, 3 and 4 above is that now when I run bgpq4 -o RADB::AS-GOOGLE, bgpq4 generates as much of the prefix list as it can, but due to some missing data in irrd an error is printed and I have a non-zero exit status. So I can bring up a peering session with AS-GOOGLE AND I know I need to tell them they have a problem with their IRR data.

Without this, which is how bgpq4 currently operates, users are not told that a specific source wasn't valid for example. This is an example: https://github.com/bgp/bgpq4/blob/main/expander.c#L683

$./bgpq4 -S RIPE AS-MCKAY
no ip prefix-list NN
ip prefix-list NN permit 37.1.88.0/21
ip prefix-list NN permit 37.1.94.0/24
ip prefix-list NN permit 77.87.48.0/21
ip prefix-list NN permit 77.87.48.0/24
ip prefix-list NN permit 77.87.49.0/24
ip prefix-list NN permit 77.87.50.0/23
ip prefix-list NN permit 77.87.50.0/24
ip prefix-list NN permit 77.87.51.0/24
ip prefix-list NN permit 185.128.248.0/22
ip prefix-list NN permit 185.128.249.0/24
ip prefix-list NN permit 185.241.44.0/22
ip prefix-list NN permit 185.241.44.0/23
ip prefix-list NN permit 185.241.46.0/24
ip prefix-list NN permit 193.107.204.0/22
ip prefix-list NN permit 193.107.205.0/24

$echo $?
0

AS-MCKAY is in RIPE, it contains an AS-SET in ARIN though, so prefixes are missing from this prefix set, but the exist status is 0. I know this is kind of by design, but I think it's a bit miss-leading for the user, there should at least be an error to say "one of the members of AS-MCKAY was not found in the the RIPE data".

This is from my PR:

$./bgpq4 -S RIPE -o RIPE::AS-MCKAY
ERROR:Key not found expanding !iAS-HELIACAL

ERROR:Key not found expanding !gas26679

no ip prefix-list NN
ip prefix-list NN permit 37.1.88.0/21
ip prefix-list NN permit 37.1.94.0/24
ip prefix-list NN permit 77.87.48.0/21
ip prefix-list NN permit 77.87.48.0/24
ip prefix-list NN permit 77.87.49.0/24
ip prefix-list NN permit 77.87.50.0/23
ip prefix-list NN permit 77.87.50.0/24
ip prefix-list NN permit 77.87.51.0/24
ip prefix-list NN permit 185.128.248.0/22
ip prefix-list NN permit 185.128.249.0/24
ip prefix-list NN permit 185.241.44.0/22
ip prefix-list NN permit 185.241.44.0/23
ip prefix-list NN permit 185.241.46.0/24
ip prefix-list NN permit 193.107.204.0/22
ip prefix-list NN permit 193.107.205.0/24

$echo $?
1

This hasn't changed the default behaviour of bgpq4, but when using the new -o option an error message is at least shown to the user and the exit status is non-zero so if they are using bgpq4 in an automated way and logging the output, they can check their logs to at least get a clue that something needs a deeper look.

Testing

I've tried to test this as thoroughly as I can on the CLI with all different combinations. I'm happy it's working. We are now loading this binary onto our prefix generation server, which generates prefix lists towards all our peers and customers every night, so we will also do some production testing too.

Example: Generate the prefix-list for a specific peer and specify the source for the peers' AS-SET or Route-Set (prefix-list excluded with grep for brevity). Here we can see there are errors, but the prefix list is generated for everything that is possible (just as before), the difference being that AS-GOOGLE was specifically picked from RADB, but downstream members used the irrd servers default data sources (which is all data sources):

$./bgpq4 -o RADB::AS-GOOGLE | grep -v "prefix"
ERROR:Key not found expanding !iAS-MEEBO

ERROR:Key not found expanding !gas13949

ERROR:Key not found expanding !gas22577

ERROR:Key not found expanding !gas36039

ERROR:Key not found expanding !gas36561

ERROR:Key not found expanding !gas40873

ERROR:Key not found expanding !gas55023

$echo $?
1

Example: Thrown an error when the AS-SET isn't available from the specified data source:

$./bgpq4 -o AFRINIC::AS-GOOGLE 
ERROR:Key not found expanding !iAS-GOOGLE

no ip prefix-list NN
! generated prefix-list NN is empty
ip prefix-list NN deny 0.0.0.0/0

$echo $?
1

Example: Specify the data sources to use for all member objects, the AS-SET specified on the CLI which will only use RIPE:

$./bgpq4 -S RIPE,ARIN -o RIPE::AS-MCKAY
no ip prefix-list NN
ip prefix-list NN permit 37.1.88.0/21
ip prefix-list NN permit 37.1.94.0/24
ip prefix-list NN permit 77.87.48.0/21
ip prefix-list NN permit 77.87.48.0/24
ip prefix-list NN permit 77.87.49.0/24
ip prefix-list NN permit 77.87.50.0/23
ip prefix-list NN permit 77.87.50.0/24
ip prefix-list NN permit 77.87.51.0/24
ip prefix-list NN permit 185.128.248.0/22
ip prefix-list NN permit 185.128.249.0/24
ip prefix-list NN permit 185.241.44.0/22
ip prefix-list NN permit 185.241.44.0/23
ip prefix-list NN permit 185.241.46.0/24
ip prefix-list NN permit 192.64.152.0/22
ip prefix-list NN permit 193.107.204.0/22
ip prefix-list NN permit 193.107.205.0/24
ip prefix-list NN permit 199.189.204.0/22

$echo $?
0

Quick question, why not make this feature available via -S command line argument? Did you see a clash in functionality of sorts?

Quick question, why not make this feature available via -S command line argument? Did you see a clash in functionality of sorts?

I didn't want to change existing behavior in case it breaks some existing automation for some users.

Great work!

I have one question:

Shouldn't this behavior be the new default as opposed to make this a non-default optional?

The output of bgpq4 AS-MCKAY changes before/after this PR only if the as-set contains references in the SOURCE:: format. It would not change the behavior otherwise. Of course bgpq4 RIPE::AS-MCKAY behavior would be different.

I think changing the default behavior into something more secure and more inline what we would actually expect (honoring the SOURCE:: format) would a good thing.

I'm actually more afraid what the non-zero exit codes and stderr output does for complex automation in a lot of cases (the example with ./bgpq4 -o RADB::AS-GOOGLE | grep -v "prefix").

I think there are two different goals here:

goal 1 : honor SOURCE::
goal 2 : using non-zero exit code and stderr output to notify about missing child as-set's

In my opinion those are two different goals and while related, I would rather not make one big change and put them behind the same optional flag.

I'd make goal 1 default, frankly. However I'd certainly not marry goal 2 with goal 1, because it would hamper the adoption of goal 1.

If we do this, people will either:

not use -o (uuh, the new "improvement" in bgpq4 triggers the pagers at 02 AM, no time, rolling back ...)
ignore exit code and redirect stderr output to /dev/null

I mean here are a few networks (using peeringdb's IRR as-set/route-set), some bigger, some smaller, even some content that trigger errors. And I did not search long at all:

lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o RADB::AS-HURRICANE 2>&1 >/dev/null | grep "ERROR" | wc -l
232
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o AS-CORE-BACKBONE 2>&1 >/dev/null | grep "ERROR" | wc -l
1641
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o RIPE::AS41327:AS-CUSTOMERS 2>&1 >/dev/null | grep "ERROR" | wc -l
601
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o AS-MEER 2>&1 >/dev/null | grep "ERROR" | wc -l
813
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o AS-AKAMAI 2>&1 >/dev/null | grep "ERROR" | wc -l
4
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o AS-CLOUDFLARE 2>&1 >/dev/null | grep "ERROR" | wc -l
7
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o AS-MICROSOFT 2>&1 >/dev/null | grep "ERROR" | wc -l
2
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -o AS-PROLE 2>&1 >/dev/null | grep "ERROR" | wc -l
11
lukas@dev:~$

With IPv6 it gets a lot worse (which is pretty much expected since not every IPv4 network is running an IPv6 network):

lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-AKAMAI 2>&1 >/dev/null | grep "ERROR" | wc -l
5
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-CLOUDFLARE 2>&1 >/dev/null | grep "ERROR" | wc -l
389
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-MICROSOFT 2>&1 >/dev/null | grep "ERROR" | wc -l
11
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-PROLE 2>&1 >/dev/null | grep "ERROR" | wc -l
777
lukas@dev:~$

So the amount of errors in real life on an typical Internet exchange, even for as-set's of content networks (I'm not even talking the obvious worse case: large Tier 2 networks) would just be mind boggling to the point where there is no way other then to dump it to /dev/null.

At least for now.

Do we want to actually output errors to stderr to improve RPSL accuracy? Probably, but that's a real long term goal, while honoring SOURCE:: should be a short time goal.

So the discussion should probably be must goal 1 be optional or can we just honor SOURCE:: always. What is the use-case for not honoring it (other than not getting bombarded with thousands of errors due to goal 2)?

small technical nitpicks:

error lines print 2 newlines
there is no -o explanation in --help (probably expected since this is still a draft)

@lukastribus Thanks for the in-depth review!

Regarding two goals

So yeah I had a similar thought to you, there are kind of two goals in this PR which could be separated. The reason I didn't split them is because when specifying a specific data source for an AS-SET or route set, and that data source doesn't exist, then we need to throw and error and return non-zero exist status. What would be the point in being able to specify a data source for an object if we don't throw an error when the data source doesn't exist?

Also, I can foresee quite a bit of discussion. Should we all just jump on a Teams call / Zoom meeting / ? (I'm on central European time)

Regarding making -o the default behavior

I already mentioned that I didn't want to break existing implementations due to the change in exit status codes. But there is another reason I forgot to mention. If you re-read this comment, bgpq4 normally uses a recursive !a query or !i,1 query to irrd. In order to allow every single member in an AS-SET to specify it's own source, bgpq4 has to send an !s query + a non-recursive !i query for every member in the AS-SET or route set. So in the most extreme case (if everyone used "::" notation in the IRR DBs) this would doubling the number of operations we send to the irrd server, which is "inconsiderate" perhaps? The idea of this PR is that we have a live issue that is affecting daily operations right now, this PR helps us resolve that issue now, and anyone with the same issue could use -o, but in the long term we should also raise a PR for irrd to add a new query type which will honour the source "::" of each member in a recursive query (!a or !i,1).

(We are discussing writing/raising this PR for irrd internally, and then one for bgpq4 to use that new query type, but it would be in a few weeks/or couple of months, and we need a fix sooner, hence this bgpq4 PR).

What would be the point in being able to specify a data source for an object if we don't throw an error when the data source doesn't exist?

I think the point is to get data from the correct SOURCE::, avoiding compromised filters and therefor avoid things like the AS-GOOGLE incident which is basically a Denial of Service.

Unresolved dependencies on the other hand is a different beast and is not new at all. bgpq3/4 always ignored unresolved dependencies #1, because that is how filters are created (use correct data to permit, and reject everything else). Honoring SOURCE:: doesn't change that, SOURCE can already be restricted today and unresolved dependencies already exist today (a helluvalot actually). I think it is important to have this discussion, I just suggest to have this discussion separately and unrelated to the SOURCE:: change. I don't think there is a security issue or a new attack that is solved by handling this differently.

lukas@dev:~$ bgpq4/bgpq4 -S RIPE AS-HURRICANE
no ip prefix-list NN
! generated prefix-list NN is empty
ip prefix-list NN deny 0.0.0.0/0
lukas@dev:~$ echo $?
0
lukas@dev:~$

What is the operational consequence here? What is the operator supposed to do specifically with the 778 errors from generating v6 filters for Akamai Prolexic, 389 errors for Cloudflare, or over 9000 errors for HE?

lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-PROLE 2>&1 >/dev/null | grep "ERROR" | wc -l
778
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-CLOUDFLARE 2>&1 >/dev/null | grep "ERROR" | wc -l
389
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o RADB::AS-HURRICANE 2>&1 >/dev/null | grep "ERROR" | wc -l
9312
lukas@dev:~$

I think this error output belongs to verbose mode.

Unresolved dependencies are very normal in RPSL, we would basically return non-zero exit codes and stderr all the time.

Should the operator ignore the filter output on stdout if bgpq4 returns non-zero? Probably, but then filter creation would not even work for FAANG companies...

Now we have different exit codes, based on whether we specify -o , which imho should be orthogonal to the SOURCE:: handling.

lukas@dev:~$ jwbensley-bgpq4/bgpq4 -S RIPE AS-HURRICANE
ERROR:Key not found expanding !a4AS-HURRICANE

no ip prefix-list NN
! generated prefix-list NN is empty
ip prefix-list NN deny 0.0.0.0/0
lukas@dev:~$ echo $?
0
lukas@dev:~$ jwbensley-bgpq4/bgpq4 -S RIPE -o AS-HURRICANE
ERROR:Key not found expanding !iAS-HURRICANE

no ip prefix-list NN
! generated prefix-list NN is empty
ip prefix-list NN deny 0.0.0.0/0
lukas@dev:~$ echo $?
1
lukas@dev:~$

I just don't see how this would actually work in practice, without ignoring stderr and return code.

I'd like to talk about how operators actually use return codes and stderr output in real life scenarios on an Internet Exchanges while peering with different networks. What do we do with tens of thousands of errors on every bgpq4 run? What do we do with basically omni present non-zero return codes?

this would doubling the number of operations we send to the irrd server, which is "inconsiderate" perhaps

Thanks for clarifying. I'd predict a very gradual load increase though, considering that new codes paths in bgpq4 and especially SOURCE:: references in as-set are not something that ship worldwide at the same time and with 100% coverage, but maybe this should be discussed with IRR operators.

I had a phone call with James, we agreed to make :: trick available via the default UI path

I am traveling this weekend so I will refactor the PR then.

Just to confirm:

I shall make the :: behavior default for the object entered on the CLI.
Currently the PR also looks for child members which use :: because next step on my to-do list is to approach the RIPE DB WG chairs and discuss what it would take to add support for the :: notation to the members field on an AS-SET in the DB. Shall I leave this in or take it out? I put it in originally to save me having to add it later.
I will leave in the additional error messages I added, but revert back to the hard coded exist status of 0. I think it is useful to have an audit log (i.e. why are some prefixes missing from my generated prefix set? I can look in my stderr logs to fine out).

Ok thanks.

Great.
Imho: leave it in. It would be better to find a way to test this of course, but in the worst case scenario this would need a follow up fix, which is the better option than splitting the work up and postponing it into a possibly distant future.
Alright, that seems like a good compromise. People shouldn't be shoving stderr output straight into their routers and this keeps the door open to implement future non-zero exit code by not burning this bridge now.

Just one request: can you fix the double newline after each error in stderr? Those empty lines in stderr are not pretty.

lukas@dev:~$ jwbensley-bgpq4/bgpq4 -6 -o AS-MICROSOFT >/dev/null
ERROR:Key not found expanding !6as8069

ERROR:Key not found expanding !6as8073

ERROR:Key not found expanding !6as20046

ERROR:Key not found expanding !6as20049

ERROR:Key not found expanding !6as20366

ERROR:Key not found expanding !6as23468

ERROR:Key not found expanding !6as30427

ERROR:Key not found expanding !6as62540

ERROR:Key not found expanding !6as132406

ERROR:Key not found expanding !6as197612

ERROR:Key not found expanding !6as198097

lukas@dev:~$

Thank you James!

@lukastribus Just one request: can you fix the double newline after each error in stderr? yeah sorry I didn't mention that, for me it was implicit that I would fix it ;)

To save re-reading the whole thread, this is the current state of the PR:

The "::" notation is now supported by default for AS-SETs
I copy and pasted the same logic and applied it to Route-Sets too, so they also support the "::" notation now by default (in hindsight I'm not sure why I didn't do this before)

It would be trivial at a later date to add support for the "::" notation for ASNs too, so that an AS-SET can have members like "RIPE:AS1122334455". First we need to get the syntax of the IRR DB "members" field updated to support the "::" notation (that's next on my to-do list).

It seems there is a bug for hierarchical as-set names such as AS15562:AS-SNIJDERS:

feather$ ./bgpq4 -d RIPE::AS15562:AS-SNIJDERS
DEBUG: expander.c:1065 bgpq_expand Acquired sendbuf of 2097152 bytes
DEBUG: expander.c:1086 bgpq_expand Sending '!!' to server to request for the connection to remain open
DEBUG: expander.c:1096 bgpq_expand b->identify: Sending '!n bgpq4 1.6' to server.
DEBUG: expander.c:1109 bgpq_expand Got answer C
DEBUG: expander.c:1126 bgpq_expand Testing support for A queries
DEBUG: expander.c:1136 bgpq_expand Server supports A query
DEBUG: expander.c:446 bgpq_get_irrd_sources Requesting source list !s-lc
DEBUG: expander.c:457 bgpq_get_irrd_sources Got answer A138
NTTCOM,INTERNAL,LACNIC,RADB,RIPE,RIPE-NONAUTH,ALTDB,BELL,LEVEL3,APNIC,JPIRR,ARIN,BBOI,TC,AFRINIC,IDNIC,ARIN-WHOIS,RPKI,REGISTROBR,CANARIE
C
DEBUG: expander.c:548 bgpq_pipeline expander: sending !sRIPE
DEBUG: expander.c:548 bgpq_pipeline expander: sending !iAS-SNIJDERS
DEBUG: expander.c:548 bgpq_pipeline expander: sending !sNTTCOM,INTERNAL,LACNIC,RADB,RIPE,RIPE-NONAUTH,ALTDB,BELL,LEVEL3,APNIC,JPIRR,ARIN,BBOI,TC,AFRINIC,IDNIC,ARIN-WHOIS,RPKI,REGISTROBR,CANARIE
DEBUG: expander.c:829 bgpq_read No data expanding !sRIPE
DEBUG: expander.c:548 bgpq_pipeline expander: sending !sNTTCOM,INTERNAL,LACNIC,RADB,RIPE,RIPE-NONAUTH,ALTDB,BELL,LEVEL3,APNIC,JPIRR,ARIN,BBOI,TC,AFRINIC,IDNIC,ARIN-WHOIS,RPKI,REGISTROBR,CANARIE
DEBUG: expander.c:548 bgpq_pipeline expander: sending !iAS-SNIJDERS
DEBUG: expander.c:829 bgpq_read No data expanding !sNTTCOM,INTERNAL,LACNIC,RADB,RIPE,RIPE-NONAUTH,ALTDB,BELL,LEVEL3,APNIC,JPIRR,ARIN,BBOI,TC,AFRINIC,IDNIC,ARIN-WHOIS,RPKI,REGISTROBR,CANARIE
DEBUG: expander.c:829 bgpq_read No data expanding !sNTTCOM,INTERNAL,LACNIC,RADB,RIPE,RIPE-NONAUTH,ALTDB,BELL,LEVEL3,APNIC,JPIRR,ARIN,BBOI,TC,AFRINIC,IDNIC,ARIN-WHOIS,RPKI,REGISTROBR,CANARIE
no ip prefix-list NN
! generated prefix-list NN is empty
ip prefix-list NN deny 0.0.0.0/0

@job hmm, this was working for me, I will test again.

Fixed now...

These are all working for me (please check for your self):

./bgpq4 AS12654
./bgpq4 AS12654:RS-RIS
./bgpq4 RIPE::AS12654:RS-RIS
./bgpq4 AS15562
./bgpq4 AS-SNIJDERS
./bgpq4 AS15562:AS-SNIJDERS
./bgpq4 RIPE::AS15562:AS-SNIJDERS

./bgpq4 RIPE::AS12654 is not supported yes as I mentioned further up in the thread, but it could be added in a subsequent PR easily enough.

I think we are almost there, there seems to be a difference, compare:

feather$ ./bgpq4 RADB::AS-AMAZON | sha256
ERROR:Key not found expanding !gas10124
ERROR:Key not found expanding !gas17493
ERROR:Key not found expanding !gas39111
099329ac8b0c6880addaba336c7411e866c0a4fc7e0c9036bb336696624aa180

with

feather$ ./bgpq4 AS-AMAZON | sha256
099329ac8b0c6880addaba336c7411e866c0a4fc7e0c9036bb336696624aa180

is this expected or desirable?

This was discussed further up this thread, but this thread is getting a bit long so I'll recap here for you:

The SHA256 hash is confirming that the stdout output is the same (the generated prefix list). I generated a bunch of different prefixes with the original bgpq4 and this PR version and diff'ed them and found no differences too.

The difference in your example output is that currently bgpq4 doesn't produce any error messages to stderr when there are problems, so I added some error message which should have already been there (in my opinion):

This should be an error message not a debug message because a problem has occured. It is changed to an error message in the PR.
Here, no messages are produced (error or debug) when there is a problem. This is also changed in the PR.

OK, but why are there different error messages when using this PR version of bgpq4 against the same AS-SET: /bgpq4 RADB::AS-AMAZON vs ./bgpq4 AS-AMAZON ?

When I run the current/published version $ ./bgpq4 -4 AS-AMAZON, bgpq4 has a hard coded exit status of 0 and no error messages are displayed. This leads the user to falsely believe that they generated a full prefix list for all ASNs in the AS-SET tree for AS-AMAZON. This is not the case though. There are currently errors occurring, this could be for a number of different reasons (it could be that an AS-SET contains a member that doesn't exist, or the member name contains a typo, or it could be a problem with IRRd missing some data).

In both the existing version of bgpq4 and this PR version of bgpq4, when NOT using the :: notation a single recursive !a query is run or a single recursive !i,1 query, and IRRd provides a single response with the entire tree fully expanded. There may have been errors during that expansion process, but IRRd provides no indication of this, it returned "something", even if it was partial, so that is considered a success.

When using the :: notation in the PR version, we can't use a recursive query, which means a !i query is run for every member in the tree. This allows IRRd to provide "feedback", that it couldn't expand the single member being queried, which provides bgpq4 the opportunity to then log something to stderr, so that the user knows there was a problem and can look into fixing the issue.

These added error messages statements could be...

Removed
Changed to be debug statements so they don't show by default
Left as-is so provide some feedback

What do you prefer?

bgp / bgpq4