What QUIC stack is Solana TPU using? This library is based on Cloudflare's quiche, and it follows a particular bootstrapping process. The best way to debug this is to also build and run the Solana TPU side, so you can see why it is closing with a 306 error.
On the other hand, we have tests/utils.ts. You can see the randomBytes utility function there; we use that for our testing, as we prefer fewer Node-isms.
Thank you, I tried the randomBytes implementation from tests/utils.ts, but that did not seem to change much.
import { QUICClient } from "@matrixai/quic";
import * as peculiarWebcrypto from "@peculiar/webcrypto";

const webcrypto = new peculiarWebcrypto.Crypto();

// Fills the provided buffer with cryptographically secure random bytes.
async function randomBytes(data: ArrayBuffer): Promise<void> {
  webcrypto.getRandomValues(new Uint8Array(data));
}

// HOST_IP, PORT and BUFFER are defined elsewhere: the TPU address and the payload to send.
const client = await QUICClient.createQUICClient({
  host: HOST_IP,
  port: parseInt(PORT),
  config: {
    verifyPeer: false,
  },
  crypto: {
    ops: {
      randomBytes,
    },
  },
});
const clientStream = client.connection.newStream();
const writer = clientStream.writable.getWriter();
await writer.write(BUFFER);
await writer.close();
I don't have an in-depth understanding of the TPU server yet, so I just assumed that QUIC communication would be fairly universal?
Unfortunately, running the TPU server is beyond my means for the moment. But I had wrapped the call in a try/catch block, which truncated the error message. Here it is in full:
ErrorQUICConnectionPeerTLS: Peer closed with transport code 306
at constructor_.send ([...]/node_modules/@matrixai/quic/src/QUICConnection.ts:947:23)
at constructor_.send ([...]/node_modules/@matrixai/async-init/src/StartStop.ts:174:20)
at [...]/node_modules/@matrixai/quic/src/QUICConnection.ts:833:18
at [...]/node_modules/@matrixai/async-locks/src/Lock.ts:57:63
at withF ([...]/node_modules/@matrixai/resources/src/utils.ts:24:18)
at async constructor_.recv ([...]/node_modules/@matrixai/quic/src/QUICConnection.ts:749:5)
at async Socket.handleSocketMessage ([...]/node_modules/@matrixai/quic/src/QUICSocket.ts:119:7) {
data: {
isApp: false,
errorCode: 306,
reason: Uint8Array(50) [
114, 101, 99, 101, 105, 118, 101, 100, 32,
99, 111, 114, 114, 117, 112, 116, 32, 109,
101, 115, 115, 97, 103, 101, 32, 111, 102,
32, 116, 121, 112, 101, 32, 73, 110, 118,
97, 108, 105, 100, 83, 101, 114, 118, 101,
114, 78, 97, 109, 101
]
},
cause: undefined,
timestamp: 2024-04-09T07:03:42.048Z
}
Cheers
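The reason field in the error data is the peer's close reason as raw bytes, and RFC 9000 says it SHOULD be UTF-8. Decoding it makes the message readable. Below is a minimal sketch; decodeReason is a hypothetical helper, not part of js-quic:

```typescript
// Decode a QUIC CONNECTION_CLOSE reason phrase (raw bytes) as UTF-8 text.
export function decodeReason(reason: Uint8Array): string {
  return new TextDecoder("utf-8").decode(reason);
}

// Byte values copied from the `reason` array in the error above.
export const reasonBytes = Uint8Array.from([
  114, 101, 99, 101, 105, 118, 101, 100, 32,
  99, 111, 114, 114, 117, 112, 116, 32, 109,
  101, 115, 115, 97, 103, 101, 32, 111, 102,
  32, 116, 121, 112, 101, 32, 73, 110, 118,
  97, 108, 105, 100, 83, 101, 114, 118, 101,
  114, 78, 97, 109, 101,
]);

console.log(decodeReason(reasonBytes)); // prints "received corrupt message of type InvalidServerName"
```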
The reason message is "received corrupt message of type InvalidServerName". Maybe the server is expecting a client certificate?
You can provide a key and certificate as part of the QUICConfig when starting the client.
/**
* Private key as a PEM string or Uint8Array buffer containing PEM formatted
* key. You can pass multiple keys. The number of keys must match the number
* of certs. Each key must be associated with the corresponding cert chain.
*
* Currently multiple key and certificate chains are not supported.
*/
key?: string | Array<string> | Uint8Array | Array<Uint8Array>;
/**
* X.509 certificate chain in PEM format or Uint8Array buffer containing
* PEM formatted certificate chain. Each string or Uint8Array is a
* certificate chain in subject to issuer order. Multiple certificate chains
* can be passed. The number of certificate chains must match the number of
* keys. Each certificate chain must be associated to the corresponding key.
*
* Currently multiple key and certificate chains are not supported.
*/
cert?: string | Array<string> | Uint8Array | Array<Uint8Array>;
Look at a QUICServer example for how to do this.
Unfortunately, that does not change anything: the connection seems to be established, then fails.
// generateTLSConfig: presumably the test helper from js-quic's tests/utils.ts;
// randomBytes is the same function as in the earlier snippet.
const tlsConfig = await generateTLSConfig('RSA');
const client = await QUICClient.createQUICClient({
  host: tpu_address.split(':')[0],
  port: parseInt(tpu_address.split(':')[1]),
  config: {
    key: tlsConfig.leafKeyPairPEM.privateKey,
    cert: tlsConfig.leafCertPEM,
  },
  crypto: {
    ops: {
      randomBytes,
    },
  },
});
INFO:QUICClient:Create QUICClient to 141.98.216.83:8009
INFO:QUICSocket:Start QUICSocket on [::]:0
INFO:QUICSocket:Started QUICSocket on [::]:64177
INFO:QUICConnection d13269a7a249aac1a6efc3f2a44e6e433491c161:Connect QUICConnection
INFO:QUICConnection d13269a7a249aac1a6efc3f2a44e6e433491c161:Start QUICConnection
INFO:QUICClient:ErrorQUICConnectionPeerTLS: QUIC Connection local TLS error - Peer closed with transport code 306
INFO:QUICConnection d13269a7a249aac1a6efc3f2a44e6e433491c161:ErrorQUICConnectionPeerTLS: QUIC Connection local TLS error - Peer closed with transport code 306
INFO:QUICConnection d13269a7a249aac1a6efc3f2a44e6e433491c161:ErrorQUICConnectionPeerTLS: QUIC Connection local TLS error - Peer closed with transport code 306
INFO:QUICSocket:Stop QUICSocket on [::]:64177
INFO:QUICSocket:Stopped QUICSocket on [::]:64177
INFO:QUICClient:Destroy QUICClient
INFO:QUICClient:Destroyed QUICClient
ErrorQUICConnectionPeerTLS: Peer closed with transport code 306
at constructor_.send ([...]/node_modules/@matrixai/quic/src/QUICConnection.ts:947:23)
at constructor_.send ([...]/node_modules/@matrixai/async-init/src/StartStop.ts:174:20)
at [...]/node_modules/@matrixai/quic/src/QUICConnection.ts:833:18
at [...]/node_modules/@matrixai/async-locks/src/Lock.ts:57:63
at withF ([...]/node_modules/@matrixai/resources/src/utils.ts:24:18)
at async constructor_.recv ([...]/node_modules/@matrixai/quic/src/QUICConnection.ts:749:5)
at async Socket.handleSocketMessage ([...]/node_modules/@matrixai/quic/src/QUICSocket.ts:119:7) {
data: {
isApp: false,
errorCode: 306,
reason: Uint8Array(50) [
114, 101, 99, 101, 105, 118, 101, 100, 32,
99, 111, 114, 114, 117, 112, 116, 32, 109,
101, 115, 115, 97, 103, 101, 32, 111, 102,
32, 116, 121, 112, 101, 32, 73, 110, 118,
97, 108, 105, 100, 83, 101, 114, 118, 101,
114, 78, 97, 109, 101
]
},
cause: undefined,
timestamp: 2024-04-09T07:54:08.325Z
}
I mean you have to provide the appropriate client certificate - not just any certificate.
From what I can tell, this isn't a problem with the protocol. With QUIC, the connection is established before the TLS handshake completes. So if the server rejects the TLS handshake for whatever reason, the client will 'establish' and then close with a code and message like you demonstrated. In this case it's a 306 error, indicating DecodeError, with the message "received corrupt message of type InvalidServerName". This means that the server is rejecting the connection because of some requirement it has about the server name.
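This sketch shows where 306 comes from. RFC 9001 carries a TLS alert as the QUIC transport error 0x0100 + alert code, so 306 = 256 + 50 (decode_error). The same arithmetic appears to explain codes 304, 372 and 376, which come up later in this thread.

describeCryptoError and its small alert table are hypothetical. They are written from the TLS 1.3 alert list, not from js-quic's own enum.

```typescript
// TLS 1.3 alert descriptions (RFC 8446); only the ones relevant here are listed.
const TLS_ALERTS: Record<number, string> = {
  48: "unknown_ca",
  50: "decode_error",
  116: "certificate_required",
  120: "no_application_protocol",
};

// RFC 9001 §4.8: a TLS alert is reported as QUIC transport error 0x0100 + alert.
export function describeCryptoError(code: number): string | undefined {
  if (code < 0x0100 || code > 0x01ff) return undefined; // not in the CRYPTO_ERROR range
  const alert = code - 0x0100;
  const name = TLS_ALERTS[alert] ?? "unlisted alert";
  return `CRYPTO_ERROR (TLS alert ${alert}: ${name})`;
}

console.log(describeCryptoError(306)); // CRYPTO_ERROR (TLS alert 50: decode_error)
```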
So it's important to note that, whatever the problem is, it's the server taking issue with the connection, likely due to the certificate. I couldn't find any information about how it uses QUIC in some quick research, so I can't really comment on what exactly it could be.
Thank you very much for your help! I'm also struggling to find any information about what would be an appropriate client certificate in that context. I thought providing any certificate might change the error message, but that didn't do much.
I will post an update here if I find a solution. Closing for now; thanks again!
Happy to help.
@gherkins the appropriate client certificate would depend on what your target Solana TPU node expects. TLS certificates are supposed to be signed by an authority. Perhaps there is an authority that you need to get your certificate signed by in order for the target server to accept your credentials? This is a core part of MTLS connections and represents a sort of end-to-end identity check: the client checks the server's identity, and the server checks the client's identity. You'd have to ask whoever runs the Solana TPUs about this.
hm, seems like in the rust implementation it's done via x509 certificates 🤔
The TPU node is just one of those retrieved via connection.getClusterNodes(). ContactInfo has a property tpuQuic (undocumented in the TS library), which is just HOST-IP:PORT.
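Since tpuQuic is a plain HOST:PORT string, a small parser could replace the split(':') calls used in the snippets here. parseHostPort below is a hypothetical helper, not from any library.

It splits on the last colon, so it also copes with bracketed IPv6 hosts in the style of the [::ffff:162.19.43.7]:25009 address that appears in one of the logs. It does not handle unbracketed IPv6.

```typescript
// Parse "HOST:PORT" (e.g. a ContactInfo tpuQuic value) into host and port.
// Splitting on the last colon keeps bracketed IPv6 hosts like "[::1]:8009" intact.
export function parseHostPort(addr: string): { host: string; port: number } {
  const idx = addr.lastIndexOf(":");
  if (idx <= 0) throw new Error(`missing port in address: ${addr}`);
  let host = addr.slice(0, idx);
  if (host.startsWith("[") && host.endsWith("]")) host = host.slice(1, -1);
  const port = Number(addr.slice(idx + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in address: ${addr}`);
  }
  return { host, port };
}
```

Usage would be along the lines of `const { host, port } = parseHostPort(tpu_address);`, with host and port passed to QUICClient.createQUICClient.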
I guess I'll try a self signed x509 certificate next and see if this changes the error message 🤷♂️
There may actually be a signing authority required for all client certificates; best to ask someone on the Solana team whether that's required. These TPU nodes are not "public" nodes, are they? (As in, intended for just anybody to connect to.)
Hi all,
Did some digging and found something that might be useful.
In the Solana quic-client, when the connection is made, the server_name string "connect" is passed to quinn's connect, which documents it as follows:
/// Connect to a remote endpoint
///
/// `server_name` must be covered by the certificate presented by the server. This prevents a
/// connection from being intercepted by an attacker with a valid certificate for some other
/// server.
///
/// May fail immediately due to configuration errors, or in the future if the connection could
/// not be established.
I think this is related to the InvalidServerName error.
https://docs.rs/quinn/latest/src/quinn/endpoint.rs.html#161 https://docs.rs/solana-quic-client/1.18.11/src/solana_quic_client/nonblocking/quic_client.rs.html#182
The server_name is being used here: https://docs.rs/quinn-proto/0.10.6/src/quinn_proto/endpoint.rs.html#419
The EndpointConfig::default() config, which is being used here as config.client.start_session(), has a 64-byte HMAC_SHA256 reset_key.
Hope this helps, @gherkins. Excited to see this work in the tpu-client 👍
Cheers @lmvdz, that looks interesting indeed.
Although I'm more sold on the idea that it's about a valid client certificate at the moment, mainly because I played around with the Rust implementation, where you can basically just do something like this:
use std::net::{IpAddr, Ipv4Addr, SocketAddr};
use std::sync::Arc;
// QuicLazyInitializedEndpoint / QuicClientConnection: solana-quic-client crate;
// ConnectionCacheStats: solana-connection-cache crate.

// endpoint being the interesting part here
let endpoint = Arc::new(QuicLazyInitializedEndpoint::default());
// server_addr is really just the address, without any certificate handling;
// I guess that's done internally...
let server_addr = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 127, 127, 0)), 8888);
// seems to be needed; really just a structure holding counters for failed/succeeded sends
let connection_stats = Arc::new(ConnectionCacheStats::default());
let client_connection = QuicClientConnection::new(endpoint, server_addr, connection_stats);
// ...as this does send data to the server
match client_connection.send_data_async(buffer) {
    Ok(res) => dbg!(res),
    // the error value itself is printed; it has no unwrap()
    Err(error) => panic!("error sending data: {:?}", error),
};
Now QuicLazyInitializedEndpoint::default() seems to just yield some kind of anonymous self-signed client cert, doesn't it?
impl Default for QuicLazyInitializedEndpoint {
fn default() -> Self {
let (cert, priv_key) =
new_self_signed_tls_certificate(&Keypair::new(), IpAddr::V4(Ipv4Addr::UNSPECIFIED))
.expect("Failed to create QUIC client certificate");
Self::new(
Arc::new(QuicClientCertificate {
certificate: cert,
key: priv_key,
}),
None,
)
}
}
My best guess would be that we need to emulate just that in the JS implementation, as js-quic does take those arguments (cert & key) for the client side, too.
So maybe that attempt here https://github.com/MatrixAI/js-quic/issues/98#issuecomment-2044402927 wasn't that far off... I'll try some more certificate variations when I have some time on my hands.
The QUICConnection does support passing in a serverName parameter; it's propagated to quiche's connect: https://docs.quic.tech/quiche/fn.connect.html.
Do note that we have a special verifyCallback option that overrides the native TLS check. This is used for our purposes in Polykey, as we require a custom TLS verification procedure compared to standard MTLS or HTTPS-based connections. If Solana is not using a custom TLS procedure, you should not be using the verifyCallback option.
But my question still stands. These nodes you are connecting to: are they meant to be "public" nodes? The need for client authentication usually indicates that they are not "public" nodes, as it implies the need for an authority to sign client certificates in some way. If they are public nodes, they would not bother to verify client certificates. If they are not, then you need to ask whoever "owns" those nodes for permission to sign your certificate; this is basically what MTLS is intended to do.
Hey, yes: from my understanding, those nodes are totally meant to be publicly available.
They're exposed by the RPC connection via getClusterNodes, and the official docs encourage sending transactions there directly: https://solana.com/de/docs/core/transactions/retry#the-journey-of-a-transaction
The only problem seemed to be that Node/JS doesn't have an out-of-the-box QUIC implementation the way Rust does.
I would therefore also assume that you would not need client authentication, but the problem seems to be somewhat certificate-related anyway 🤷‍♂️
Can you do a sanity check by attempting to establish a connection to those nodes with quiche directly, then? Please note the bootstrapping protocol, but they have Rust samples.
If they are intended to be public nodes, I do not understand why they would require client certificates, unless there's a special reason, which should be explicit in their documentation.
Actually... What is the valid server name? Is it the hostname or something else? When you connect sometimes this is important.
It’s hardcoded as “server”
Taken from this useful post
TLS connections on the web would typically also use this X.509 certificate to associate an external identity, like a domain name (e.g. forum.solana.com), as well as a signature chain vouching for the certificate’s validity. Solana validators, however, are inherently identified by their identity public key. There is no need to associate this key with external information. Consequently, there is no need for these X.509 certificates to carry any signature chain or any other pieces of data other than the public key itself. Notably, validators also have the ability to treat peers as “anonymous” and ignore their identity. This works because the message content is often authenticated by itself, regardless of who the sender is (such as a gossip message).
It seems like current validator nodes just use self-signed dummy x509 certificates, so I am not sure why this code doesn't work...
Also, see this from an unofficial project that connects to the Solana validators:
/// takes a validator identity and creates a new QUIC client; appears as staked peer to TPU
// note: ATM the provided identity might or might not be a valid validator keypair
async fn new_endpoint_with_validator_identity(validator_identity: ValidatorIdentity) -> Endpoint {
info!(
"Setup TPU Quic stable connection with validator identity {} ...",
validator_identity
);
// the counterpart of this function is get_remote_pubkey+get_pubkey_from_tls_certificate
let (certificate, key) = new_self_signed_tls_certificate(
&validator_identity.get_keypair_for_tls(),
IpAddr::V4(Ipv4Addr::new(0, 0, 0, 0)),
)
.expect("Failed to initialize QUIC connection certificates");
create_tpu_client_endpoint(certificate, key)
}
It generates a self-signed TLS certificate using the new_self_signed_tls_certificate() function (source code) with the validator_identity, which is just a keypair (i.e. it doesn't matter whether it's a staked or unstaked node's keypair, so anyone should be able to connect).
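Solana identifies a peer by the Ed25519 key inside its self-signed certificate; this is the get_pubkey_from_tls_certificate counterpart mentioned in the snippet above. A rough TypeScript equivalent is to locate the Ed25519 SubjectPublicKeyInfo in the DER certificate and print the key in base58.

This is a heuristic sketch with hypothetical function names. It is not a real X.509 parser and not Solana's implementation.

```typescript
// DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410):
// SEQUENCE(42) { SEQUENCE(5) { OID 1.3.101.112 }, BIT STRING(33) { 0x00, 32-byte key } }
const ED25519_SPKI_PREFIX = Uint8Array.from([
  0x30, 0x2a, 0x30, 0x05, 0x06, 0x03, 0x2b, 0x65, 0x70, 0x03, 0x21, 0x00,
]);

// Scan DER bytes for the Ed25519 SPKI and return the 32-byte public key (first match).
export function findEd25519PublicKey(der: Uint8Array): Uint8Array | undefined {
  const n = ED25519_SPKI_PREFIX.length;
  outer: for (let i = 0; i + n + 32 <= der.length; i++) {
    for (let j = 0; j < n; j++) {
      if (der[i + j] !== ED25519_SPKI_PREFIX[j]) continue outer;
    }
    return der.slice(i + n, i + n + 32);
  }
  return undefined;
}

const BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Base58 (Bitcoin alphabet), the encoding Solana uses to display public keys.
export function toBase58(bytes: Uint8Array): string {
  let n = 0n;
  for (const b of bytes) n = (n << 8n) | BigInt(b);
  let out = "";
  while (n > 0n) {
    out = BASE58_ALPHABET.charAt(Number(n % 58n)) + out;
    n /= 58n;
  }
  // Each leading zero byte is written as a leading "1".
  for (const b of bytes) {
    if (b !== 0) break;
    out = "1" + out;
  }
  return out;
}
```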
The impression I'm getting is that the serverName parameter that quiche's connect takes needs to be set to something different.
It was mentioned that it was hardcoded to "server"? Internally, the QUICClient sets it to the provided host parameter.
As a sanity check, you can try monkey-patching the QUICClient code to set the serverName to "server" and see if that works.
Hm, just setting serverName in https://github.com/MatrixAI/js-quic/blob/staging/src/QUICClient.ts#L189 to "server" as a hardcoded string produces another error code, 376 ("peer doesn't support any known protocol").
Changing it to any random string, on the other hand, keeps producing the original error code (306 etc)
Ah! That's progress. It means we've moved on to a new problem, so it seems that setting serverName to "server" works. "peer doesn't support any known protocol" should mean that you didn't include the expected protocol in the config: https://github.com/MatrixAI/js-quic/blob/94f38390a3f667a829460c330157fbcc5a27a0c1/src/types.ts#L295-L305. You'll need to include at least one protocol that the server supports; otherwise the connection will be rejected.
Looking at the source code, the protocol is set by pub const ALPN_TPU_PROTOCOL_ID: &[u8] = b"solana-tpu"; (https://docs.rs/solana-streamer/latest/src/solana_streamer/nonblocking/quic.rs.html#63). You'll need to add this to the applicationProtos array in the config for the client.
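Some context on the 376 error: ALPN (RFC 7301) is negotiated inside the TLS handshake. The client's applicationProtos list must therefore share at least one entry with the server's. Otherwise the server answers with the no_application_protocol alert (120), i.e. 256 + 120 = 376.

The sketch below shows the wire encoding and the matching step. Both functions are hypothetical, and the server-preference order is an assumption (TLS stacks differ).

```typescript
// RFC 7301 ProtocolNameList: 2-byte total length, then 1-byte-length-prefixed names.
export function encodeAlpnList(protos: string[]): Uint8Array {
  const encoder = new TextEncoder();
  const names = protos.map((p) => encoder.encode(p));
  for (const n of names) {
    // Each protocol name must be 1..255 bytes long.
    if (n.length === 0 || n.length > 255) throw new Error("invalid ALPN protocol name length");
  }
  const bodyLen = names.reduce((sum, n) => sum + 1 + n.length, 0);
  const out = new Uint8Array(2 + bodyLen);
  out[0] = (bodyLen >> 8) & 0xff; // big-endian list length
  out[1] = bodyLen & 0xff;
  let off = 2;
  for (const n of names) {
    out[off++] = n.length;
    out.set(n, off);
    off += n.length;
  }
  return out;
}

// Server-side choice: first server-supported protocol the client offered,
// or null, which corresponds to the no_application_protocol alert.
export function selectAlpn(clientProtos: string[], serverProtos: string[]): string | null {
  for (const p of serverProtos) if (clientProtos.includes(p)) return p;
  return null;
}
```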
@tegefaulkes I tried adding "solana-tpu" to the applicationProtos array, and I am now getting a new error (304): Failed connection due to native TLS verification
Note that I am generating my own key-certificate pair like so:
import { generateKeyPairSync } from "crypto";

// Generates an Ed25519 key pair. Note: publicKey is an SPKI public key, not an X.509 certificate.
const keypair = generateKeyPairSync('ed25519', {
  privateKeyEncoding: { format: 'pem', type: 'pkcs8' },
  publicKeyEncoding: { format: 'pem', type: 'spki' }
});
And passing that into the config of the quic client:
config: {
key: keypair.privateKey,
cert: keypair.publicKey,
applicationProtos: ["solana-tpu"]
}
The serverName is also hardcoded to "server", as discussed previously.
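According to the QUICConfig doc comment, the cert option expects a PEM certificate chain, i.e. "-----BEGIN CERTIFICATE-----" blocks. The spki export in the key generation snippet produces a "PUBLIC KEY" block instead. The tiny check below would catch that mix-up before connecting; it is hypothetical and not part of js-quic.

```typescript
// Return the labels of all PEM blocks in a string, e.g. "CERTIFICATE" or "PUBLIC KEY".
export function pemLabels(pem: string): string[] {
  return Array.from(pem.matchAll(/-----BEGIN ([A-Z0-9 ]+)-----/g), (m) => m[1] ?? "");
}

// True only if the string contains at least one PEM block and all blocks are certificates.
export function looksLikeCertChain(pem: string): boolean {
  const labels = pemLabels(pem);
  return labels.length > 0 && labels.every((l) => l === "CERTIFICATE");
}
```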
More progress, though this one is harder to diagnose. Can you give the full error as it's printed out?
Seems like a problem with the certificate/key pair
Failed to send transaction to TPU ErrorQUICConnectionLocalTLS: Failed connection due to native TLS verification
at /home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICConnection.ts:791:18
at /home/alex/projects/tpu-client/node_modules/@matrixai/async-locks/src/Lock.ts:57:63
... 2 lines matching cause stack trace ...
at async Socket.handleSocketMessage (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICSocket.ts:119:7) {
data: { isApp: false, errorCode: 304, reason: Uint8Array(0) [] },
cause: Error: TlsFail
at /home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICConnection.ts:765:19
at /home/alex/projects/tpu-client/node_modules/@matrixai/async-locks/src/Lock.ts:57:63
at withF (/home/alex/projects/tpu-client/node_modules/@matrixai/resources/src/utils.ts:24:18)
at async constructor_.recv (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICConnection.ts:749:5)
at async Socket.handleSocketMessage (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICSocket.ts:119:7) {
code: 'GenericFailure'
},
timestamp: 2024-04-29T21:01:34.150Z
}
Oh I see, it's the client failing the server's certificate now. That will happen after the connection has been established. QUIC is just annoying like that.
Good news, there are two options here.
Polykey, where certificates are self-signed and we verify them based on their NodeId. You can see an example of it here: https://github.com/MatrixAI/Polykey/blob/79ee0888a82fbbd6898a8fcda50aa80d33c54c2a/src/nodes/NodeConnection.ts#L242-L257. I think the verifyCallback will return a Promise<undefined> if verification should succeed.
Running with
config: {
applicationProtos: ['solana-tpu'],
verifyPeer: false,
},
now gives me
INFO:QUICClient:ErrorQUICConnectionPeer: QUIC Connection peer error - Peer closed with application code 1 ErrorQUICStreamInternal: Failed to prime local stream state with a 0-length message
not sure if that's progress... 🤔
Yeah, I am seeing the same error as @gherkins. Here's the full error log:
Failed to send transaction to TPU ErrorQUICStreamInternal: Failed to prime local stream state with a 0-length message
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICStream.ts:298:17)
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/events/src/Evented.ts:55:9)
... 4 lines matching cause stack trace ...
at /home/alex/projects/tpu-client/src/index.ts:324:64 {
data: {},
cause: Error: StreamLimit
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICStream.ts:291:25)
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/events/src/Evented.ts:55:9)
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/async-init/src/CreateDestroy.ts:49:26)
at Function.createQUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICStream.ts:76:20)
at constructor_.newStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICConnection.ts:1160:35)
at constructor_.newStream (/home/alex/projects/tpu-client/node_modules/@matrixai/async-init/src/StartStop.ts:244:18)
at /home/alex/projects/tpu-client/src/index.ts:324:64 {
code: 'GenericFailure'
},
timestamp: 2024-04-30T18:36:31.390Z
We seem to be running into the stream limit now; it defaults to 100. Try modifying some of the stream parameters in the config to allow for more streams.
Otherwise, it seems the connection is being made just fine now and streams are being created, so things are mostly working.
No luck changing these values; I still get the same error. But yes, you are right, I can see logs stating that the connection has started.
Shot in the dark here...
"Solana uses QUIC’s option to send a “challenge packet” to verify IP addresses. The whole point of this challenge is to avoid the certificate verification on the first step of the handshake, instead of doing it on the second part of the handshake after IP validation."
Interesting... Based on @lmvdz's comment, I set the enableEarlyData option in the config to true, and I now get a new error:
INFO:QUICConnection 8dc351f0e387b93834e48bc8b64f8e33839f8f9a:ErrorQUICConnectionPeer: QUIC Connection peer error - Peer closed with transport code 2
INFO:QUICConnection 1fec30934c178b361ac23f04660d2f5a139a579e:Started QUICConnection
INFO:QUICClient:Created QUICClient to [::ffff:162.19.43.7]:25009
INFO:QUICStream 0:Create QUICStream
Failed to send transaction to TPU ErrorQUICStreamInternal: Failed to prime local stream state with a 0-length message
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICStream.ts:298:17)
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/events/src/Evented.ts:55:9)
... 4 lines matching cause stack trace ...
at /home/alex/projects/tpu-client/src/index.ts:331:64 {
data: {},
cause: Error: StreamLimit
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICStream.ts:291:25)
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/events/src/Evented.ts:55:9)
at new QUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/async-init/src/CreateDestroy.ts:49:26)
at Function.createQUICStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICStream.ts:76:20)
at constructor_.newStream (/home/alex/projects/tpu-client/node_modules/@matrixai/quic/src/QUICConnection.ts:1160:35)
at constructor_.newStream (/home/alex/projects/tpu-client/node_modules/@matrixai/async-init/src/StartStop.ts:244:18)
at /home/alex/projects/tpu-client/src/index.ts:331:64 {
code: 'GenericFailure'
},
timestamp: 2024-05-01T17:59:04.121Z
}
Still a stream limit error, though, despite the values being quite big. Is this enableEarlyData option referring to this initial "challenge packet"?
Have you set initialMaxStreamsBidi and initialMaxStreamsUni to a higher value?
INFO:QUICConnection 8dc351f0e387b93834e48bc8b64f8e33839f8f9a:ErrorQUICConnectionPeer: QUIC Connection peer error - Peer closed with transport code 2
seems to suggest that there's an issue with the peer. And the fact we're getting that just before the connection has started is very weird.
As for the https://github.com/MatrixAI/js-quic/issues/98#issuecomment-2088793895 comment, I'm not really sure what that is about. It may refer to the usual initial handshake procedure, where the server receives a connection and replies with a challenge that the client uses moving forward. We do something like that. I'm a bit fuzzy on the details, so I can't go into it right now.
Actually, looking deeper at the error log @thealexcons posted, that stream limit error is being thrown when the first stream is created, which is a far cry from the default limit of 100 streams. Something very weird is happening here.
Yeah, sorry. All I can really say is that the StreamLimit error comes specifically out of quiche when trying to start a new stream. It's happening on the first stream being created. I can only assume it's a config problem in some way, but none of the defaults should cause this. Frustratingly, the Rust docs for quiche are a little vague about how some things work and when errors are thrown, so the StreamLimit error might mean a few things I'm not aware of.
Keep in mind that at this stage no data has been sent on any stream; we're only initialising state for a stream in quiche. If it's not a config problem, then maybe it's some other interaction. Maybe waiting a few seconds before attempting the first stream would make a difference?
Try creating the connection, sleeping for a few seconds and then attempting the first stream. Alternatively, play around with some of the other config options and see what happens.
Anyone working on this still? Feels like we're really close.
quick update..
i got it to work :)
whoa, how? I've played around with the various limits and settings but with no success whatsoever...
It's not 100% tx hit rate, but this is what I did:
First thing: solana-quic doesn't support bidirectional communication, so change the stream type to unidirectional. https://github.com/solana-labs/solana/blob/master/streamer/src/quic.rs#L96
const clientStream = client.connection.newStream('uni');
Next, the cert thing wasn't working: CertificateRequired = 372.
So I checked what Solana does and replicated it (Solana creates a self-signed certificate; sending the private key doesn't count). https://docs.rs/solana-streamer/latest/src/solana_streamer/tls_certificates.rs.html#9-56
Then I got DecodeError = 306. I checked the cert on a website and saw that the key size was unsupported, so I changed it from the default 1024 to 2048.
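import selfsigned from 'selfsigned';
// CN "Solana node" + IP subjectAltName (GeneralName type 7) 0.0.0.0, mirroring Solana's tls_certificates.rs; keySize raised from the default 1024 to 2048
const pems = selfsigned.generate([{ name: 'commonName', value: 'Solana node' }, { name: 'subjectAltName', value: [{ type: 7, value: '0.0.0.0' }] }], { days: 365, algorithm: 'ed25519', keySize: 2048 });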
Then I added pems.private and pems.cert, added applicationProtos, and added verifyPeer: false, because at the moment we don't care about verifying the TPU's certificate.
const client = await QUICClient.createQUICClient({
config: {
key: pems.private,
cert: pems.cert,
verifyPeer: false,
applicationProtos: ['solana-tpu']
},
host: tpu_address.split(':')[0],
port: parseInt(tpu_address.split(':')[1]),
crypto: {
ops: {
randomBytes: async (data: ArrayBuffer): Promise<void> => {
webcrypto.getRandomValues(new Uint8Array(data));
},
},
}
}
);
I was able to get a couple of txs sent and confirmed on solscan. (don't want to share the tx as it contains my wallet address)
But the number of failed tries I am getting, with errors "Peer closed with application code 1", "Peer closed with transport code 11" and "Peer closed with transport code 2", is worrisome.
We will need to dig into what exactly Solana is doing for the connection/transaction, because opening all these QUIC connections just to send one transaction doesn't seem very smart and will get our IPs blacklisted or something...
as far as sending the transaction and getting it confirmed in an efficient way, we can discuss that on the tpu-client repo...
Oh cool, Yeah the stream limit error makes sense now. It would've been hitting the stream limit for one of the directions since it was unidirectional only.
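This sketch shows why the very first stream could fail. In QUIC, the number of streams the client may open is set by the peer's transport parameters, not by the client's own initialMaxStreamsBidi/initialMaxStreamsUni. Assuming quiche's usual semantics, those two settings only bound the streams the peer may open.

If the TPU advertises zero bidirectional streams, as the 'uni' fix suggests, then the first bidi newStream() fails immediately while 'uni' streams work. The class below is a hypothetical model of RFC 9000 stream IDs and limits, not js-quic code.

```typescript
// RFC 9000 §2.1: the two low bits of a stream ID encode initiator and direction:
// 0x0 client bidi, 0x1 server bidi, 0x2 client uni, 0x3 server uni.
type Direction = "bidi" | "uni";

// Client-side model: limits come from the *peer's* initial_max_streams_* parameters.
export class ClientStreamOpener {
  private opened = { bidi: 0, uni: 0 };

  constructor(private readonly peerLimits: { bidi: number; uni: number }) {}

  // Returns the new stream ID, or throws once the peer's limit is reached.
  open(dir: Direction): number {
    const index = this.opened[dir];
    if (index >= this.peerLimits[dir]) {
      throw new Error(`StreamLimit: peer allows ${this.peerLimits[dir]} ${dir} streams`);
    }
    this.opened[dir] = index + 1;
    const typeBits = dir === "bidi" ? 0x0 : 0x2; // client-initiated
    return index * 4 + typeBits;
  }
}
```

With `new ClientStreamOpener({ bidi: 0, uni: 128 })`, open("uni") returns 2 and then 6, while open("bidi") throws on the first call.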
The number of failures there is odd, but they're all peer errors with codes provided by the peer. Hopefully solana-quic documents what these codes are.
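Application codes such as code 1 are defined by the application protocol (Solana's here). Transport codes, by contrast, come from RFC 9000 itself, so those at least can be looked up. Below is a hypothetical lookup helper:

```typescript
// RFC 9000 §20.1 transport error codes, indexed by code value.
const TRANSPORT_ERRORS: readonly string[] = [
  "NO_ERROR", // 0x00
  "INTERNAL_ERROR", // 0x01
  "CONNECTION_REFUSED", // 0x02
  "FLOW_CONTROL_ERROR", // 0x03
  "STREAM_LIMIT_ERROR", // 0x04
  "STREAM_STATE_ERROR", // 0x05
  "FINAL_SIZE_ERROR", // 0x06
  "FRAME_ENCODING_ERROR", // 0x07
  "TRANSPORT_PARAMETER_ERROR", // 0x08
  "CONNECTION_ID_LIMIT_ERROR", // 0x09
  "PROTOCOL_VIOLATION", // 0x0a
  "INVALID_TOKEN", // 0x0b
  "APPLICATION_ERROR", // 0x0c
  "CRYPTO_BUFFER_EXCEEDED", // 0x0d
  "KEY_UPDATE_ERROR", // 0x0e
  "AEAD_LIMIT_REACHED", // 0x0f
  "NO_VIABLE_PATH", // 0x10
];

// Name a non-application ("transport") close code from a peer.
export function transportErrorName(code: number): string {
  if (code >= 0 && code < TRANSPORT_ERRORS.length) return TRANSPORT_ERRORS[code] ?? "UNKNOWN";
  if (code >= 0x0100 && code <= 0x01ff) return "CRYPTO_ERROR"; // TLS alert range
  return "UNKNOWN";
}
```

By this table, 2 is CONNECTION_REFUSED and 11 is INVALID_TOKEN. The latter concerns address-validation tokens, which may or may not be related to the "challenge packet" discussed earlier.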
So the only change really needed on the js-quic side was making the serverName option configurable, right?
@tegefaulkes maybe just add this as a config option, since it won't break any method signatures...
Does #122 close this?
Yes
I'm marking this as close then. Fixed by #122
Describe the bug
I'm trying to connect to Solana TPU leaders via the QUIC address string from the contactInfo,
but consistently get:
This happens with every one of the addresses from the cluster. I just found error code 306 here: https://github.com/MatrixAI/js-quic/blob/94f38390a3f667a829460c330157fbcc5a27a0c1/src/native/types.ts#L239
which made me think I did the randomBytes part wrong, but I don't really see how... I also tried using peculiarWebcrypto as in the benchmarks.
To Reproduce
Expected behavior
An established connection without errors.
Platform
macOS 14.4.1, Node v20.11.1, ts-node v10.9.2