decred / dcrdata

Decred block explorer, with packages and apps for data collection and storage. Written in Go.
https://dcrdata.decred.org/
ISC License

Decode OP_RETURN data on tx page #635

Open lealife opened 6 years ago

lealife commented 6 years ago

I have created a tx and added an OP_RETURN output at https://testnet.dcrdata.org/tx/92e6e6e7d2877e105435787fd73ec572cc85e3e620332cbfd65478c6c20aa0e2

OP_RETURN 48656c6c6f2c204465637265642e

Decoding the hex data 48656c6c6f2c204465637265642e as a UTF-8 string gives Hello, Decred.

Maybe the decoded data could be shown on this page.

reference: https://www.blocktrail.com/tBCC/tx/3bd425901bb4ddf2684e1bb85a5b65714a5835d78addf7b01c3aa674bacd8a4c

chappjc commented 6 years ago

Considerations:

dmigwi commented 6 years ago

Just so I understand this task well, I am supposed to implement a way that the op_return text can be decoded for the user on the frontend? Similar to this BTC implementation... Right?

(screenshot, 2018-09-11: a BTC block explorer showing a decoded OP_RETURN output)
dmigwi commented 6 years ago

Can someone help me narrow down all the possible text encodings I should look into? So far I've got hex, but it seems like other encodings are being used too.

chappjc commented 6 years ago

That is why this is a bit of a research project. There's UTF-8, ASCII, ANSI, etc.

To start, my suggestion is to look into how sites like https://cryptograffiti.info/ approach this (source: https://github.com/1Hyena/cryptograffiti).

Of course, the OP_RETURN data can be _anything_, so there's no way to know if you're decoding it right, or if there is even a proper decoding. My suggestion is to look into golang libraries that aim to solve this problem for us. I feel strongly that this is not a problem we should attempt to solve ourselves. Surely there are libraries that scan binary data for recognizable content, which may be text or binary formats like images and archives.

chappjc commented 6 years ago

I'm thinking along the lines of:

Then there are the considerations of doing everything efficiently (memory and CPU).
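One simple way to bound the memory and CPU cost is to cap how many bytes are ever inspected before any decoding runs. A hypothetical helper (the 256-byte limit is illustrative, not a dcrdata constant):

```go
package main

import "fmt"

// maxSniff caps how many OP_RETURN bytes are inspected; a hypothetical
// limit chosen for illustration.
const maxSniff = 256

// truncate returns at most maxSniff bytes of data, so downstream
// decoding work stays bounded regardless of payload size.
func truncate(data []byte) []byte {
	if len(data) > maxSniff {
		return data[:maxSniff]
	}
	return data
}

func main() {
	big := make([]byte, 1024)
	fmt.Println(len(truncate(big))) // 256
}
```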

chappjc commented 6 years ago

Also, I suggest approaching this task in pieces. Perhaps look at the content detection / decoding issue first, then worry about the front end stuff later.

dmigwi commented 6 years ago

Taking into consideration that OP_RETURN does not allow huge data sizes to be stored, I am giving top priority to decoding all of its data into sensible UTF-8 text. I will look into file decoding next.

lealife commented 5 years ago

I think there is no need to detect the text encodings. dcrdata should define one encoding (UTF-8) as the standard, and all users must follow that standard. Otherwise it will be complicated and messy! It should be a simple question.

chappjc commented 5 years ago

Well, how should we enforce this rule? OP_RETURN data can be anything at all. It could be JPEG data. This entire issue is purely for amusement.

patrickdugan commented 4 years ago

Hey everyone,

My two cents: make it a byte limit of 260 bytes or something like that. If someone can make a multi-input/output or multisig tx with 500 bytes, a simpler tx in terms of outputs with a 260-byte OP_RETURN payload is OK (making the whole tx about ~500 bytes).

Litecoin has 40 bytes, a holdover from 2014; we made TradeLayer tx codes fit into that. Bitcoin moved to 80 bytes. Might as well be a little experimental for other apps that don't have the same rigor applied. I like that we had to limbo in terms of design, but some things are counter-productive. For example, we add outputs for "reference addresses" that are then the destination for whatever is encoded in the OP_RETURN payload, so the payload doesn't have to encode the address itself. The extra output adds about 30 bytes, comparable to jamming the text into a bigger OP_RETURN payload. However, long-term the savings are net-negative, since each output must be redeemed with a ~120-byte signature (SegWit helps there). These little savings trade-offs may not matter all that much in the long run. Let people try bigger OP_RETURN payloads, why not.

ukane-philemon commented 1 year ago

@chappjc, is this still required?

chappjc commented 1 year ago

I don't care, personally. But it has never been completed. It was started in https://github.com/decred/dcrdata/pull/700 with a more open-ended decoding, which would probably have ended up being a DoS vector, but it was abandoned. Then https://github.com/decred/dcrdata/pull/934 took an ultra-simplistic approach that treats the data as UTF-8 bytes.

You're free to tackle this, but:

Just see what happens with a dumb utf8 interpretation of the nulldata push.
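A "dumb" UTF-8 interpretation like the one suggested above might look like this sketch: accept the payload only if it is valid UTF-8 made of printable runes, and fall back to hex otherwise (the helper name `asText` is hypothetical):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode"
	"unicode/utf8"
)

// asText returns the payload as a string if it is valid UTF-8 consisting of
// printable runes (plus whitespace); otherwise it reports false.
func asText(data []byte) (string, bool) {
	if !utf8.Valid(data) {
		return "", false
	}
	for _, r := range string(data) {
		if !unicode.IsPrint(r) && !unicode.IsSpace(r) {
			return "", false
		}
	}
	return string(data), true
}

func main() {
	payload, _ := hex.DecodeString("48656c6c6f2c204465637265642e")
	if s, ok := asText(payload); ok {
		fmt.Println(s) // Hello, Decred.
	} else {
		fmt.Println(hex.EncodeToString(payload)) // fall back to raw hex
	}
}
```

The printability check keeps control characters and binary junk from being rendered on the tx page while still displaying obvious text.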